Optimistic Transaction

Under the optimistic transaction model, modification conflicts are checked as part of the transaction commit. A TiDB cluster uses the optimistic transaction model by default before v3.0.8 and the pessimistic transaction model by default from v3.0.8 onward. The system variable tidb_txn_mode controls whether TiDB uses the optimistic or the pessimistic transaction mode.

This document describes the implementation of optimistic transactions in TiDB. It is recommended that you first learn the principles of optimistic transactions.

This document refers to the code of TiDB v5.2.1.

Begin Optimistic Transaction

The main function stack to start an optimistic transaction is as follows.

(a *ExecStmt) Exec
    (a *ExecStmt) handleNoDelay
        (a *ExecStmt) handleNoDelayExecutor
            Next
                (e *SimpleExec) Next
                    (e *SimpleExec) executeBegin

The function (e *SimpleExec) executeBegin does the main work for a “BEGIN” statement. The important comments and simplified code are as follows. The complete code is here.

/*

Check that the syntax "start transaction read only" and "as of timestamp" is used correctly.
If a stale read timestamp was set, create a new stale read transaction, set the "in transaction" state, and return.
Create a new transaction and set some properties such as snapshot, startTS, etc.
*/

func (e *SimpleExec) executeBegin(ctx context.Context, s *ast.BeginStmt) error {
    if s.ReadOnly {
       // The statement "start transaction read only" can be used only when tidb_enable_noop_functions is true.
       // The statement "start transaction read only as of timestamp" can be used whether tidb_enable_noop_functions is true or false, but tx_read_ts must not be set.
       // The statement "start transaction read only as of timestamp" must ensure the timestamp is within the legal safe point range.
       enableNoopFuncs := e.ctx.GetSessionVars().EnableNoopFuncs
       if !enableNoopFuncs && s.AsOf == nil {
           return expression.ErrFunctionsNoopImpl.GenWithStackByArgs("READ ONLY")
       }

       if s.AsOf != nil {
           if e.ctx.GetSessionVars().TxnReadTS.PeakTxnReadTS() > 0 {
              return errors.New("start transaction read only as of is forbidden after set transaction read only as of")
           }
       }
    } 

    // process stale read transaction
    if e.staleTxnStartTS > 0 {
       // check that the stale read timestamp is valid
       if err := e.ctx.NewStaleTxnWithStartTS(ctx, e.staleTxnStartTS); err != nil {
           return err
       }

       // ignore tidb_snapshot configuration if in stale read transaction
       vars := e.ctx.GetSessionVars()
       vars.SetSystemVar(variable.TiDBSnapshot, "")
    
       // set "in transaction" state and return
       vars.SetInTxn(true)

       return nil
    }

    /* If BEGIN is the first statement in TxnCtx, we can reuse the existing transaction, without the need to call NewTxn, which commits the existing transaction and begins a new one. If the last un-committed/un-rollback transaction is a time-bounded read-only transaction, we should always create a new transaction. */
    
    txnCtx := e.ctx.GetSessionVars().TxnCtx
    if txnCtx.History != nil || txnCtx.IsStaleness {
       if err := e.ctx.NewTxn(ctx); err != nil {
           return err
       }
    }

    // set "in transaction" state
    e.ctx.GetSessionVars().SetInTxn(true)

    // create a new transaction and set some properties like snapshot, startTS etc.
    txn, err := e.ctx.Txn(true)
    if err != nil {
       return err
    }
    // set Linearizability option
    if s.CausalConsistencyOnly {
       txn.SetOption(kv.GuaranteeLinearizability, false)
    }

    return nil
}

DML Executed In Optimistic Transaction

There are three kinds of DML operations: update, delete, and insert. For simplicity, this article only describes the execution process of the update statement. DML mutations are not sent to TiKV directly but are buffered in TiDB temporarily; the commit operation finally makes the mutations take effect.
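
The buffering described above can be pictured with a toy model. The following is a minimal sketch, not TiDB's actual memory buffer: the types store and memBuffer and their methods are hypothetical names chosen for illustration. Writes land in a local map and reach the store only when Commit is called.

```go
package main

import "fmt"

// store is a hypothetical stand-in for TiKV: a plain key-value map.
type store map[string]string

// memBuffer is a hypothetical stand-in for the transaction's local write buffer.
type memBuffer struct {
	mutations map[string]string
}

func newMemBuffer() *memBuffer {
	return &memBuffer{mutations: make(map[string]string)}
}

// Set records a mutation locally; nothing is sent to the store yet.
func (b *memBuffer) Set(key, value string) {
	b.mutations[key] = value
}

// Commit applies all buffered mutations to the store and clears the buffer.
func (b *memBuffer) Commit(s store) {
	for k, v := range b.mutations {
		s[k] = v
	}
	b.mutations = make(map[string]string)
}

func main() {
	s := store{"pk=1": "id2=1"}
	buf := newMemBuffer()
	buf.Set("pk=1", "id2=2")
	fmt.Println("before commit:", s["pk=1"])
	buf.Commit(s)
	fmt.Println("after commit:", s["pk=1"])
}
```

In this sketch the store keeps the old value until Commit runs, which is the property the paragraph above relies on.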

The main function stack to execute an update statement such as “update t1 set id2 = 2 where pk = 1” is as follows.

(a *ExecStmt) Exec
    (a *ExecStmt) handleNoDelay
        (a *ExecStmt) handleNoDelayExecutor
            (e *UpdateExec) updateRows
                 Next
                   (e *PointGetExecutor) Next

(e *UpdateExec) updateRows

The function (e *UpdateExec) updateRows does the main work for the update statement. The important comments and simplified code are as follows. The complete code is here.

/*
Take one batch of rows to be updated at a time.
Traverse every row in the batch, build the handle that uniquely identifies the row, and generate a new row.
Write the new row to the table.
*/

func (e *UpdateExec) updateRows(ctx context.Context) (int, error) {
    // fields and colsInfo are built from the child executor in the code omitted here
    globalRowIdx := 0
    chk := newFirstChunk(e.children[0])
    // composeFunc generates a new row
    composeFunc := e.composeNewRow
    totalNumRows := 0

    for {
       // call the "Next" method recursively until every executor has finished; every "Next" returns a batch of rows
       err := Next(ctx, e.children[0], chk)
       if err != nil {
           return 0, err
       }

       // If all rows are updated, return
       if chk.NumRows() == 0 {
           break
       }

       for rowIdx := 0; rowIdx < chk.NumRows(); rowIdx++ {
           // take one row from the batch above
           chunkRow := chk.GetRow(rowIdx)
           // convert the data from chunk.Row to types.DatumRow, stored by fields
           datumRow := chunkRow.GetDatumRow(fields)
           // generate the handle, which is the unique ID of every row
           e.prepare(datumRow)
           // compose non-generated columns
           newRow, err := composeFunc(globalRowIdx, datumRow, colsInfo)
           if err != nil {
               return 0, err
           }
           // merge non-generated columns
           e.merge(datumRow, newRow, false)
           // compose generated columns and merge generated columns
           if e.virtualAssignmentsOffset < len(e.OrderedList) {
              newRow, err = e.composeGeneratedColumns(globalRowIdx, newRow, colsInfo)
              if err != nil {
                  return 0, err
              }
              e.merge(datumRow, newRow, true)
           }

           // write to table
           e.exec(ctx, e.children[0].Schema(), datumRow, newRow)
       }
       totalNumRows += chk.NumRows()
    }
    return totalNumRows, nil
}
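
The batch-at-a-time loop above follows a common pull pattern: ask the child for a batch, stop on an empty batch, otherwise process each row. The following is a minimal sketch of that pattern only. The names rowSource, next, and updateAll are hypothetical, and plain integers stand in for rows.

```go
package main

import "fmt"

// rowSource is a hypothetical child executor that hands out rows in batches.
type rowSource struct {
	rows      []int
	batchSize int
	pos       int
}

// next refills chk with the next batch; an empty batch means the source is drained.
func (s *rowSource) next(chk []int) []int {
	chk = chk[:0]
	for len(chk) < s.batchSize && s.pos < len(s.rows) {
		chk = append(chk, s.rows[s.pos])
		s.pos++
	}
	return chk
}

// updateAll pulls batches until an empty one arrives and applies update to every row.
func updateAll(src *rowSource, update func(old int) int) (updated []int) {
	chk := make([]int, 0, src.batchSize)
	for {
		chk = src.next(chk)
		if len(chk) == 0 {
			break
		}
		for _, old := range chk {
			updated = append(updated, update(old))
		}
	}
	return updated
}

func main() {
	src := &rowSource{rows: []int{1, 2, 3, 4, 5}, batchSize: 2}
	fmt.Println(updateAll(src, func(old int) int { return old * 10 })) // [10 20 30 40 50]
}
```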

Commit Optimistic Transaction

Committing a transaction includes two phases, “prewrite” and “commit”, which are explained separately below. The function (c *twoPhaseCommitter) execute does the main work for committing a transaction. The important comments and simplified code are as follows. The complete code is here.

/*
Do the "prewrite" operation first.
If it is a OnePC transaction, return immediately.
If it is an AsyncCommit transaction, commit the transaction asynchronously and return.
If it is neither a OnePC nor an AsyncCommit transaction, commit the transaction synchronously.
*/

func (c *twoPhaseCommitter) execute(ctx context.Context) (err error) {
    // do the "prewrite" operation
    c.prewriteStarted = true

    // bo, commitBo, and commitDetail are created in the code omitted here
    err = c.prewriteMutations(bo, c.mutations)
    if err != nil {
       return err
    }
    if c.isOnePC() {
       // If OnePC transaction, return immediately
       return nil
    }

    if c.isAsyncCommit() {
       // commit the transaction asynchronously and return for AsyncCommit
       go func() {
           // error handling of the background commit is omitted here
           _ = c.commitMutations(commitBo, c.mutations)
       }()

       return nil
    }

    // do the "commit" phase
    return c.commitTxn(ctx, commitDetail)
}
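
The three paths through execute can be sketched with a toy committer. This is an illustration of the branching only, under the assumption that recording a step name is enough to stand in for each phase. The type toyCommitter and its methods are hypothetical and are not part of TiDB or client-go.

```go
package main

import (
	"fmt"
	"sync"
)

// toyCommitter is a hypothetical, heavily simplified stand-in for twoPhaseCommitter.
type toyCommitter struct {
	onePC       bool
	asyncCommit bool

	mu    sync.Mutex
	steps []string       // records which phases ran
	bg    sync.WaitGroup // tracks the background commit goroutine
}

func (c *toyCommitter) record(step string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.steps = append(c.steps, step)
}

// execute mirrors the branching of the simplified code above.
func (c *toyCommitter) execute() {
	c.record("prewrite")
	if c.onePC {
		// 1PC: nothing is left to do after the prewrite.
		return
	}
	if c.asyncCommit {
		// Async commit: return now, run the commit phase in the background.
		c.bg.Add(1)
		go func() {
			defer c.bg.Done()
			c.record("commit (background)")
		}()
		return
	}
	// Ordinary 2PC: run the commit phase before returning.
	c.record("commit")
}

// Steps waits for any background work and returns a copy of the recorded phases.
func (c *toyCommitter) Steps() []string {
	c.bg.Wait()
	c.mu.Lock()
	defer c.mu.Unlock()
	return append([]string(nil), c.steps...)
}

func main() {
	for _, c := range []*toyCommitter{
		{onePC: true},
		{asyncCommit: true},
		{},
	} {
		c.execute()
		fmt.Println(c.Steps())
	}
}
```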

prewrite

The entry function to prewrite a transaction is (c *twoPhaseCommitter) prewriteMutations, which calls the function (batchExe *batchExecutor) process to do the work. The function (batchExe *batchExecutor) process calls (batchExe *batchExecutor) startWorker to prewrite every batch in parallel. The function (batchExe *batchExecutor) startWorker calls (action actionPrewrite) handleSingleBatch to prewrite a single batch.

(batchExe *batchExecutor) process

The important comments and simplified code are as follows. The complete code is here.

// start worker routines to prewrite every batch in parallel and collect the results
func (batchExe *batchExecutor) process(batches []batchMutations) error {
    var err error
    err = batchExe.initUtils()
    // For prewrite, stop sending other requests after receiving first error.
    var cancel context.CancelFunc

    if _, ok := batchExe.action.(actionPrewrite); ok {
       batchExe.backoffer, cancel = batchExe.backoffer.Fork()
       defer cancel()  
    }

    // concurrently do the work for each batch.
    ch := make(chan error, len(batches))
    exitCh := make(chan struct{})
    go batchExe.startWorker(exitCh, ch, batches)
    // check the result of every batch prewrite synchronously; if one batch fails,
    // stop every prewrite worker routine immediately.
    for i := 0; i < len(batches); i++ {
       if e := <-ch; e != nil {
           // Cancel other requests and return the first error.
           if cancel != nil {
              cancel()
           }

           if err == nil {
              err = e
           }
       }
    }

    close(exitCh)   // break the loop of function startWorker

    return err
}

(batchExe *batchExecutor) startWorker

The important comments and simplified code are as follows. The complete code is here.

// startWorker concurrently does the work for each batch, subject to the rate limit
func (batchExe *batchExecutor) startWorker(exitCh chan struct{}, ch chan error, batches []batchMutations) {
    for _, batch1 := range batches {
       waitStart := time.Now()
       // limit the number of goroutines by the buffer size of the channel
       if exit := batchExe.rateLimiter.GetToken(exitCh); !exit {
           batchExe.tokenWaitDuration += time.Since(waitStart)
           batch := batch1
           // call the function "handleSingleBatch" to prewrite the keys of every batch
           go func() {
              defer batchExe.rateLimiter.PutToken() // release the channel buffer
              // singleBatchBackoffer is derived from batchExe.backoffer in the code omitted here
              ch <- batchExe.action.handleSingleBatch(batchExe.committer, singleBatchBackoffer, batch)
           }()

       } else {
           break
       }
    }
}
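
The comments above describe a limiter whose capacity is the buffer size of a channel. The following is a minimal sketch of that idea, not the limiter used by TiDB: the type rateLimit and the helper runLimited are hypothetical and written only to make the token mechanics concrete.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// rateLimit is a hypothetical token limiter built on a buffered channel.
type rateLimit struct {
	token chan struct{}
}

func newRateLimit(n int) *rateLimit {
	return &rateLimit{token: make(chan struct{}, n)}
}

// getToken blocks until a token is free or done is closed.
// It returns true when the caller should exit without a token.
func (r *rateLimit) getToken(done <-chan struct{}) (exit bool) {
	select {
	case <-done:
		return true
	case r.token <- struct{}{}:
		return false
	}
}

// putToken releases a token acquired by getToken.
func (r *rateLimit) putToken() {
	<-r.token
}

// runLimited runs n tasks with at most limit running at once and
// returns the highest concurrency it observed.
func runLimited(n, limit int) int32 {
	rl := newRateLimit(limit)
	done := make(chan struct{})
	var wg sync.WaitGroup
	var running, peak int32
	for i := 0; i < n; i++ {
		if rl.getToken(done) {
			break
		}
		wg.Add(1)
		go func() {
			defer wg.Done()
			defer rl.putToken()
			cur := atomic.AddInt32(&running, 1)
			for {
				old := atomic.LoadInt32(&peak)
				if cur <= old || atomic.CompareAndSwapInt32(&peak, old, cur) {
					break
				}
			}
			atomic.AddInt32(&running, -1)
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println(runLimited(20, 4) <= 4) // true
}
```

Because a token is taken before each goroutine starts and returned only when it ends, the number of running tasks in this sketch can never exceed the channel capacity.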

(action actionPrewrite) handleSingleBatch

The important comments and simplified code are as follows. The complete code is here.

/*

Create a prewrite request and a region request sender that sends the prewrite request to TiKV.
(1) Get the prewrite response coming from TiKV.
(2) If no error happened and it is a OnePC transaction, update onePCCommitTS from prewriteResp and return.
(3) If no error happened and it is an AsyncCommit transaction, update minCommitTS if needed and return.
(4) If errors happened because of lock conflicts, extract the locks from the error response and resolve the expired locks.
(5) Do the backoff for prewrite.
*/

func (action actionPrewrite) handleSingleBatch(c *twoPhaseCommitter, bo *retry.Backoffer, batch batchMutations) (err error) {
    // create a prewrite request and a region request sender that sends the prewrite request to tikv.
    txnSize := uint64(c.regionTxnSize[batch.region.GetID()])
    req := c.buildPrewriteRequest(batch, txnSize)
    sender := locate.NewRegionRequestSender(c.store.GetRegionCache(), c.store.GetTiKVClient())

    for {
       resp, err := sender.SendReq(bo, req, batch.region, client.ReadTimeoutShort)
       regionErr, err := resp.GetRegionError()  

       // get the prewrite response coming from TiKV
       prewriteResp := resp.Resp.(*kvrpcpb.PrewriteResponse)
       keyErrs := prewriteResp.GetErrors()
       if len(keyErrs) == 0 {
           // If no error happened and it is a OnePC transaction, update onePCCommitTS from prewriteResp and return
           if c.isOnePC() {
              c.onePCCommitTS = prewriteResp.OnePcCommitTs
              return nil
           }

           // If no error happened and it is an AsyncCommit transaction, update minCommitTS if needed and return
           if c.isAsyncCommit() {
              if prewriteResp.MinCommitTs > c.minCommitTS {
                  c.minCommitTS = prewriteResp.MinCommitTs
              }
           }
           return nil
       } // if len(keyErrs) == 0

       // If errors happened because of lock conflicts, extract the locks from the error response
       var locks []*txnlock.Lock
       for _, keyErr := range keyErrs {
           // Extract lock from key error
           lock, err1 := txnlock.ExtractLockFromKeyErr(keyErr)
           if err1 != nil {
              return errors.Trace(err1)
           }
           locks = append(locks, lock)
       } // for _, keyErr := range keyErrs
       // resolve the expired conflicting locks, then do the backoff for prewrite
       msBeforeExpired, err := c.store.GetLockResolver().ResolveLocksForWrite(bo, c.startTS, c.forUpdateTS, locks)
       if msBeforeExpired > 0 {
           err = bo.BackoffWithCfgAndMaxSleep(retry.BoTxnLock, int(msBeforeExpired), errors.Errorf("2PC prewrite lockedKeys: %d", len(locks)))
           if err != nil {
              return errors.Trace(err) // backoff exceeded maxtime, returns error
           }
       }
    }// for loop
}
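
The retry loop above has three outcomes per attempt: success, a conflicting lock that has already expired (retry at once), or a conflicting lock that is still alive (back off, then retry). The following is a minimal sketch of that control flow under simplifying assumptions. The types backoffer and attemptResult and the function prewriteWithRetry are hypothetical, and the backoffer only accumulates time instead of sleeping.

```go
package main

import (
	"errors"
	"fmt"
)

// backoffer is a hypothetical budgeted sleeper; it only adds up the time
// it would have slept so that the example runs instantly.
type backoffer struct {
	sleptMs, maxMs int
}

var errBackoffExceeded = errors.New("backoff budget exceeded")

func (b *backoffer) backoff(ms int) error {
	if b.sleptMs+ms > b.maxMs {
		return errBackoffExceeded
	}
	b.sleptMs += ms
	return nil
}

// attemptResult is what one simulated prewrite attempt reports.
type attemptResult struct {
	locked          bool // a conflicting lock was met
	msBeforeExpired int  // > 0 when that lock has not expired yet
}

// prewriteWithRetry keeps calling try until it reports no conflicting lock
// or the backoff budget runs out.
func prewriteWithRetry(bo *backoffer, try func() attemptResult) (attempts int, err error) {
	for {
		attempts++
		res := try()
		if !res.locked {
			return attempts, nil
		}
		// An expired lock (msBeforeExpired == 0) counts as resolved, so retry at once.
		if res.msBeforeExpired > 0 {
			if err := bo.backoff(res.msBeforeExpired); err != nil {
				return attempts, err
			}
		}
	}
}

func main() {
	// Simulated attempts: blocked by a live lock, blocked by an expired lock, then success.
	results := []attemptResult{{true, 30}, {true, 0}, {false, 0}}
	i := 0
	bo := &backoffer{maxMs: 100}
	attempts, err := prewriteWithRetry(bo, func() attemptResult {
		r := results[i]
		i++
		return r
	})
	fmt.Println(attempts, bo.sleptMs, err) // 3 30 <nil>
}
```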

commit

The entry function for committing a transaction is (c *twoPhaseCommitter) commitMutations, which calls the function (c *twoPhaseCommitter) doActionOnGroupMutations to do the work. The batch containing the primary key is committed first, then the function (batchExe *batchExecutor) process calls (batchExe *batchExecutor) startWorker to commit the remaining batches in parallel and asynchronously. The function (batchExe *batchExecutor) startWorker calls (actionCommit) handleSingleBatch to commit a single batch.

(c *twoPhaseCommitter) doActionOnGroupMutations

The important comments and simplified code are as follows. The complete code is here.

/*
If the groups contain the primary key, commit the primary batch synchronously.
If this is the first attempt to commit, spawn a goroutine to commit the secondary batches asynchronously.
If this is a retry of the commit, commit the secondary batches synchronously, because the retry already runs in the asynchronous goroutine.
*/

func (c *twoPhaseCommitter) doActionOnGroupMutations(bo *retry.Backoffer, action twoPhaseCommitAction, groups []groupedMutations) error {
    // the batches are built from groups in the code omitted here
    batchBuilder := newBatched(c.primary())
    // whether the groups being operated on contain the primary key
    firstIsPrimary := batchBuilder.setPrimary()
    actionCommit, actionIsCommit := action.(actionCommit)
    c.checkOnePCFallBack(action, len(batchBuilder.allBatches()))

    var err error
    // If the groups contain the primary key, commit the primary batch synchronously
    if firstIsPrimary &&
       (actionIsCommit && !c.isAsyncCommit()) {
       // primary should be committed(not async commit)/cleanup/pessimistically locked first
       err = c.doActionOnBatches(bo, action, batchBuilder.primaryBatch())
       if err != nil {
           return errors.Trace(err)
       }
       batchBuilder.forgetPrimary()
    }
    // If this is the first attempt to commit, spawn a goroutine to commit the secondary batches asynchronously.
    // If this is a retry of the commit, commit the secondary batches synchronously, because the retry already runs in the asynchronous goroutine.
    if actionIsCommit && !actionCommit.retry && !c.isAsyncCommit() {
       secondaryBo := retry.NewBackofferWithVars(c.store.Ctx(), CommitSecondaryMaxBackoff, c.txn.vars)
       c.store.WaitGroup().Add(1)
       go func() {
           defer c.store.WaitGroup().Done()
           // error handling of the background commit is omitted here
           _ = c.doActionOnBatches(secondaryBo, action, batchBuilder.allBatches())
       }()
    } else {
       err = c.doActionOnBatches(bo, action, batchBuilder.allBatches())
    }

    return errors.Trace(err)
}

(batchExe *batchExecutor) process

The function (c *twoPhaseCommitter) doActionOnGroupMutations calls (c *twoPhaseCommitter) doActionOnBatches to do the second phase of commit. The function (c *twoPhaseCommitter) doActionOnBatches calls (batchExe *batchExecutor) process to do the main work.

The important comments and simplified code of the function (batchExe *batchExecutor) process are shown above in the prewrite part. The complete code is here.

(actionCommit) handleSingleBatch

The function (batchExe *batchExecutor) process calls the function (actionCommit) handleSingleBatch to send commit requests to all TiKV nodes.

The important comments and simplified code are as follows. The complete code is here.

/*
Create a commit request and a commit sender.
If a regionErr happened, back off and retry the commit operation.
If the error is not a regionErr but the commit was rejected by TiKV because the commit ts expired, retry with a newer commitTS.
If other errors happened, return the error immediately.
If no error happened, exit the for loop and return success.
*/

func (actionCommit) handleSingleBatch(c *twoPhaseCommitter, bo *retry.Backoffer, batch batchMutations) error {
    // create a commit request and commit sender
    keys := batch.mutations.GetKeys()
    req := tikvrpc.NewRequest(tikvrpc.CmdCommit, &kvrpcpb.CommitRequest{
       StartVersion:  c.startTS,
       Keys:          keys,
       CommitVersion: c.commitTS,
    }, kvrpcpb.Context{Priority: c.priority, SyncLog: c.syncLog,
       ResourceGroupTag: c.resourceGroupTag, DiskFullOpt: c.diskFullOpt})

    tBegin := time.Now()
    attempts := 0 
    sender := locate.NewRegionRequestSender(c.store.GetRegionCache(), c.store.GetTiKVClient())

    for {
       attempts++
       resp, err := sender.SendReq(bo, req, batch.region, client.ReadTimeoutShort)
       regionErr, err := resp.GetRegionError()
       // If regionErr happened, backoff and retry the commit operation
       if regionErr != nil {
           // For other region error and the fake region error, backoff because there's something wrong.
           // For the real EpochNotMatch error, don't backoff.
           if regionErr.GetEpochNotMatch() == nil || locate.IsFakeRegionError(regionErr) {
              err = bo.Backoff(retry.BoRegionMiss, errors.New(regionErr.String()))
              if err != nil {
                  return errors.Trace(err)
              }
           }

           same, err := batch.relocate(bo, c.store.GetRegionCache())
           if err != nil {
              return errors.Trace(err)
           }

           if same {
              continue
           }

           err = c.doActionOnMutations(bo, actionCommit{true}, batch.mutations)
           return errors.Trace(err)
       }// if regionErr != nil

       // If the error is not a regionErr but the commit was rejected by TiKV because the commit ts expired, retry with a newer commitTS. If other errors happened, return the error immediately.

       commitResp := resp.Resp.(*kvrpcpb.CommitResponse)
       if keyErr := commitResp.GetError(); keyErr != nil {
           if rejected := keyErr.GetCommitTsExpired(); rejected != nil {
              // 2PC commitTS rejected by TiKV; get a newer commitTS, update it, and retry.
              commitTS, err := c.store.GetTimestampWithRetry(bo, c.txn.GetScope())
              if err != nil {
                  return errors.Trace(err)
              }
              c.mu.Lock()
              c.commitTS = commitTS
              c.mu.Unlock()
              // Update the commitTS of the request and retry.
              req.Commit().CommitVersion = commitTS
              continue
           }

           // err is extracted from keyErr in the code omitted here
           if c.mu.committed {
              // 2PC failed to commit a key after the primary key was committed.
              // No secondary key could be rolled back after its primary key is committed.
              return errors.Trace(err)
           }
           return err
       }

       // No error happened, exit the for loop
       break
    } // for loop

    return nil
}
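
The handling of an expired commit timestamp can be isolated into a small loop: send the commit, and if the store rejects the timestamp, fetch a newer one and send again. The following is a minimal sketch of that loop. The function commitWithFreshTS and the callbacks nextTS and send are hypothetical stand-ins for the timestamp source and the commit request.

```go
package main

import "fmt"

// commitWithFreshTS retries a commit whenever the store rejects the commit
// timestamp as expired. It returns the timestamp that was finally used and
// the number of attempts.
func commitWithFreshTS(commitTS uint64, nextTS func() uint64, send func(ts uint64) (expired bool, err error)) (uint64, int, error) {
	attempts := 0
	for {
		attempts++
		expired, err := send(commitTS)
		if err != nil {
			return commitTS, attempts, err // any other error ends the loop
		}
		if !expired {
			return commitTS, attempts, nil
		}
		// Rejected: fetch a newer timestamp and retry with it.
		commitTS = nextTS()
	}
}

func main() {
	ts := uint64(100)
	nextTS := func() uint64 { ts += 10; return ts }
	// The simulated store rejects any commit timestamp below 115.
	send := func(commitTS uint64) (bool, error) { return commitTS < 115, nil }
	fmt.Println(commitWithFreshTS(100, nextTS, send)) // 120 3 <nil>
}
```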

TiDB Development Guide