Insert or Update pattern for Sql Server

over 17 years ago

Note SQL Server 2008 users, you now have a built-in MERGE statement you can use instead of these patterns.

A very common problem that is surprisingly difficult to solve properly with SQL is the UPDATE or INSERT problem (sometimes called upsert). I would like to insert a row in to the table if the key does not exist and update a row if a key exists.

Oracle, DB2 and even Sqlite have SQL syntax that allows you to solve this problem in a single statement. This is not the case for SQL server. You have to do this manually.

So, for the table:

create table t (pk int primary key, hitCount int)

The following pattern is very commonly used:

if exists (select * from t where pk = @id)
begin
   update t set hitCount = hitCount+1
   where pk = @id
end 
else
begin 
   insert into t values (@id, 1) 
end

The above code works just fine from single threaded applications, however when called from multiple threads, you start getting primary key violations.

This happens because the batch is not executed atomically. Two or more threads could “think” that there is no data for the existing key and subsequently try to insert data.

To overcome this issue you have two options. The first is, handle primary key violations. The second option is to fix up the transactional integrity of the batch.

Lets explore the second option.

The first step is to put all the SQL in a begin tran commit tran block.

begin tran
if exists(select * from t where pk = @id)
begin
   update t set hitCount = hitCount+1
   where pk = @id
end
else
begin
   insert t (pk, hitCount)
   values (@id, 1)
end
commit tran

This does not solve the problem. Primary key violations keep on happening. The reason for this is that the default isolation level for transactions is “read committed”, meaning that you are allowed to read all committed transactions. But new transactions may get committed between the first and second statement.

So, after you read up about transaction isolations levels you may decide to fix up the code and put it in a serializable transaction:

set transaction isolation level serializable
begin tran
if exists(select * from t where pk = @id)
begin
   update t set hitCount = hitCount+1
   where pk = @id
end
else
begin
   insert t (pk, hitCount)
   values (@id, 1)
end
commit tran

After doing this you stop getting primary key violations, however … you create a much bigger problem. Now you may notice you start getting deadlocks.

Why is this happening? Well, the serializable transaction means the select statement will acquire a shared locks on the table. You will get a RangeS-S lock on the key when the data is missing (and a key / page lock if the data is there). The shared locks are compatible with other shared locks, meaning other threads could acquire the same lock on the same key/page. Later on in the transaction you need to upgrade the lock to an exclusive lock, so you can change the data. However, this can not be done if other transactions are holding shared locks. Which leads to deadlocks, For example:

# Key is missing
# Transaction 1 acquires RangeS-S on key 1
# Transaction 2 acquires RangeS-S on key 1
# Transaction 1 wants to upgrade the lock to a RangeX-X (starts waiting)
# Transaction 2 wants to upgrade the lock to a RangeX-X (starts waiting)
# SQL Server detects a deadlock and kills off one of the transactions

So, we need to clean up the transaction a little more. During the execution of the SELECT statement we would like to block the execution of the same SELECT statement on other threads. A handy hint we can use is the UPDLOCK hint, it means that we will acquire update locks instead of shared locks. This does not block other readers, but will block statements which need to also hold update locks (like update statements or queries using the UPDLOCK hint).

The result of this is that we have created a critical section in our transaction.

So here is the first good way to solve the problem:

begin tran
if exists (select * from t with (updlock,serializable) where pk = @id)
   begin
   update t set hitCount = hitCount+1
   where pk = @id
end
else
begin
   insert t (pk, hitCount)
   values (@id, 1)
end
commit tran

A second method we can use is:

begin tran
   update t with (serializable)
   set hitCount = hitCount + 1
   where pk = @id
   if @@rowcount = 0
   begin
      insert t (pk, hitCount)
      values (@id,1)
   end
commit tran

Keep in mind the serializable hint is critical, without it we will be very prone to deadlocks, for reasons similar to what was described above.

Both of these methods exhibited very similar performance characteristics during my testing. Keep in mind that in a production system they may perform differently.

Thank you Erland for catching out many bugs in the first draft of this article!

Posted by: Sam Permalink | Comments (31)

Comments

Mladen over 16 years ago

this method can come in handy too

Sam Saffron over 16 years ago

Mladen, Looks interesting, but DB level mutexes seem to go against everything my inside dba voice tells me.

<p>I mean, how can I be sure 1 million percent that they will be released? Is it the correct level of granularity? meaning I would like to be inserting and updating stuff in two different areas in the same table at the same time. How would I ever port code that uses mutexes to another db?</p>

Mladen over 16 years ago

oh yesâ€¦ those are all valid questions :)) like alwas the answer is: it depends. app locks in sql server are pretty well built and work excellent. i haven't seen them fail yet. especially if you have set them up for the transaction context.

<p>i don't count app locks as a granularity level since they never place locks on the data itself. just on the portion of the execution code.</p>


<p>about portingâ€¦ well i don't care about that :)))

i don’t deal with other db’s

Kent over 16 years ago

Excellent article! So, just to make sure I understand things, the point of the SERIALIZABLE hints in your two methods is to make sure the update lock is held for the entire transaction, right?

Sam Saffron over 16 years ago

Kent,

<p>Thank you! The point of the serializable hint is to ensure the transactional consistency of the block.</p>

<p>take the following example:</p>

<pre>begin tran

select hitCount from t where id = 1
â€”returns 1
â€”in another query analyzer run â€œupdate t set hitCount = 5â€
select hitCount from t where id = 1
â€”returns 5
commit tran

<p>so, the begin tran ensures that either everything happens or nothing, the serializable isolation level ensures that the results of all reads within a transaction are not affected by other transactions.</p>

<p>Cheers

Sam

Patrick_Greene almost 16 years ago

You just saved me a long evening trying to figure out how to do thisâ€¦ THANK YOU!

Iwan almost 16 years ago

Sam, What are your thoughts on the fact that it needs to perform 2 index seeks? One to check if it exists, then the second one to update?

<p>What do you think of this approach?</p>

<p>

CREATE TABLE iwan_test ( pk int not null primary key, cnt int )
CREATE PROCEDURE sp_iwan_update_stats (@pk INT)

AS

BEGIN
set nocount on
begin

    UPDATE iwan_test SET cnt = cnt + 1 WHERE pk = @pk
    IF @@ROWCOUNT = 0
    BEGIN TRY
        INSERT INTO iwan_test VALUES (@pk, 1)
    END TRY
    BEGIN CATCH
        if ( @@ERROR = 2627 ) -- pk violation
        UPDATE iwan_test SET cnt = cnt + 1 WHERE pk = @pk
    END CATCH
end

END
EXEC dbo.sp_iwan_update_stats 1

EXEC dbo.sp_iwan_update_stats 1

Sam almost 16 years ago

iwan,

<p>(note: the technique you listed will only work on <span class='caps'>SQL 2005</span> and up)</p>


<p>That code is a little risky since there is no transaction defined you can get some strange results.</p>


<p>Say for example you start throwing in deletes randomly, you may get a situation where you will miss a count. The more scary thing is that it may be a little more prone to deadlocks cause you have less control over the locks you are consuming.</p>


<p>I would probably recommend using the second technique I listed cause its a little bit safer, and only seeks once for updates. But to be honest with you, there is little point in splitting hairs. Your way is probably save enough most of the time. I do however strongly recommend you test it for deadlocks, cause there may be a situation where your update statements can deadlock against each other.</p>


<p>Cheers

Sam

Arjan over 15 years ago

very nice article. Works like a charm!

<p>thnx</p>

Sajay over 15 years ago

very Good explanations

Stewart over 15 years ago

Hey Sam I not a Sql DBA, just a mere mortal web developer who writes stored procedures in addition to everything else, so transactions are a bit out my turf. With that in mind, you examples above just about gets me out of a problem.

My situation is this: I need to do conditional inserts or updates for an an unknown number of rows. Basically I need to select rows from 1 table and conditionally insert or update them in another table to achieve parity. It seems like I need a loop to do that, but I have read that loops in Sql are ill-advised.

So my question is should the loop be outside in code (c# in my case) and just call the stored proc recursively, or should the loop be in the stored proc?

If it's the later, can you provide a simple syntax example?

Any guidance would be much appreciated

Sam Saffron over 15 years ago

@Stewart,

Careful, I would strongly recommend against looping in c# code or calling a proc in a loop.

These patterns can be adjusted to work on sets and you could perform all of your work in 2 or 3 sql statements. Try posting your question on stackoverflow.com and posting me a link to your question (make sure you have the schema of all the tables involved specified and your goals.)

Cheers
Sam

Bin_Or over 15 years ago

Thank you, very helpful! I edited my answer on Stack Overflow by adding a link to yours.

and found a little text bug:
During the execution of the select statement we would like to block [the] execution [of] the same select statement on other threads.

Sam over 15 years ago

Thanks binOr,

I corrected the phrasing.

Ani over 15 years ago

Nice article.

One doubt !

Why do we need and updlock and serializable both in this statement ?

â€œif exists (select * from t with (updlock,serializable) where pk = @id)â€

As Sql Books online say â€¦ Updlock holds a lock till the transaction is finished and serializable also fine tunes a shared lock to be held till the transaction finishes.

Can this behaivour not be achieved by only using an Updlock ?

Is there something that makes this â€œKeep in mind the serializable hint is criticalâ€ necessary ?

Sam Saffron over 15 years ago

@Ani,

There is a method to this madness, the problem with the update lock only is that it does not fake a critical section for situations where a key does not exist in the table.

So, for example if you have two transactions running:

begin tran
select * from t with (updlock)
where a = â€˜missing key'

Both will proceed (test it out in QA)

On the other hand if you have an arbitrary number of transactions running:

begin tran
select * from t with (updlock,serializable)
where a = â€˜missing key'

Only the first one will get through the rest will wait for the first transaction to finish.

Paul over 15 years ago

Came across this article yesterday as i am having exactly this UPSERT issue and thought excellent, the only difference being that i have a comma separated list of ID's being passed into a SP instead of a single value.

I therefore came up with the following solution, where

csvSplit is a table based function and

@ResultsPerPage is the number of comma separated values being passed in.

BEGIN TRANSACTION
UPDATE MYTABLE WITH (SERIALIZABLE)
       SET Views = Views + 1
WHERE ID IN (SELECT Value FROM csvSplit(@CommaSeparatedIDs,',') )
IF @@ROWCOUNT != @ResultsPerPage
BEGIN
   INSERT  INTO MYTABLE ( ID,Views )
   SELECT  Value,1
   FROM    csvSplit(CommaSeparatedIDs, ',') WHERE Value NOT IN (SELECT ID FROM MYTABLE)
END
COMMIT TRANSACTION

However, when made LIVE on an extremely busy site i still got Transaction deadlock errors and so rolled it back to the previous solution which is generating the PK Violation errors.

Is there anything obvious that i am missing, or is it simple that using a transaction with serializable hint on an extremely busy site simple isn't the way to go?

Any comment would be greatly appreciated

Cheers Paul

Sam over 15 years ago

Paul,

Sure, the pattern is designed to work on single rows, if you need to operate on arbitrary sets the only way to completely eliminate the possibility of getting deadlocks would be to chuck a tablockx on the whole table, the problem is that you would completely kill concurrency.

You could re-design a proc to do one transaction per row and eliminate the deadlocks, or you could keep handling this in code (but if you are noticing a massive number of retries I would recommend moving to the one row at a time approach).

Cheers
Sam

Zla almost 15 years ago

Is the standard pattern that uses
if exists (select * from t where pk = @id) â€¦
used for natural primary keys in these examples?

My system has mostly artificial primary keys that are identity columns. So the client application either has a value for the primary key or doesn't. So when the stored proc is called to insert the data, the @PK variable is either null or some value. Since the PK is an identity, there isn't any question about whether or not the record exists.

Am I missing something or is my situation different?

Sam Saffron almost 15 years ago

your situation is different, but it could be adapted to these patterns, I would strongly recommend an unique index on the composite

Jeff_Sternal over 14 years ago

If you incorporate the test into your update statement, is that executed atomically?

update t 
set    hitCount = hitCount+1
where  pk       = @id
and    exists (select * from t where pk = @id)

Dallas about 13 years ago

Hi Sam,

This is a great technique for providing safety.

However, to optimize for scenarios where you're expecting the EXISTS() function to return TRUE for the majority of cases, you might want to try a pattern known as â€˜double checked locking'.

So first you run a standard IF EXISTS() case without ANY locking whatsoever, and if you get a TRUE result you know it's always safe to UPDATE your row.

If however you get a FALSE, then you take out the UPDLOCK within a transaction and test again safely using your method. (hence the name double checking).

This is especially important under high load where locking unnecessarily is too expensive.

Shripathi_Kamath about 13 years ago

Dallas, in the optimization you talk about, is it the case that you do not do DELETEs?

If not, how can you ensure that the UPDATE you do post the lockless EXISTS is safe?

Dallas about 13 years ago

Hey Shripathi Kamath.

You are absolutely correct that deletes would not be safe in the approach I suggested. For that, use the same pattern that Sam outlines.

I should have made that clear.

There's a wiki article that explains the drawbacks for youâ€¦ http://en.wikipedia.org/wiki/Double-checked_locking

Honestly, I haven't DELETE'd an object in a production environment since the 90's. The usual pattern is to just â€˜mark' a row as â€˜deleted/unpublish/inactive'. But for the web at least, even out of scope data lives forever.

Sam Saffron about 13 years ago

Agree with you on tombstoning , DELETEs are invented by the devil

Paulo_Rosa almost 13 years ago

Hi everyone,

Sorry if I’m a junior, but why don’t start by updating the table and if no record was affected then you can insert the new data?
You will do one index seek to attempt the update, even with multiple threads a new key will be generated and that voids pk violations if a new record must be added, if it fails here you should review your code. Anyway or you have a pk or must create one.

Daniel_Adeniji over 12 years ago

Thanks for posting this. It is really a good and informative discussion point.

Fernando almost 12 years ago

Hi, I just wanted to thank you for this great post. I faced this problem several times and have never fully understood it till now (why I am getting a deadlock!?). In websites, when you host the site in more than one web server, app locks don't seem to be a solution.

Sam Saffron almost 11 years ago

That is basically the same pattern I demonstrated. It works fine.

Awer Muller about 10 years ago

Hi Sam,

I know this is old but it comes up in google when you look up for this kind of problem, and it’s the only solution that seems to work (it’s been also posted in StackOverflow and some other places).

Anyway, you started the post stating that since the introduction of MERGE, there’s no more need for this patterns. I guess that by now, a lot of people is running SQL Server 2008 or newer, so I’d like to kindly ask if you could elaborate a little about using MERGE for solving this exact same situation.

Thanks in advance for any help on this.

Best regards,
Awer

Sam Saffron about 10 years ago

Sorry @awermuller I have not worked with SQL Server for a few years now, I would not be the best person to elaborate on it.

Sam Saffron

Insert or Update pattern for Sql Server

Comments

Twitter updates

Stack Overflow

Online content

About me