about 8 years ago
Note SQL Server 2008 users, you now have a built-in MERGE statement you can use instead of these patterns.
A very common problem that is surprisingly difficult to solve properly with SQL is the UPDATE or INSERT problem (sometimes called upsert). I would like to insert a row in to the table if the key does not exist and update a row if a key exists.
So, for the table:
create table t (pk int primary key, hitCount int)
The following pattern is very commonly used:
if exists (select * from t where pk = @id) begin update t set hitCount = hitCount+1 where pk = @id end else begin insert into t values (@id, 1) end
The above code works just fine from single threaded applications, however when called from multiple threads, you start getting primary key violations.
This happens because the batch is not executed atomically. Two or more threads could "think" that there is no data for the existing key and subsequently try to insert data.
To overcome this issue you have two options. The first is, handle primary key violations. The second option is to fix up the transactional integrity of the batch.
Lets explore the second option.
The first step is to put all the SQL in a begin tran commit tran block.
begin tran if exists(select * from t where pk = @id) begin update t set hitCount = hitCount+1 where pk = @id end else begin insert t (pk, hitCount) values (@id, 1) end commit tran
This does not solve the problem. Primary key violations keep on happening. The reason for this is that the default isolation level for transactions is "read committed", meaning that you are allowed to read all committed transactions. But new transactions may get committed between the first and second statement.
So, after you read up about transaction isolations levels you may decide to fix up the code and put it in a serializable transaction:
set transaction isolation level serializable begin tran if exists(select * from t where pk = @id) begin update t set hitCount = hitCount+1 where pk = @id end else begin insert t (pk, hitCount) values (@id, 1) end commit tran
After doing this you stop getting primary key violations, however ... you create a much bigger problem. Now you may notice you start getting deadlocks.
Why is this happening? Well, the serializable transaction means the select statement will acquire a shared locks on the table. You will get a RangeS-S lock on the key when the data is missing (and a key / page lock if the data is there). The shared locks are compatible with other shared locks, meaning other threads could acquire the same lock on the same key/page. Later on in the transaction you need to upgrade the lock to an exclusive lock, so you can change the data. However, this can not be done if other transactions are holding shared locks. Which leads to deadlocks, For example:
# Key is missing # Transaction 1 acquires RangeS-S on key 1 # Transaction 2 acquires RangeS-S on key 1 # Transaction 1 wants to upgrade the lock to a RangeX-X (starts waiting) # Transaction 2 wants to upgrade the lock to a RangeX-X (starts waiting) # SQL Server detects a deadlock and kills off one of the transactions
So, we need to clean up the transaction a little more. During the execution of the SELECT statement we would like to block the execution of the same SELECT statement on other threads. A handy hint we can use is the UPDLOCK hint, it means that we will acquire update locks instead of shared locks. This does not block other readers, but will block statements which need to also hold update locks (like update statements or queries using the UPDLOCK hint).
The result of this is that we have created a critical section in our transaction.
So here is the first good way to solve the problem:
begin tran if exists (select * from t with (updlock,serializable) where pk = @id) begin update t set hitCount = hitCount+1 where pk = @id end else begin insert t (pk, hitCount) values (@id, 1) end commit tran
A second method we can use is:
begin tran update t with (serializable) set hitCount = hitCount + 1 where pk = @id if @@rowcount = 0 begin insert t (pk, hitCount) values (@id,1) end commit tran
Keep in mind the serializable hint is critical, without it we will be very prone to deadlocks, for reasons similar to what was described above.
Both of these methods exhibited very similar performance characteristics during my testing. Keep in mind that in a production system they may perform differently.
Thank you Erland for catching out many bugs in the first draft of this article!