Deadlocked
over 16 years ago
I was reading this article about some recent deadlocking issues Jeff Atwood was having with the brand new Stack Overflow site. In this article Jeff concluded that the WITH (NOLOCK) hint is a practical solution to many deadlocking conundrums which though, in theory, is very dangerous, in practice, solves his problem.
I think this is almost always the wrong conclusion. I think that in practice it is very dangerous.
Why SELECT statements sometimes deadlock against UPDATE statements
The theoretical explanation is this: regardless of explicit transactions SQL Server tries to stay Atomic, Consistent, Isolated and Durable. The default isolation level for all SELECT statements is READ COMMITTED. To achieve this SQL Server uses locks
SELECT statements acquire shared locks on various resources. These resources may be pages in the database where the data is stored or keys in indexes and so forth. These locks may or may not be held for entire duration of the SELECT statement depending on various circumstances. Most of the time MSSql needs to acquire more than one lock to proceed with a portion of the SELECT statement. For example, it may need a shared lock on an index AND a lock on a page in the database to return results. In such cases MSSql determines the order of locks it needs to acquire and acquires them in a what appears to be linear fashion.
And this is where all the trouble starts. Say we have 2 database connections:
- Connection 1 : UPDATE statement acquire an exclusive lock on a key on index #1
- Connection 2 : SELECT statement acquires a shared lock on page #1 in the database
- Connection 1 : UPDATE statement attempts to acquire an exclusive lock on page #1, since someone else is already holding a shared lock, it starts waiting.
- Connection 2 : SELECT statement tries to acquire a shared lock on index #1, since someone else is already holding an exclusive lock, it start waiting.
- MSSql figures out that we have a deadlock, kills the SELECT statement and raises an error message.
If you can’t believe this is possible here is a demo:
Run the following code snippet in Query Analyzer:
create table posts (id int identity primary key, [content] varchar(7000), [group] int, date_changed datetime)
create index idx_date_changed on posts ([group], date_changed)
insert posts values ('post contents', 1, getdate())
declare @i int
set @i = 1
while @i < 5000
begin
insert posts values ('post contents', @i, getdate())
set @i = @i + 1
end
Open two Query Analyzer windows, in the first type:
set nocount on
declare @i int
set @i = 1
while @i < 500000
begin
-- use a temp table to avoid filling the query analyzer window with results
select * into #t from posts
where [group] = 1
drop table #t
end
In the second one:
update posts
set [date_changed] = getdate()
where id = 2
Start the first query. Execute the second query a few times (it may be once, may be 20 times). Look at your your messages on the SELECT loop, you should see something like the following:
Msg 1205, Level 13, State 51, Line 7
Transaction (Process ID 57) was deadlocked on lock resources with
another process and has been chosen as the deadlock victim.
Rerun the transaction.
So, you may ask yourself whats going on.
First lets look at the execution plans:
select * from posts where [group] = 1
The execution plan for the SELECT statement reveals that we will first look up some records in the idx_date_changed index and then look up more data in the clustered index and join it up. The key here is order we are looking up data in the two indexes.
Next we should look at the execution plan for the UPDATE statement:
update posts set [date_changed] = getdate() where id = 2
The execution plan for the UPDATE statement reveals we first perform a clustered index seek on the primary key and then an index update on the idx_date_changed index.
Notice that both queries look up the data in the indexes in the reverse order, which gives us more chance of having deadlocks.
A handy trick when debugging deadlock issues is determining which locks each statement acquires. To do this you can hold the locks and wrap it in an open transaction
begin tran
update posts with (holdlock) set [date_changed] = getdate() where id = 2
Then in a second window you can execute sp_lock which returns the list of active locks.
Remember to commit that transaction later on…
In real life the first step is determining the two statement that deadlock, so you should read the following KB
What can happen if you use the NOLOCK hint
Well the worst case scenario is that you may, once in a while see phantom data, duplicate data or have a bunch of missing rows. In a most web apps this can be acceptable, but it can look very unprofessional.
If you used the NOLOCK hack in a banking application you would probably get fired on the spot.
A real example could be some database maintenance that requires a big fat transaction that say removes all the rows from a a table and then adds the rows back in. In this case, users may get a blank front page on your website. There are lots of less subtle issues that may pop up in the e-commerce world such as: billing a person twice, billing the wrong amount, adding a wrongly priced item to a basket etc…
So how do you really fix these kind of issues.
The cowardly yet honest answer to this is: “it depends”. First things first. Reproduce the problem in an isolated test harness so you have something to work with. Without this you will be shooting in the dark.
Here are a few approaches that may help.
- Optimizing Queries
Are you joining to an unnecessary table? Are you returning too much data (too many columns or rows)? Is your query performing table scans? Can you restructure the query in such a way that it no longer deadlocks? Are you using a left join where you should be using a join? Have you reviewed your NOT IN clauses?
- Optimizing Indexes
This ties in to the previous point, having too many, too few or wrong indexes can hurt performance. Are there any missing or superfluous covering indexes? Are there any missing indexes? Are there any duplicate indexes?
- Handling the deadlocks
You may not be able or willing to resolve every deadlock you have. In such cases consider handling the deadlocks in the app, logging the fact it happened and retrying. These logs will be a goldmine when you go about designing your next version.
- Caching
Perhaps you are executing a lots of the same queries very frequently. Cutting down the number of times that query executes will reduce the chances of deadlocks occurring. This may or may not create issues with you serving out stale data depending on the caching architecture. The caching can be implemented either in the database (using temp tables or tables) or in the application; in-memory or on disk. It really depends on the problem. In web apps you may sometimes want to cache whole pages on disk or in memory.
- Application architecture review
An application architecture review may point out that a query that is deadlocking is being called 10 times per request when it should only be called once. It may reveal that the data is not really required or that the same feature is implemented efficiently in a different code branch.
Final words.
In Jeff’s post he says: “… in practice adding nolock to queries that you absolutely know are simple, straightforward read-only affairs never seems to lead to problems.”. Well, I don’t think so. Adding nolock anywhere may be hiding a fundamental design flaw in your software. It may cause very weird bugs. nolock should be avoided whenever possible and turning to snapshot isolation may not always be an option or a solution to the problem.
Excellent article I hope Jeff reads it!