Deadlocked

over 6 years ago

I was reading this article about some recent deadlocking issues Jeff Atwood was having with the brand new Stack Overflow site. In this article Jeff concluded that the WITH (NOLOCK) hint is a practical solution to many deadlocking conundrums which though, in theory, is very dangerous, in practice, solves his problem.

I think this is almost always the wrong conclusion. I think that in practice it is very dangerous.

Why SELECT statements sometimes deadlock against UPDATE statements

The theoretical explanation is this: regardless of explicit transactions SQL Server tries to stay Atomic, Consistent, Isolated and Durable. The default isolation level for all SELECT statements is READ COMMITTED. To achieve this SQL Server uses locks

SELECT statements acquire shared locks on various resources. These resources may be pages in the database where the data is stored or keys in indexes and so forth. These locks may or may not be held for entire duration of the SELECT statement depending on various circumstances. Most of the time MSSql needs to acquire more than one lock to proceed with a portion of the SELECT statement. For example, it may need a shared lock on an index AND a lock on a page in the database to return results. In such cases MSSql determines the order of locks it needs to acquire and acquires them in a what appears to be linear fashion.

And this is where all the trouble starts. Say we have 2 database connections:

# Connection 1 : UPDATE statement acquire an exclusive lock on a key on index #1
# Connection 2 : SELECT statement acquires a shared lock on page #1 in the database
# Connection 1 : UPDATE statement attempts to acquire an exclusive lock on page #1, since someone else is already holding a shared lock, it starts waiting.
# Connection 2 : SELECT statement tries to acquire a shared lock on index #1, since someone else is already holding an exclusive lock, it start waiting.
# MSSql figures out that we have a deadlock, kills the SELECT statement and raises an error message.

If you can't believe this is possible here is a demo:

Run the following code snippet in Query Analyzer:

create table posts (id int identity primary key, [content] varchar(7000), [group] int, date_changed datetime) 
create index idx_date_changed on posts ([group], date_changed) 

insert posts values ('post contents', 1, getdate())  

declare @i int 
set @i = 1
while @i < 5000
begin
 insert posts values ('post contents', @i, getdate()) 
 set @i = @i + 1 
end 
</code>
</pre>

Open two Query Analyzer windows, in the first type: 

<pre><code>set nocount on
declare @i int 
set @i = 1
while @i < 500000
begin
        -- use a temp table to avoid filling the query analyzer window with results 
	select * into #t from posts 
	where [group] = 1 
	drop table #t 
end

In the second one:

update posts 
set [date_changed] = getdate() 
where id = 2

Start the first query. Execute the second query a few times (it may be once, may be 20 times). Look at your your messages on the SELECT loop, you should see something like the following:

Msg 1205, Level 13, State 51, Line 7
Transaction (Process ID 57) was deadlocked on lock resources with 
another process and has been chosen as the deadlock victim. 
Rerun the transaction.

So, you may ask yourself whats going on.

First lets look at the execution plans:

select * from posts where [group] = 1

The execution plan for the SELECT statement reveals that we will first look up some records in the idx_date_changed index and then look up more data in the clustered index and join it up. The key here is order we are looking up data in the two indexes.

Next we should look at the execution plan for the UPDATE statement:

update posts set [date_changed] = getdate() where id = 2

The execution plan for the UPDATE statement reveals we first perform a clustered index seek on the primary key and then an index update on the idx_date_changed index.

Notice that both queries look up the data in the indexes in the reverse order, which gives us more chance of having deadlocks.

A handy trick when debugging deadlock issues is determining which locks each statement acquires. To do this you can hold the locks and wrap it in an open transaction

begin tran 


update posts with (holdlock) set [date_changed] = getdate() where id = 2

Then in a second window you can execute sp_lock which returns the list of active locks.

Remember to commit that transaction later on...

In real life the first step is determining the two statement that deadlock, so you should read the following KB

What can happen if you use the NOLOCK hint

Well the worst case scenario is that you may, once in a while see phantom data, duplicate data or have a bunch of missing rows. In a most web apps this can be acceptable, but it can look very unprofessional.

If you used the NOLOCK hack in a banking application you would probably get fired on the spot.

A real example could be some database maintenance that requires a big fat transaction that say removes all the rows from a a table and then adds the rows back in. In this case, users may get a blank front page on your website. There are lots of less subtle issues that may pop up in the e-commerce world such as: billing a person twice, billing the wrong amount, adding a wrongly priced item to a basket etc...

So how do you really fix these kind of issues.

The cowardly yet honest answer to this is: "it depends". First things first. Reproduce the problem in an isolated test harness so you have something to work with. Without this you will be shooting in the dark.

Here are a few approaches that may help.

  • Optimizing Queries

Are you joining to an unnecessary table? Are you returning too much data (too many columns or rows)? Is your query performing table scans? Can you restructure the query in such a way that it no longer deadlocks? Are you using a left join where you should be using a join? Have you reviewed your NOT IN clauses?

  • Optimizing Indexes

This ties in to the previous point, having too many, too few or wrong indexes can hurt performance. Are there any missing or superfluous covering indexes? Are there any missing indexes? Are there any duplicate indexes?

  • Handling the deadlocks

You may not be able or willing to resolve every deadlock you have. In such cases consider handling the deadlocks in the app, logging the fact it happened and retrying. These logs will be a goldmine when you go about designing your next version.

  • Caching

Perhaps you are executing a lots of the same queries very frequently. Cutting down the number of times that query executes will reduce the chances of deadlocks occurring. This may or may not create issues with you serving out stale data depending on the caching architecture. The caching can be implemented either in the database (using temp tables or tables) or in the application; in-memory or on disk. It really depends on the problem. In web apps you may sometimes want to cache whole pages on disk or in memory.

  • Application architecture review

An application architecture review may point out that a query that is deadlocking is being called 10 times per request when it should only be called once. It may reveal that the data is not really required or that the same feature is implemented efficiently in a different code branch.

Final words.

In Jeff's post he says: "... in practice adding nolock to queries that you absolutely know are simple, straightforward read-only affairs never seems to lead to problems.". Well, I don't think so. Adding nolock anywhere may be hiding a fundamental design flaw in your software. It may cause very weird bugs. nolock should be avoided whenever possible and turning to snapshot isolation may not always be an option or a solution to the problem.

Comments

Jo over 6 years ago
Jo

Excellent article I hope Jeff reads it!

Jon_Raynor over 6 years ago
Jon_Raynor

Agreed. No lock works, but it is the easy way out. I think Jeff's issue is a query or index issue although I am not privy to the details.

Robert_S about 6 years ago
Robert_S

In your example you show that it's trivial to reproduce in high load situations. Everything you propose is just a reduction in the likelihood of it happening, not an actual fix.

Do you suggest that we code all our selects to just keep trying until it works?

I've got to think that that would be much harder on the DB than NOLOCK or an READ UNCOMMITTED isolation level.

Sam Saffron about 6 years ago
Sam Saffron

Robert,

On one hand this issue trivial to reproduce, but on the other hand, in the real world, you probably will not be selecting the same data from the same table so aggressively (100s of times a sec).

This particular situation can be resolved in quite a few ways

We could amend the index so it does not cover a column which we intend to update on a regular basis. If the date_changed column is removed from the index the deadlocks should go away.

We could cover all the columns we intend to select aggressively in an index and then all the data could be retrieved from the index.

The key is that when you start getting deadlocks you should try to figure our why you are getting deadlocks. I also think that the NOLOCK hint is no silver bullet and in general should only be prescribed as a last resort.

Not all SELECT statements you write are even prone to deadlocking against UPDATE statements. It really depends on the SELECT statement the UPDATE statement, indexes etc. But you should always keep in mind that any SELECT statement that needs more than one lock as once, can deadlock against an intelligently designed transaction which grabs the locks in reverse order.

Do I suggest you code retry logic for every SELECT statement you have? I can't answer that question cause I have no idea how your application is designed and what the impact is of you getting a deadlock. If you are in a situation where a SELECT blowing up is going to crash an airplane, then sure I suggest you put some retry logic next to every SELECT statement. If deadlocks show up once a year in a background report you run hourly, well it may not be worth fixing

Yes retry logic is harder on the DB then NOLOCK. In fact, SERIALIZABE is harder on the db than READ COMMITTED. READ COMMITTED is always harder on the db than READ UNCOMMITED. Its a delicate balancing act between concurrency and consistency. There are many ways of designing an app so it remains both concurrent and consistent.

Secret_Geek about 6 years ago
Secret_Geek

Also this snippet from your post is a great little tip all by itself:

A handy trick when debugging deadlock issues is determining which locks each statement acquires. To do this you can hold the locks and wrap it in an open transaction

begin tran
update posts with (holdlock) set [date_changed] = getdate() where id = 2

Then in a second window you can execute sp_lock which returns the list of active locks.

Diego about 6 years ago
Diego

Excellent and very helpful article. Thank you!

Ludo over 5 years ago
Ludo

Setting :


SET TRANSACTION ISOLATION LEVEL SNAPSHOT


would avoid the deadlocks. Does anybody have an opinion about how this impacts performance?

Amit about 5 years ago
Amit

Thanks, That's an excellent article about a painful issue.

Thanks for sharing!

Vishal about 5 years ago
Vishal

Thanks,


its really gr8 somebody had intitated and try to solve dead lock issue.

Yaron about 4 years ago
Yaron

very good article.

i read somewhere that in a good app design we should access the the DB tables in the same order. i can't understand how can we do that, for example:


void workerThreadFunc()

{

if (condition)

selectFromTable1()

else

updateTable1()

}


if this function used by multiple threads we should have some deadlocks problems, yes?

does it mean that in my store procedure i should write the join statements always in the same order?


Thanks in advance,

Yaron.


comments powered by Discourse