One of the top questions in the Dapper tag on Stack Overflow asks how to insert records into a DB using Dapper.

I provided a rather incomplete answer and would like to expand on it here.

Many ways to skin an INSERT

Taking a step back, let's have a look at how other ORMs and micro ORMs handle inserts. Note: I am using the term ORM quite loosely.

Assuming we have a simple table:

Products (
   Id int identity primary key,
   Name varchar(40), 
   Description varchar(max), 
   MoreStuff varchar(max) default('bla'))

This example may seem trivial; however, it is quite tricky. The ORM should "know" that Id is special: it should not try to insert values there, AND it should hand the generated Id back to the consumer. Ideally, the ORM should also be aware that MoreStuff has a default and pull that value out after the insert.

Let's compare how a few toolkits fare:

Massive

Since Massive is dynamic, there is very little wiring you need to do up front:

public Products(string cnnName) : 
    base(cnnName, primaryKeyField: "Id")
{
}

Then to insert you do:

var row = Products.Insert(
      new {Name = "Toy", Description = "Awesome Toy"});

The returned row is populated with the data you passed in; additionally, row.Id is populated with the value of @@identity (which really should be scope_identity()).
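The distinction matters when triggers are involved. A quick T-SQL sketch (table names illustrative):

```sql
-- @@identity returns the last identity value generated on the connection,
-- even one produced by a trigger inserting into a different table.
-- scope_identity() is limited to the current scope, so it is the safe choice.
insert Products(Name, Description) values ('Toy', 'Awesome Toy')
select scope_identity()  -- the identity generated by our insert
select @@identity        -- could be an audit table's identity if a trigger fired
```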

This approach has quite a few advantages. Firstly, it does not muck around with your input by setting fields as a side effect. It also copes with the "default column" problem by simply pretending the column does not exist in the dynamic row. I think raising an error on access is preferable to returning incorrect info.

PetaPoco

The first of the statically typed options I will look at is PetaPoco.

The class definition is pretty simple:

[TableName("Products")]
[PrimaryKey("Id", autoIncrement = true)]
class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }
    public string Other { get; set; }
}

The insert API is pretty simple as well:

 var p = new Product { Name = "Sam", Description = "Developer" };
 object identity = db.Insert(p);

PetaPoco "knows" not to insert the value 0 into the Id column due to the decoration; it also knows to pull the .Id property back in after the insert.

It inserts null into the Other column despite its database default, which is probably not what we wanted.

Various DBs deal with sequences in different ways: SQL Server likes to return decimals; others, ints. An unfortunate side effect of this is that the API returns an object.
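To get a usable int back you end up converting rather than casting; a minimal sketch, assuming the Product class and db instance from above:

```csharp
var p = new Product { Name = "Sam", Description = "Developer" };
object identity = db.Insert(p);

// A direct cast like (int)identity throws an InvalidCastException if the
// database handed back a decimal, so convert instead:
int id = Convert.ToInt32(identity);
```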

Entity Framework

Entity Framework is a giant; it has a feature for every user, so this task should be quite simple. Our POCO:

public class Product
{
    [Key]
    [DatabaseGenerated(DatabaseGeneratedOption.Identity)]
    public int Id { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }
    public string Other { get; set; }
}

Our context:

class MyContext : DbContext 
{
     public DbSet<Product> Products {get; set;}
}

The insert:

var p = new Product { Name = "Sam", Description = "Developer" };
var ctx = new MyContext();
var p2 = ctx.Products.Add(p);
ctx.SaveChanges();

This API is rather simple, however it is a bit odd. The .Add call simply returns the input it was given. When you finally call SaveChanges, both p and p2 will be "changed" to include the Id, which is pulled in from the underlying table joined on SCOPE_IDENTITY(). There is an annotation called DatabaseGeneratedOption.Computed that can declare a column as computed on the DB side; however, there is no annotation that can declare it "optionally" computed on the DB side.
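For completeness, marking the column as computed looks like the sketch below; the catch is that EF will then never send Other in an INSERT or UPDATE, so there is no way to supply your own value when you want to:

```csharp
public class Product
{
    [Key]
    [DatabaseGenerated(DatabaseGeneratedOption.Identity)]
    public int Id { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }

    // EF will always read this back after a save and never write it:
    // "computed", with no way to express "optionally computed".
    [DatabaseGenerated(DatabaseGeneratedOption.Computed)]
    public string Other { get; set; }
}
```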

The trivial Dapper implementation

The simple way to do this with Dapper is:

var p = new Product { Name = "Sam", Description = "Developer" };
p.Id = cnn.Query<int>(@"insert Products(Name,Description) 
values (@Name,@Description) 
select cast(scope_identity() as int)", p).First();

Nothing is hidden from us, and no side effects "rewrite" our product object. It is, however, pretty verbose, and we are stuck writing SQL we could probably generate.

If we want to be "tricky" and solve the default column problem, we can:

var p = new Product { Name = "Sam", Description = "Developer" };
var columns = cnn.Query(@"insert Products(Name,Description) 
values(@Name,@Description)
select Other,Id 
from Products where Id = scope_identity()            
", new {p.Name, p.Description }).First();

p.Id = columns.Id;
p.Other = columns.Other;

The ORM cracks are starting to show

ORMs are very opinionated:

  • Should records be queued up for insert, or inserted on the fly?
  • Should the "context" object refer to a table (like Massive) or a database (like PetaPoco and EF)?
  • Should the insert helper change stuff on the object you pass in?
  • Should you define table metadata using attributes or constructors? If you are using attributes, do they belong on the class or properties?
  • Should you deal with every little edge case?
  • Should you define a connection string in the application config? Should you be able to pass an open connection to the context constructor? Should the context object have lots of constructors? And so on.
  • Should you handle the ability to save graphs of objects in an ordered fashion?

In general the big ORMs try to please everyone, so you need a manual ... a BIG manual ... to deal with all this minutiae. Even tiny micro ORMs make a ton of decisions that may, or may not, be the ones you would make.

In a recent Hanselminutes, Scott asked my colleague Demis: "Why is it that users shy away from frameworks Microsoft builds?". My EF-specific answer is this:

It is not just that Dapper is at least twice as fast as EF when it comes to insert performance. It is that the one-size-fits-all approach does not offer a snug fit. I may, or may not care for the EF unit-of-work pattern. There may be other features in EF I do not like. I may have given up on LINQ as an interface to my DB, for example.

When ORMs compete, they often use a big table to describe every feature they have that the competition lacks. Unfortunately, this is a big driver of bloat and complexity, and it leaves us developers confused. EF 4.3 now adds migrations; not only is EF offering two prescriptions for data access (code-first vs. "traditional" designer-driven), it is also offering a prescription for data migration. I dislike this monolithic approach. I should be able to pick and choose the pieces I want. Microsoft should be able to rev the migration system without pushing a new version of EF. The monolithic approach also means releases are less frequent and harder to test.

When I choose <insert gigantic ORM name here>, I am choosing not to think. The ORM does all the thinking for me, and I assume it did it right. I outsourced my thinking. The unfortunate side effect of this non-thinking is that I now have to do lots of manual reading to learn what the right way of thinking is.

The Dapper solution

In the past I "sold" Dapper as a minimal, ultra-fast data mapper. In retrospect, the message I sent was lacking. Dapper is a piece of Lego you can build other ORMs with. It is likely PetaPoco could be built on Dapper without suffering any performance hit. Same goes for EF (except that it would be much faster). Dapper is like Rack for databases. It covers all the features ORMs need, including multi mapping, stored proc support, DbCommand reuse, multiple result sets and so on. And it does it fast.

I have built lots of helpers for Dapper, and I blogged about the SqlBuilder.

Recently, I ported Stack Exchange Data Explorer to Dapper. And no, I was not going to hand-craft every little INSERT statement; that is crazy talk. Instead, I wanted an API I like:

public class Product
{
   public int Id { get; set; }
   public string Name { get; set; }
   public string Description { get; set; }
   public string Other { get; set; }
}

public class MyDatabase : Database<MyDatabase>
{
   public Table<Product> Products{get;set;}
}

var db = MyDatabase.Init(cnnStr, commandTimeout: 100);

var product = new Product{Name="Toy",Description="AwesomeToy"};
product.Id = db.Products.Insert(new {product.Name, product.Description});

My interface is very opinionated, but the opinions are the best kind of opinions: my opinions. It works, and it is simple. With just a few lines of code I could mimic any of the APIs I presented earlier. I chose not to solve the "default column" problem simply because it was not a problem I had.

The whole implementation of this micro ORM is a single file. It is ultra fast (very close to raw Dapper performance) and uses IL generation in just one spot. I do not intend for this little implementation to be "mainstream". Instead, I hope that people borrow concepts from it when it comes to CRUD using Dapper or other little ORMs. If you want to have a play with it, I chucked it on NuGet.

There are now two other APIs you can choose from as well (for CRUD): Dapper.Contrib and Dapper Extensions.

I do not think one size fits all. Depending on your problem and preferences, there may be an API that works best for you. I have tried to present some of the options. There is no blessed "best way" to solve every problem in the world.

For example, if you need an ultra fast way to insert lots of stuff into a SQL DB, nothing is going to beat SqlBulkCopy and you are going to need a custom API for that.
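A minimal SqlBulkCopy sketch; the table, connection, and column names here are illustrative, and the explicit column mappings avoid relying on column order:

```csharp
// Build an in-memory batch of rows to load.
var table = new DataTable();
table.Columns.Add("Name", typeof(string));
table.Columns.Add("Description", typeof(string));
table.Rows.Add("Toy", "Awesome Toy");

using (var bulk = new SqlBulkCopy(cnn))
{
    bulk.DestinationTableName = "Products";
    // Map by name so the source column order does not matter.
    bulk.ColumnMappings.Add("Name", "Name");
    bulk.ColumnMappings.Add("Description", "Description");
    bulk.WriteToServer(table);
}
```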

The UPDATE problem is a bit more tricky; I hope to write about it soon.

Comments

Andrei_Rinea almost 3 years ago

I LOVE IT! I developed something very similar to Dapper on my own, and it feels so good seeing others with the same opinions; as you put it, the best kind of opinions! :))

Excellent. Now, I can lean on Dapper if something goes wrong with my implementation.

Nick almost 3 years ago

Enjoying reading the post and your opinions.

Horst almost 3 years ago

Thank you very much!

Root85 almost 3 years ago

Thank you for the post.
After some investigation, I would like to share some facts with you.
A Dapper INSERT with an explicit query like

var user = new User { Name = "Test", GroupId = 1 };
conn.Execute("INSERT [Users] (Name, GroupId) VALUES (@Name, @GroupId)", user, tran);

is the fastest way to insert.

DapperExtensions and PetaPoco inserts, when you do not write the SQL query explicitly, take almost the same time as Entity Framework 4.2. The explicit Dapper query above is 3-5 times faster.

So I am going to use EF to make inserts, and Dapper to make selects.

Sam Saffron almost 3 years ago

I cannot respond on behalf of Dapper Extensions (not mine), nor can I respond on behalf of PetaPoco (not mine)... but... Dapper rainbow is pretty much as fast as Dapper. It caches heavily. Please link to your tests so the various authors can correct their libraries.

Root85 almost 3 years ago

Hi Sam,

It seems that the overhead of various ORMs heavily depends on the number of iterations too. My previous post was for a small number of operations.
I described complete performance tests on my blog.

http://gosheg.blogspot.com/2012/01/microorm-performance-investigations.html


Georgiy

Ted_Spence over 2 years ago

Hey there,

I used to use SqlBulkCopy too, but of course it has lots of fragmentation problems. If you specify the columns in the wrong order your code crashes and all that.

I happened to like using table parameter inserts, detailed here:
http://www.altdevblogaday.com/2012/05/16/sql-server-high-performance-inserts/

It has the advantage of being quite fast, plus it allows for flexibility in the insert statement. The downside is of course that it only works on SQL Server 2008.

Ronnie_Overby almost 2 years ago

Great article.

Since you brought up bulk inserting, I thought I'd share a class that I wrote that has worked well for me: https://github.com/ronnieoverby/RonnieOverbyGrabBag/blob/master/BulkInserter.cs

Ernie_Paschall almost 2 years ago

I could not agree with you more. I have been looking at ORMs for about a year now. I have leaned toward PetaPoco for recent projects, but I know Dapper is great too.

This is a great article with lots of excellent points.

Thanks!
Ernie

