That annoying INSERT problem, getting data into the DB
over 12 years ago
One of the top questions in the Dapper tag on Stack Overflow asks how to insert records into a DB using Dapper.
I provided a rather incomplete answer and would like to expand on it here.
Many ways to skin an INSERT
Taking a step back let’s have a look at how other ORMs and micro ORMs handle inserts. Note, I am using the term ORM quite loosely.
Assuming we have a simple table:
Products (
Id int identity primary key,
Name varchar(40),
Description varchar(max),
MoreStuff varchar(max) default('bla'))
This example may seem trivial, it is however quite tricky. The ORM should “know” that Id
is special, it should not try to insert stuff there AND it should try to give the Id back to the consumer. Possibly, the ORM should be aware that MoreStuff
has a default and it should yank out after the insert.
Let’s compare how a few toolkits fair:
###Massive
Since Massive is dynamic, there is very little wiring you need to do up front:
public Products(string cnnName) :
base(cnnName, primaryKeyField: "Id")
{
}
Then to insert you do:
var row = Products.Insert(
new {Name = "Toy", Description = "Awesome Toy"});
The returned row is populated with the data you passed in, additionally row.Id
is populated with the value of @@identity
(which really should be scope_identity())
This approach has quite a few advantages. Firstly, it does not muck around with your input by setting fields in a side effect. It also copes with the “Default column” problem by simply pretending it does not exist in the dynamic row. I think returning an error is preferable to returning incorrect info.
###PetaPoco
The first of the static typed options I will look at is PetaPoco
The class definition is pretty simple:
[TableName("Products")]
[PrimaryKey("Id", autoIncrement = true)]
class Product
{
public int Id { get; set; }
public string Name { get; set; }
public string Description { get; set; }
public string Other { get; set; }
}
The insert API is pretty simple as well:
var p = new Product { Name = "Sam", Description = "Developer" };
object identity = db.Insert(p);
PetaPoco “knows” not to insert the value 0
into the Id column due to the decoration, it also decides to pull in the .Id
property after the insert.
It populates the Other
column that had a database default with null, which is probably not what we wanted.
Various DBs deal with sequences in different ways, SQL Server likes to return decimals, others, ints. An unfortunate side effect of this is that the API returns an object.
###Entity Framework
Entity Framework is giant, it has a feature for every user, so this task should be quite simple, our POCO:
public class Product
{
[Key]
[DatabaseGenerated(DatabaseGeneratedOption.Identity)]
public int Id { get; set; }
public string Name { get; set; }
public string Description { get; set; }
public string Other { get; set; }
}
Our context:
class MyContext : DbContext
{
public DbSet<Product> Products {get; set;}
}
The insert:
var p = new Product { Name = "Sam", Description = "Developer" };
var ctx = new MyContext();
var p2 = ctx.Products.Add(p);
ctx.SaveChanges();
This API is rather simple, however is a bit odd. The .Add
simply returns the input it was given. When you finally call SaveChanges
both p and p2 will be “changed” to include the Id
which is yanked in from the underlying table joined on SCOPE_IDENTITY
. There is an annotation called DatabaseGeneratedOption.Computed
that can explain our column is computed on the DB side, however there is no annotation that can explain it is “optionally” computed on the db side.
The trivial Dapper implementation
The simple way to do this with Dapper is:
var p = new Product { Name = "Sam", Description = "Developer" };
p.Id = cnn.Query<int>(@"insert Products(Name,Description)
values (@Name,@Description)
select cast(scope_identity() as int)", p).First();
Nothing is hidden from us, no side effects “rewrite” our product object. It is however pretty verbose. We are stuck writing SQL we could probably generate.
If we want to be “tricky” and solve the default column problem, we can:
var p = new Product { Name = "Sam", Description = "Developer" };
var columns = cnn.Query(@"insert Products(Name,Description)
values(@Name,@Description)
select Other,Id
from Products where Id = scope_identity()
", new {p.Name, p.Description }).First();
p.Id = columns.Id;
p.Other = columns.Other;
The ORM cracks are starting to show
ORMs are very opinionated:
- Should records be queued up for insert, or inserted on the fly?
- Should the “context” object refer to a table (like massive) or a database like PetaPoco and EF?
- Should the insert helper change stuff on the object you pass in?
- Should you define table metadata using attributes or constructors? If you are using attributes, do they belong on the class or properties?
- Should you deal with every little edge case?
- Should you define a connection string in the application config? Should you be able to pass an open connection to the context constructor? Should the context object have lots of constructors? And so on.
- Should you handle the ability to save graphs of objects in an ordered fashion?
In general the big ORMs try to please everyone, so you need a manual … a BIG manual … to deal with all this minutiae. Even tiny micro ORMs make a ton of decisions that may, or may not, be the ones you would make.
In a recent Hanselminutes Scott asked my colleague Demis: “Why is it that users shy away from frameworks Microsoft build?”. My EF specific answer is this:
It is not just that Dapper is at least twice as fast as EF when it comes to insert performance. It is that the one-size-fits-all approach does not offer a snug fit. I may, or may not care for the EF unit-of-work pattern. There may be other features in EF I do not like. I may have given up on LINQ as an interface to my DB, for example.
When ORMs compete, they often use a big table to describe every feature they have that the competition lacks. Unfortunately, this is a big driver for bloat and complexity, it leaves us developers confused. EF 4.3 now adds migrations, not only is EF offering two prescriptions for data access (code first vs “traditional” designer driven), it is also offering a prescription for data migration. I dislike this monolithic approach. I should be able to pick and choose the pieces I want. Microsoft should be able to rev the migration system without pushing a new version of EF. The monolithic approach also means releases are less frequent and harder to test.
When I choose <insert gigantic ORM name here>
, I am choosing not to think. The ORM does all the thinking for me, and I assume they did it right. I outsourced my thinking. The unfortunate side-effect of this Non Thinking
is that I now have to do lots of manual reading and learning of what the right way of thinking is.
The Dapper solution
In the past I “sold” Dapper as a minimal ultra fast data mapper. In retrospect the message I sent was lacking. Dapper is piece of Lego you can build other ORMs with. It is likely PetaPoco could be built on Dapper and not suffer any performance hit. Same goes for EF (except that it would be much faster). Dapper is like Rack for databases. It covers all the features ORMs need, including multi mapping, stored proc support, DbCommand reuse, multiple result sets and so on. It does it fast.
I built lots of helpers for Dapper, I blogged about the SqlBuilder.
Recently, I ported Stack Exchange Data Explorer to Dapper. And no, I was not going to hand craft every little INSERT
statement, that is crazy talk. Instead, I wanted an API I like:
public class Product
{
public int Id { get; set; }
public string Name { get; set; }
public string Description { get; set; }
public string Other { get; set; }
}
public class MyDatabase : Database<MyDatabase>
{
public Table<Product> Products{get;set;}
}
var db = MyDatabase.Init(cnnStr, commandTimeout: 100);
var product = new Product{Name="Toy",Description="AwesomeToy"};
product.Id = db.Products.Insert(new {product.Name, product.Description});
My interface is very opinionated, but the opinions are the best kind of opinions, my opinions. It works, it is simple. With just a few lines of code I could mimic any of the APIs I presented earlier. I chose not to solve the “default column” problem simply cause it was not a problem I had.
The whole implementation of this micro ORM is a single file. It is ultra fast (very close to raw Dapper performance) it uses IL generation in one spot. I do not intend for this little implementation to be “mainstream”. Instead, I hope that people borrow concepts from it when it comes to CRUD using Dapper or other little ORMs. If you want to have a play with it, I chucked it on nuget.
There are now 2 other APIs you can choose from as well (for CRUD) Dapper.Contrib and Dapper Extensions.
I do not think that one-size-fits-all. Depending on your problem and preferences there may be an API that works best for you. I tried to present some of the options. There is no blessed “best way” to solve every problem in the world.
For example, if you need an ultra fast way to insert lots of stuff into a SQL DB, nothing is going to beat SqlBulkCopy and you are going to need a custom API for that.
The UPDATE
problem is a bit more tricky, I hope to write about it soon.
I LOVE IT! I developed something very simillar to Dapper on my own and it feels so good seeing others with the same opinions, as you put it, the best kind of opinions! :))
Excellent. Now, I can lean on Dapper if something goes wrong with my implementation.