Bulk Insert into a Table Including Primary Key Column in SQL Server
Description:
There have been instances where I have had to restore data from a backup table into
the table where the data originally resided. Consider the following two tables:
CREATE TABLE things
(
things_id INT IDENTITY(1,1) not
null,
name NVARCHAR(255),
CONSTRAINT PK_THINGS
PRIMARY KEY (things_id)
)
CREATE TABLE things_deleted
(
things_deleted_id INT IDENTITY(1,1),
things_id INT not null,
name NVARCHAR(255),
CONSTRAINT PK_THINGS_DELETED
PRIMARY KEY (things_deleted_id)
)
The first table
things is where the original data would reside and the table
things_deleted is the table where the data would be backed up to when records
are deleted from the things table. This tactic, which I don't recommend, is sometimes
used to keep data around in case it needs to be restored.
The Solution:
An easy way to restore the data is to do a bulk insert but because you need to insert
the value for a primary key you must tell SQL Server this is what you intend to
do. If you don't do this SQL Server will return the following message:
Cannot insert explicit value for identity column in table 'things' when IDENTITY_INSERT
is set to OFF
To overcome this we tell SQL Server we want to insert values into the primary key
column by setting IDENTITY_INSERT for the table to ON.
Using our example the following SQL script will restore all of the data back to
the original things table from the things_deleted table:
SET IDENTITY_INSERT things
ON
GO
INSERT things
(
things_id,
name
)
SELECT things_id, name
FROM things_deleted
SET IDENTITY_INSERT things
OFF
GO
When using this technique always make sure to turn the identity insert for the table
you are working on back to off since it is not normal to insert a value explicitly
for a primary key.
Additionally I would not recommend using this technique for removing deleted records
that may need to be restored later. There are several reasons for this, the most
important of which is related to maintenance. Tables often change as they evolve.
New columns are added and dropped regularly as an application grows. With this implementation
you now have two tables to maintain. If the backup table is not changed to capture
this new information then data will be lost. It's also important to realize that
maintenance is the biggest cost of software throughout it's life cycle so anything
you can do to cut down on maintenance pays off as your application matures.
A much more elegant solution would be to add a datetime field named expired
to the things table. That way when searching for records you could just check that
the expired field is null. Besides being easier to maintain and easier to implement
in an application, this technique would require no maintenance when the inevitable
changes to the table are made.