December 2005 - Posts
There are many ways to rebuild your indexes in SQL Server 2000 and I felt like doing some performance testing on the different ways.
A DROP and CREATE is obviously the slowest because you are rebuilding your non clustered indexes twice this way. First he has to change the bookmark to the RID because it is becoming a heap instead of a clustered table and then he has to change the bookmark again to point to the clustering key.
CREATE WITH DROP_EXISTING and DBCC DBREINDEX rebuild the clustered index in one transaction preventing the double rebuild of the nonclustered indexes. When you are rebuilding a unique clustered index the nonclustered indexes aren't being rebuilt at all. The same goes for the CREATE WITH DROP_EXISTING if you are recreating the index on the same key.
Keep in mind that creating a clustered index without specifying the UNIQUE keyword gives the opportunity to insert non unique records. When this happens SQL Server adds 4 extra bytes to every key to guarantee the uniqueness. Another side effect is that nonclustered indexes are being rebuilt because the 'uniqueifier' is regenerated every time you rebuild the index and thus changing the clustering key.
One of the 'benefits' of the DROP_EXISTING method vs the DBCC DBREINDEX is the ability to pass the SORT_IN_TEMPDB option. This forces SQL Server to store the intermediate sort results that are used to build the index in tempdb. Although this may increase the index rebuild performance it does take more diskspace.
SQL Server 2005 introduces the ALTER INDEX statement with a REBUILD option.
Here is a little test script (always recreate the table when testing a reindexing method).
There is no time for the ALTER INDEX because I was testing on SQL Server 2000. But it should be about the same as DBCC DBREINDEX as they are supposed the be equivalents.
SET NOCOUNT ON
IF OBJECT_ID('myTest') IS NOT NULL
DROP TABLE myTest
GO
CREATE TABLE myTest
(myID int,
myChar char(512)
)
GO
DECLARE @i int
SET @i = 0
WHILE @i < 1000000
BEGIN
INSERT INTO myTest (myID, myChar)
VALUES (@i, REPLICATE('Z', 512))
SET @i = @i + 1
END
GO
CREATE UNIQUE CLUSTERED INDEX ci1 ON myTest (myID)
GO
--Create a ridiculous amount of indexes :)
CREATE NONCLUSTERED INDEX ix1 ON myTest (myChar)
GO
CREATE NONCLUSTERED INDEX ix2 ON myTest (myChar)
GO
CREATE NONCLUSTERED INDEX ix3 ON myTest (myChar)
GO
CREATE NONCLUSTERED INDEX ix4 ON myTest (myChar)
GO
CREATE NONCLUSTERED INDEX ix5 ON myTest (myChar)
GO
CREATE NONCLUSTERED INDEX ix6 ON myTest (myChar)
GO
CREATE NONCLUSTERED INDEX ix7 ON myTest (myChar)
GO
CREATE NONCLUSTERED INDEX ix8 ON myTest (myChar)
GO
CREATE NONCLUSTERED INDEX ix9 ON myTest (myChar)
GO
CREATE NONCLUSTERED INDEX ix10 ON myTest (myChar)
GO
CREATE NONCLUSTERED INDEX ix11 ON myTest (myChar)
GO
CREATE NONCLUSTERED INDEX ix12 ON myTest (myChar)
GO
CREATE NONCLUSTERED INDEX ix13 ON myTest (myChar)
GO
CREATE NONCLUSTERED INDEX ix14 ON myTest (myChar)
GO
CREATE NONCLUSTERED INDEX ix15 ON myTest (myChar)
GO
--Drop and create - 00:26:39
DECLARE @Start datetime
SET @Start = Getdate()
DROP INDEX myTest.ci1
CREATE UNIQUE CLUSTERED INDEX ci1 ON myTest (myID)
PRINT CONVERT(varchar,GetDate() - @Start , 108)
--Create with DROP_EXISTING - 00:01:23
DECLARE @Start datetime
SET @Start = Getdate()
CREATE UNIQUE CLUSTERED INDEX ci1 ON myTest (myID) WITH DROP_EXISTING
PRINT CONVERT(varchar,GetDate() - @Start , 108)
--DBREINDEX - 00:01:16
DECLARE @Start datetime
SET @Start = Getdate()
DBCC DBREINDEX(myTest, ci1, 0)
PRINT CONVERT(varchar,GetDate() - @Start , 108)
--ALTER INDEX
DECLARE @Start datetime
SET @Start = Getdate()
ALTER INDEX ci1 ON myTest
REBUILD;
PRINT CONVERT(varchar,GetDate() - @Start , 108)
--Clean up
IF OBJECT_ID('myTest') IS NOT NULL
DROP TABLE myTest
GO
I was playing around with the OPTION FAST statement for an update query when I saw a 'weird' behavior.
When using the OPTION FAST SQL Server passed the criteria as a fixed value while without it, it would use a variable because of a conversion.
Then I remembered the following article which states a list of situations where auto-parameterization is not happening and one of it is:
A statement with query hints specified using the OPTION clause.
I was passing a uniqueidentifier between single quotes so apparently the auto-parameterization is storing it as a character variable in the cache.
Another mystery solved :-)
After some playing around with DBCC PAGE, DBCC EXTENTINFO I came to the conclusion that it is indeed the deallocation that causes the process to take page locks (and extent locks when a complete extent is released).
From Inside SQL Server 2000
When the last row is deleted from a data page, the entire page is deallocated. (If the page is the only one remaining in the table, it isn't deallocated. A table always contains at least one page, even if it's empty.) This also results in the deletion of the row in the index page that pointed to the old data page. Index pages are deallocated if an index row is deleted (which, again, might occur as part of a delete/insert update strategy), leaving only one entry in the index page. That entry is moved to its neighboring page, and then the empty page is deallocated.
So I suppose the spid that does the deletes also deallocates the pages and extents and not the asynchronous process that scans for ghosted records
Recently I was playing around with one of our procedures that we use for our reporting solution. One of the requirements was the ability to regenerate data for a certain date. In order to guarantee this we delete the records for a specific day before filling the denormalized tables with data. Because there are a lot of records for 1 day for some reports and we only fill them during the night when there is no activity I thought it would be a good idea to use WITH TABLOCKX because the finer the granularity the more work SQL Server has with managing the locks.
I accidentally came across some strange behavior (imho) when checking the actual locks taken by the process. It seems SQL Server is switching to row and page locks and escalating to table locks at some point even though I hardcoded WITH (TABLOCKX) in the DELETE statement.
Here you can find a little reproduction script:
The funny thing is that the switch to page and row locks is only happening as of a certain number of records (possibly because of the time it takes so some process finishes?)
SET NOCOUNT ON
IF NOT OBJECT_ID('tbltablockTest') IS NULL
DROP TABLE tblTablockTest
GO
CREATE TABLE tblTablockTest
(ID int,
myDate smalldatetime,
myText varchar(1000)
)
GO
DECLARE @i int
SET @i = 1
WHILE @i < 1025
BEGIN
INSERT INTO tblTablockTest
VALUES (@i, getdate(), REPLICATE(CAST(@i as varchar), 1000 / LEN(@i)))
SET @i = @i + 1
END
GO
CREATE CLUSTERED INDEX ixID ON tblTablockTest (ID)
GO
DBCC TRACEON (3604, 1200) --1200 shows detailed information about locking
BEGIN TRAN
DELETE FROM tblTablockTest WITH (TABLOCKX)
WHERE ID BETWEEN 1 AND 15 -- increase the 15 if the behaviour is not showing
ROLLBACK
DBCC TRACEOFF (3604, 1200)
I haven't figured out why this is happening yet but I'm on a mission ;-) One of my guesses is that it has something to do with page deallocation but I haven't got a good explanation yet.