Last post we have seen how to read excecution plan .It was brief post about execution plan.you may have question in your mind how this execution plan is created ?On what basis SQL Server engine generate plan? how many plan gets generated and which plan SQL Engine select?
I will try to make it as simple as .There should be something in sql server which act as meta data to take decision .Yes there is .That something is called as
Statistics.
Now next question can come to in your mind how it gets created and who create it?
Statistics get created in following cases and Sql server engine automatically creates it.
1.When we create any index .
2. Running query using where or join clause so individual statistics will be created first time .
It will be updated later point if auto_update_statistics is ON.
E.g.
CREATE TABLE CREDITCARD
(
[Credit_Card_id] [INT] NOT NULL,
[Name] [VARCHAR](100) NULL,
[Limit] [INT] NULL,
[Valid_from] [INT] NULL,
[Valid_To] [INT] NULL,
)
Go
select * from CREDITCARD
Now we have created table and seeing statistics are created are not.
You will see no statistics are created yet.
Lets run below query ,
SELECT *
FROM CREDITCARD
WHERE Credit_Card_id = 1
AND Name = 'test'
AND Limit = 1
AND Valid_from = 20
AND Valid_To = 30
Now if you refresh statistics you will see five statistics object are created .Lets see how to see
what data is there in that statistics .
We will use DBCC SHOW_STATISTICS to see data inside statistics.
Note : For demo I am considering only one statistics object.Name of your statistic will be different as it creates dynamically.
DBCC SHOW_STATISTICS('CREDITCARD', _WA_Sys_00000001_0425A276)
Now we are seeing statistics but no data is there.Now lets do some insert.Lets insert data and run once above command.
insert into CREDITCARD
select 1,'Shrikant',25000,1,22
Go
SELECT *
FROM CREDITCARD
WHERE Credit_Card_id = 1
AND Name = 'test'
AND Limit = 1
AND Valid_from = 20
AND Valid_To = 30
DBCC SHOW_STATISTICS('CREDITCARD', _WA_Sys_00000001_0425A276).
Now you will see very important information.
- The number of rows and pages occupied by a table's data
- The time that statistics were last updated
- The average length of keys in a column
- Histograms showing the distribution of data in a column
- String summaries that are used when performing LIKE queries on character data
In this third row is very important .In this we can see how many distinct values are there for that column .Ther is one column called EQ_ROW which is very useful for calculating plan.
Lets go bit inside.I am inserting a 10000 record in above table.
declare @lclId int =1
while 1= 1
begin
if @lclId>10000
break
insert into CREDITCARD
select @lclId,'Shrikant',25000,1,22
set @lclId +=1
end
Now If you run following select statement and DBBC command
Go
SELECT *
FROM CREDITCARD
WHERE
Name = 'Shrikant'
AND Limit = 25000
AND Valid_from = 1
AND Valid_To = 22 and Credit_Card_id =1
GO
DBCC SHOW_STATISTICS('CREDITCARD',_WA_Sys_00000001_0425A276 )
you can see vaues for update and EQ_Row.EQ row is used to calculate estimated row count.
Note : You can manually update statistics by using sp_updatestats procedure for entire database.
If you want to update statistics for a table in above case you can use
UPDATE STATISTICS CREDITCARD
If you want to update statistics for a specific statistics in above case you can use
UPDATE STATISTICS CREDITCARD _WA_Sys_00000001_0425A276;
In next post we will try to understand how statistics are useful for performance tuning.