One Step To Understand Relationship Of Primary Key And Non-Clustered Indexes.

Keys and indexes are key fundamental concepts of any RDBMS and play very crucial rule to design good and efficient database. If you have to design performance oriented database then you should have to implement keys and indexes in proper manner in your database.

Developers often do not specify which column should have clustered index. As we know by default Primary Key becomes Clustered Index but vice versa is not true that means you can create clustered index on non primary key column also. This practice is now extremely common and lots of people have forgotten that Primary Key and Clustered Index are two different things.

Another common misconception most people have is “Creating a primary key on a table makes reads fast”.

I want to clarify, Primary key is a constraint, and it has nothing to do with fast reads. “Primary Key” behind the scenes creates a clustered Index which makes the reads fast. Optionally we can also create non-clustered for primary key which contributes to faster reads.

One common and simplest question asked by any interviewer that “what is a Primary key” and most of us reply very quickly with following facts:

  • The primary key constraint uniquely identifies each record in a database table.
  • Primary keys must contain unique values.
  • A primary key column cannot contain NULL values.
  • Most tables should have a primary key, and each table can have only ONE primary key.

But there are few more questions comes in mind about the primary key and indexes as per below:

  1. Can we create primary key and clustered index two different columns?
  2. Is it mandatory to create clustered index on primary key?
  3. Can we create non-clustered index on primary key?

These are the question that creates the confusion and most of the candidates are not confident to say anything.

First of all we have to remember below statement and always keep in mind.

“Primary Key can be clustered or Non-clustered but it is a common best practice to create a Primary Key as Clustered Index. “

In short we can create primary key and clustered index in two different columns and it is not mandatory to apply both of things in same column.

It is not also mandatory to create clustered index on primary key and clustered index can be created on non primary column.

Yes definitely non-clustered index can be created on primary key.

There are following scenario about primary key and Clustered Index:

1. Primary key default to Clustered Index:

This is common scenario in which developers always create primary key and clustered index automatically apply on primary key column. There is no explicitly need to create clustered index on primary key column.

This is a main reason that developers not aware that primary key and clustered index are two different things.

1

You can see in above output result that clustered index is created automatically whenever primary key applied but always remember that vice-verse is not true.

2. Primary key default to Non-Clustered Index:

This is not mandatory that primary key and clustered key is co-related to each other and clustered index only created on primary key. In this scenario we will explicitly defined Primary Key as a non-clustered and it will create it as a non-clustered index.

You can clearly identify that non-clustered index is created on primary key and stored in Heap.

 2

Note: Heaps (Tables without Clustered Indexes) A heap is a table without a clustered index. One or more no-clustered indexes can be created on tables stored as a heap. Data is stored in the heap without specifying an order.

3: Primary Key defaults to Non-Clustered Index with another column defined as a Clustered Index.

In this scenario we will explicitly defined unique Key as a clustered index and another primary key column automatically assign a non-clustered key. It means if any table already have clustered index and primary key is created on the table then by default non-clustered index created on primary key.

3

In above result we can see that no clustered key is created on primary key while clustered key is created on unique key column.

4: Primary Key defaults to Clustered Index with other index defaults to Non-clustered index.

If any table has primary key and unique key and not applied any index explicitly then in this case clustered key is goes with primary key column and non-clustered key goes with unique column.

4

As per the above scenarios we can say that primary key is not strictly bound with clustered key. Developers have to apply both of things as per the situation.

To achieve optimal performance, it is better to go with primary key along with clustered key.

Advertisements
How to choose appropriate data types to create indexes on sql-server tables

How to choose appropriate data types to create indexes on sql-server tables

As we know Indexes are very useful in tables and increases the overall searching performance, But developer also aware about that indexes are like a “Sweet poison” because creating the indexes are not only the solution to increase the performance and sometimes indexes decrease the performance.

So this is a big challenge and confusing task to take decisions that what data type is appropriate to create index on column.Sql-Server provide various data types like “Int, Varchar, Nvarchar,DateTime” etc, but question arise which data type should I choose? What implications might the choice of a data type have for storage and performance in SQL Server?

Now i am going to demonstrate how to select data type for creating the indexes on columns.

Step 1: first we have to create a sample database table having name “Sales” and have to insert approx 200000 records(prefer a large amount of data to test the performance issue).

CREATE TABLE [dbo].[Sales]( [BatchNo] [varchar](50) NOT NULL, [Price] [float] NULL, [DateTransaction] [datetime] NULL, [ProductId] [nvarchar](50) NULL )

after adding the details looks like as per below:

data

Step 2: Create clustered index on column “Product Id” by creating primary key and execute sql command then see the message tab.

table structure

execute below sql command:

set statistics IO on select ProductId,BatchNo,DateTransaction from sales where ProductId= 55005

You can see in below message  by clicking on Message tab from the query output window and it shows that Sql database engine takes 1309 logical reads to find the desired result.

(200000 row(s) affected) Table ‘Sales’. Scan count 1, logical reads 1082, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. (1 row(s) affected) you can also see the below image

table structure

Step 3: initially data type of the column “productId” was nvarchar now change data type from “Nvarchar” to integer and save the table. Again run the same query and see the result.

table structure

execute below sql command:

set statistics IO on select ProductId,BatchNo,DateTransaction from sales where ProductId= 55005

but this you can see the difference in output as compare to previous one.this time Sql-server database engine trace only 3 logical reads and below message comes:

(1 row(s) affected) Table ‘Sales’. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

You can clearly see in above result that Sql database engine takes only 3 logical reads while for nvarchar it takes 1309 logical reads.This is a big difference and assume if this searching performs on very large table like table having Lacs of data with complex data type columns then you can really feel better performance and difference. I have recorded this figure in excel and you can see that there are big difference between the performance with two data types. Capture

The main reason behind this, sql engine manipulate faster on numeric values rather than string values and so be careful whenever you have to take decision to create index then always prefer integer type column. So always try to reduce the logically reads by database engine by applying correct indexes and select correct data type.