Friday, 28 December 2012

7. Database design


In this post, I am going to talk about how database design is very important for any applications.

Database is the backbone for any application.  I am not going to write much theory here and do explain what are the necessary things to remember when we do the database design.

This below diagram describes the overall data modelling steps involved when we do the database design.

DataModelling

Please keep in mind the following database concepts and corresponding usage when database design or working on data base.

Concepts and description:

RDBMS (Relational Database Management System) :
Contains the tables and enforces relationships between the tables.

RDBMS Table Properties:  
  1. Values are atomic.
  2. Each Column must have a unique name
  3. The sequence of rows and columns are insignificant.
  4. Each column must specify same type of values
Normalization & different normalization forms:
The process of organizing data to minimize the redundancy is called normalization.
1st Normal Form: Eliminate repeating groups. Define set of attributes for each table and set the primary key for the each table.
2nd Normal Form: Eliminate redundant data.
3rd Normal Form: Eliminate columns not dependent on the key.
4th Norma Form:  Isolate independent multiple relationships.
Boyce- Codd Normal Form: If there are non-trivial dependencies between candidate key attributes, than separate them out into distinct tables.
5th Normal Form: isolate semantically related multiple relationships.
Note:
  1. First 3 normal forms should be necessarily implemented.
  2. Too much normalization is not good for applications.
  3. There are other normalizations namely Optimal and Domain-Key normal forms
De-normalization:
De-normalization is the process of attempting to optimizing performance of the database by adding redundant data.

Primary and Unique key:
Primary key is the unique identifier of the each row. Primary key prevents duplicate values. Primary key does not allow NULL values.
Unique + Not NULL = Primary Key
Unique key enforces the uniqueness of the values in the set of columns. Unique key allow on NULL value.

Candidate key:
The key which is become primary key is called candidate key.

Foreign key:
Foreign key enforces the referential integrity.  Foreign key in one table points to primary key or unique on another table.

ACID Database characteristics:
ACID: Atomicity, Consistency, Isolation, Durability. For more information, please refer http://en.wikipedia.org/wiki/ACID

Store Procedure and advantages:
Store procedure is the group of SQL statements that have been previously created and stored in the SQL Server. This is pre-compiled.
Advantages:
  1. Reduce the network traffic.
  2. More secure and prevents the SQL injection.
  3. Speed and Modularization.
Trigger:
Trigger is the SQL procedure or SQL code which fire the action (When INSERT OR DELTE OR UPDATE on database). For more information, please refer http://en.wikipedia.org/wiki/Database_trigger

View:
View is the virtual tables and does not occupy any space. Views can be used for securing the columns fields. Please refer http://en.wikipedia.org/wiki/View_(database) for more information.

Index:
Index is used for fast data retrieval. Please refer http://en.wikipedia.org/wiki/Database_index for more information.

Cursor:
Cursor is a database object used by applications to manipulate data in a set on a row-by-row basis. Please refer the http://en.wikipedia.org/wiki/Cursor_(databases) for more information.

SQL Types:

Refer below diagram.
SQLTypes

What is the difference between UNION and UNION ALL?

UNION:
The UNION command is used to select related information from two tables, much like the JOIN command. However, when using the UNION command all selected columns need to be of the same data type. With UNION, only distinct values are selected.

UNION ALL:
The UNION ALL command is equal to the UNION command, except that UNION ALL selects all values.
The difference between Union and Union all is that Union all will not eliminate duplicate rows, instead it just pulls all rows from all tables fitting your query specifics and combines them into a table.

What is difference between DELETE & TRUNCATE commands?
Delete command removes the rows from a table based on the condition that we provide with a WHERE clause. Truncate will actually remove all the rows from a table and there will be no data in the table after we run the truncate command.

TRUNCATE:
TRUNCATE is faster and uses fewer system and transaction log resources than DELETE.
TRUNCATE removes the data by deallocating the data pages used to store the table’s data, and only the page deallocations are recorded in the transaction log.
TRUNCATE removes all rows from a table, but the table structure, its columns, constraints, indexes and so on, remains. The counter used by an identity for new rows is reset to the seed for the column.
You cannot use TRUNCATE TABLE on a table referenced by a FOREIGN KEY constraint. Because TRUNCATE TABLE is not logged, it cannot activate a trigger.
TRUNCATE cannot be rolled back.
TRUNCATE is DDL Command.
TRUNCATE Resets identity of the table

DELETE:
DELETE removes rows one at a time and records an entry in the transaction log for each deleted row.
If you want to retain the identity counter, use DELETE instead. If you want to remove table definition and its data, use the DROP TABLE statement.
DELETE Can be used with or without a WHERE clause
DELETE Activates Triggers.
DELETE can be rolled back.
DELETE is DML Command.
DELETE does not reset identity of the table.

What is difference between a primary key and a unique key?
Both primary key and unique key enforces uniqueness of the column on which they are defined. But by default primary key creates a clustered index on the column, where are unique creates a non-clustered index by default. Another major difference is that, primary key doesn’t allow NULLs, but unique key allows one NULL only.
A clustered index is a special type of index that reorders the way records in the table are physically stored. Therefore table can have only one clustered index. The leaf nodes of a clustered index contain the data pages.
A non-clustered index is a special type of index in which the logical order of the index does not match the physical stored order of the rows on disk. The leaf node of a non-clustered index does not consist of the data pages. Instead, the leaf nodes contain index rows.

SQL Joins:
Please refer the below diagram
SQL Joins


Linked Server:
Linked Server option enables to execute the commands against the OLE DB data sources on remote servers.

Locking:
Locking is a method used to protect records that will be accessed by multiple users and its prevents the concurrency errors.
There are two types of locking 1. Optimistic 2. Pessimistic
In pessimistic locking, when user wants to update data it locks the record and till then no one can update data. Other user’s can only view the data when there is pessimistic locking.
In optimistic locking multiple users can open the same record for updating, thus increase maximum concurrency. Record is only locked when updating the record. This is the most preferred way of locking practically. Now a days in browser based application it is very common and having pessimistic locking is not a practical solution.

Please refer
for more about locking, how you prevent concurrency problems and other useful information.

What is the difference between having and where clause?
  1. Where clause cannot be used with aggregates, but the having clause can.
  2. Having specifies a search condition for a group or an aggregate function used in SELECT statement.

Filtered index:
It uses a filter predicate to index a portion of rows in the table. Please refer http://technet.microsoft.com/en-us/library/cc280372.aspx for more information about filtered index.

What is the difference between varchar & nvarchar?
  1. Nvarchar can store Unicode characters but varchar does not store the Unicode characters.
  2. Varchar is stored as regular 8-bit data. But NVARCHAR strings are stored in the database as 16 bits
  3. Use nvarchar when if wants to store the text in different languages.
  4. Varchar is better performance than nvarchar
Can we do unit test for SQL server functions, store procedures, triggers…etc.?
for more information.

Please keep in mind the below notes when working on database.
Notes:
  1. Use necessary data type needed for each column. For example, if you are not going to store the different language text for that column then you no need nvarchar. In this case, use varchar.
  2. Use normalization
  3. Write unit test for your store procedures, triggers, functions and evaluate the execution time for each and every query.
  4. Use SQL Server performance tuning dashboard
  5. Use http://www.opensourcetesting.org/performance.php for health check and performance issues.
  6. Use SQL profiler for trouble shooting the performance issues.
  7. Use SQL Server Best Practices Analyser tool for checking installation, configurations…etc.
  8. Use Microsoft BI tools such as SSIS, SSRS, and SSAS if needed and use them appropriately.
  9. Please prepare the checklist for your DB development and activities and use them whenever you deploy your DB package on production.
  10. Please refer

I hope you enjoy this post. I am thinking of writing another blog about Microsoft BI tools such as SSIS, SSRS, and SSAS if I get the time.

No comments:

Post a Comment