
Ques 1. A fact table is the central table in a star schema of a data warehouse.

A fact table stores quantitative information for analysis and is often denormalized. A fact table works with dimension tables: the fact table holds the data to be analyzed, and a dimension table stores data about the ways in which the data in the fact table can be analyzed. Thus, the fact table consists of two types of columns: the foreign key columns allow joins with dimension tables, and the measure columns contain the data that is being analyzed. Suppose that a company sells products to customers. Every sale is a fact that happens, and the fact table is used to record these facts. For example:

Time ID | Product ID | Customer ID | Units Sold
4       | 17         | 2           | 1
8       | 21         | 3           | 2
8       | 4          | 1           | 1

Now we can add a dimension table about customers:

Customer ID | Name        | Gender | Income | Education | Region
1           | Brian Edge  | M      | 2      | 3         | 4
2           | Fred Smith  | M      | 3      | 5         | 1
3           | Sally Jones | F      | 1      | 7         | 3

In this example, the Customer ID column in the fact table is the foreign key that joins with the dimension table. By following the links, you can see that row 2 of the fact table records the fact that customer 3, Sally Jones, bought two items on day 8. The company would also have a product table and a time table to determine what Sally bought and exactly when.

When building fact tables, there are physical and data limits. The ultimate size of the object as well as access paths should be considered. Adding indexes can help with both. However, from a logical design perspective, there should be no restrictions. Tables should be built based on current and future requirements, ensuring that there is as much flexibility as possible built into the design to allow for future enhancements without having to rebuild the data.
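To make the star-schema join concrete, here is a minimal T-SQL sketch of the two tables above and the query that follows the foreign key link. The table and column names (DimCustomer, FactSales, UnitsSold) are illustrative assumptions, not from any particular system:

    -- Dimension table: descriptive attributes, one row per customer.
    CREATE TABLE DimCustomer (
        CustomerID int PRIMARY KEY,   -- simple integer surrogate key
        Name       varchar(50),
        Gender     char(1),
        Income     int,
        Education  int,
        Region     int
    );

    -- Fact table: foreign keys plus the measure being analyzed.
    CREATE TABLE FactSales (
        TimeID     int,   -- would reference a time dimension (not shown)
        ProductID  int,   -- would reference a product dimension (not shown)
        CustomerID int REFERENCES DimCustomer (CustomerID),
        UnitsSold  int    -- the measure
    );

    -- Following the link: which customer bought how many units, and when.
    SELECT c.Name, f.TimeID, f.UnitsSold
    FROM FactSales f
    JOIN DimCustomer c ON c.CustomerID = f.CustomerID;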

OLAP stands for OnLine Analytical Processing and is a technology used to collect, manage, and process multidimensional data, providing fast access to this data for analytic purposes. OLAP is widely used in business reporting for marketing, sales, human resource management, and various other business fields. OLAP allows for rapid execution of complex database queries in real time, and it facilitates complex data views through data pivoting, complex data computations, and data modeling. OLAP deals with dimensional data, which allows for much faster execution of complex queries compared to relational database management systems. OLAP takes a snapshot of relational data and re-composes it into multidimensional data. The data structure that OLAP creates from the relational data is called an OLAP cube, which can be thought of as a multidimensional array. A business might want to analyze its sales data by product, by product category, by sales manager, or something else; these different analysis criteria are the OLAP cube's dimensions. The OLAP cube structure consists of one central table, called the fact table, surrounded by dimension tables, one for each dimension of the cube. The fact table holds facts and metrics about the business process and links to each of the dimension tables. Each dimension table consists of columns called dimensional attributes. In the example above, the product dimension table would have several dimensional attributes such as retail price, wholesale price, weight, height, width, and depth. The dimension tables store data associated with a particular dimension but don't store facts; all facts are stored in the fact table.

In data warehousing, a dimension table is one of the set of companion tables to a fact table. The fact table contains business facts (or measures) and foreign keys which refer to candidate keys (normally primary keys) in the dimension tables. Contrary to fact tables, dimension tables contain descriptive attributes (or fields) that are typically textual fields (or discrete numbers that behave like text). These attributes are designed to serve two critical purposes: query constraining and/or filtering, and query result set labeling. Dimension attributes should be:

- Verbose (labels consisting of full words)
- Descriptive
- Complete (having no missing values)
- Discretely valued (having only one value per dimension table row)
- Quality assured (having no misspellings or impossible values)

Dimension table rows are uniquely identified by a single key field. It is recommended that the key field be a simple integer, because a key value is meaningless and is used only for joining fields between the fact and dimension tables.

Ques 2. SQL developers on every platform are struggling, seemingly stuck in a DO WHILE loop that makes them repeat the same mistakes again and again. That's because the database field is still relatively immature. Sure, vendors are making some strides, but they continue to grapple with the bigger issues. Concurrency, resource management, space management, and speed still plague SQL developers whether they're coding on SQL Server, Oracle, DB2, Sybase, MySQL, or any other relational platform. There are some good principles you can follow that should yield results in one combination or another. I've encapsulated them in a list of SQL dos and don'ts that often get overlooked or are hard to spot. These techniques should give you a little more insight into the minds of your DBAs, as well as the ability to start thinking of processes in a production-oriented way.

1. Don't use UPDATE instead of CASE. This issue is very common, and though it's not hard to spot, many developers often overlook it because using UPDATE has a natural flow that seems logical. Take this scenario, for instance: you're inserting data into a temp table and need it to display a certain value if another value exists. Maybe you're pulling from the Customer table and you want anyone with more than $100,000 in orders to be labeled as "Preferred." Thus, you insert the data into the table and run an UPDATE statement to set the CustomerRank column to "Preferred" for anyone who has more than $100,000 in orders. The problem is that the UPDATE statement is logged, which means it has to write twice for every single write to the table. The way around this, of course, is to use an inline CASE statement in the SQL query itself, as in the sketch below. This tests every row for the order amount condition and sets the "Preferred" label before it's written to the table. The performance increase can be staggering.
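Here is a hedged sketch of the difference, assuming a hypothetical Customer table with CustomerID and TotalOrders columns:

    -- The pattern to avoid: insert, then a second, fully logged UPDATE pass.
    SELECT CustomerID,
           TotalOrders,
           CAST('Regular' AS varchar(10)) AS CustomerRank
    INTO #CustomerRank
    FROM Customer;

    UPDATE #CustomerRank
    SET CustomerRank = 'Preferred'
    WHERE TotalOrders > 100000;   -- every qualifying row is written twice

    -- The preferred pattern: decide the label inline, writing each row once.
    SELECT CustomerID,
           TotalOrders,
           CASE WHEN TotalOrders > 100000
                THEN CAST('Preferred' AS varchar(10))
                ELSE 'Regular'
           END AS CustomerRank
    INTO #CustomerRank2
    FROM Customer;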

2. Don't blindly reuse code. This issue is also very common. It's very easy to copy someone else's code because you know it pulls the data you need. The problem is that quite often it pulls much more data than you need, and developers rarely bother trimming it down, so they end up with a huge superset of data. This usually comes in the form of an extra outer join or an extra condition in the WHERE clause. You can get huge performance gains if you trim reused code to your exact needs.

3. Do pull only the number of columns you need. This issue is similar to issue No. 2, but it's specific to columns. It's all too easy to code all your queries with SELECT * instead of listing the columns individually. The problem again is that it pulls more data than you need. I've seen this error dozens and dozens of times. A developer does a SELECT * query against a table with 120 columns and millions of rows, but winds up using only three to five of them. At that point, you're processing so much more data than you need that it's a wonder the query returns at all. You're not only processing more data than you need, but you're also taking resources away from other processes.

4. Don't double-dip. Here's another one I've seen more times than I should have: a stored procedure is written to pull data from a table with hundreds of millions of rows. The developer needs customers who live in California and have incomes of more than $40,000. So he queries for customers who live in California and puts the results into a temp table; then he queries for customers with incomes above $40,000 and puts those results into another temp table. Finally, he joins both tables to get the final product. Are you kidding me? This should be done in a single query; instead, you're double-dipping a superlarge table. Don't be a moron: query large tables only once whenever possible -- you'll find out how much better your procedures perform. A slightly different scenario is when a subset of a large table is needed by several steps in a process, which causes the large table to be queried each time. Avoid this by querying for the subset and persisting it elsewhere, then pointing the subsequent steps to your smaller data set.

5. Do know when to use temp tables. This issue is a bit harder to get a handle on, but it can yield impressive gains. You can use temp tables in a number of situations, such as keeping you from double-dipping into large tables. You can also use them to greatly decrease the processing power required to join large tables. If you must join a table to a large table and there's a condition on that large table, you can improve performance by pulling out the subset of data you need from the large table into a temp table and joining with that instead, as sketched below. This is also helpful (again) if you have several queries in the procedure that have to make similar joins to the same table.
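As a rough sketch of the temp-table technique (BigCustomerTable, Orders, and the column names are assumptions for illustration):

    -- Pull the needed subset out of the large table once...
    SELECT CustomerID, Income
    INTO #CACustomers
    FROM BigCustomerTable
    WHERE State = 'CA'
      AND Income > 40000;

    -- ...then join against the small temp table instead of the big one.
    SELECT c.CustomerID, c.Income, o.OrderID
    FROM #CACustomers c
    JOIN Orders o ON o.CustomerID = c.CustomerID;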

6. Do pre-stage data. This is one of my favorite topics because it's an old technique that's often overlooked. If you have a report or a procedure (or better yet, a set of them) that will do similar joins to large tables, it can be a benefit for you to pre-stage the data by joining the tables ahead of time and persisting them into a table. Now the reports can run against that pre-staged table and avoid the large join. You're not always able to use this technique, but when you can, you'll find it is an excellent way to save server resources. Note that many developers get around this join problem by concentrating on the query itself and creating a view around the join so that they don't have to type the join conditions again and again. But the problem with this approach is that the query still runs for every report that needs it. By pre-staging the data, you run the join just once (say, 10 minutes before the reports) and everyone else avoids the big join. I can't tell you how much I love this technique; in most environments, there are popular tables that get joined all the time, so there's no reason why they can't be pre-staged.

7. Do delete and update in batches. Here's another easy technique that gets overlooked a lot. Deleting or updating large amounts of data from huge tables can be a nightmare if you don't do it right. The problem is that both of these statements run as a single transaction, and if you need to kill them or if something happens to the system while they're working, the system has to roll back the entire transaction. This can take a very long time. These operations can also block other transactions for their duration, essentially bottlenecking the system. The solution is to do deletes or updates in smaller batches, as in the sketch below. This solves your problem in a couple of ways. First, if the transaction gets killed for whatever reason, it only has a small number of rows to roll back, so the database returns online much quicker. Second, while the smaller batches are committing to disk, others can sneak in and do some work, so concurrency is greatly enhanced. Along these lines, many developers have it stuck in their heads that these delete and update operations must be completed the same day. That's not always true, especially if you're archiving. You can stretch that operation out as long as you need to, and the smaller batches help accomplish that. If you can take longer to do these intensive operations, spend the extra time and don't bring your system down.
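A minimal T-SQL sketch of batched deletes, assuming a hypothetical BigArchiveTable with a CreatedDate column:

    -- Each pass deletes at most 10,000 rows in its own short transaction,
    -- so a rollback is cheap and other work can interleave between batches.
    DECLARE @rows int = 1;
    WHILE @rows > 0
    BEGIN
        DELETE TOP (10000)
        FROM BigArchiveTable
        WHERE CreatedDate < '20050101';

        SET @rows = @@ROWCOUNT;   -- stop once a pass deletes nothing
    END;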

Enjoy faster SQL. Follow these dos and don'ts whenever you can when writing queries or processes to improve your SQL performance, but remember to evaluate each situation individually to see which method works best -- there are no ironclad solutions. You'll also find that many of these tips will increase your concurrency and generally keep things moving more smoothly. And note that while the physical implementation of these tips will change from one vendor to the next, the concepts and issues that they address exist in every SQL platform.

QUES 3. Difference between TRUNCATE and DELETE commands

1. TRUNCATE is a DDL command, whereas DELETE is a DML command.
2. TRUNCATE is much faster than DELETE. Reason: when you issue a DELETE, all the data is first copied into the rollback tablespace, and then the delete operation is performed. That's why, when you type ROLLBACK after deleting from a table, you can get the data back (the system retrieves it for you from the rollback tablespace). All of this takes time. TRUNCATE removes the data directly, without copying it into the rollback tablespace, which is why TRUNCATE is faster. Once you truncate, you can't get the data back.
3. You can't roll back a TRUNCATE, but you can roll back a DELETE. TRUNCATE removes the records permanently.
4. In the case of TRUNCATE, triggers don't get fired, but with DML commands like DELETE, triggers do get fired.
5. You can't use conditions (a WHERE clause) with TRUNCATE, but with DELETE you can write conditions using a WHERE clause.
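A short sketch of the behavioral difference, using a hypothetical Orders table (the rollback description above reflects Oracle-style rollback segments; exact behavior varies by platform):

    -- DELETE: row by row, supports a WHERE clause, and can be rolled back.
    BEGIN TRANSACTION;
    DELETE FROM Orders WHERE OrderDate < '20060101';
    ROLLBACK;                 -- the deleted rows come back

    -- TRUNCATE: removes all rows at once; no WHERE clause is allowed.
    TRUNCATE TABLE Orders;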
QUES 4. In SQL Server, each column, local variable, expression, and parameter has a related data type. A data type is an attribute that specifies the type of data that the object can hold: integer data, character data, monetary data, date and time data, binary strings, and so on.

SQL Server supplies a set of system data types that define all the types of data that can be used with SQL Server. You can also define your own data types in Transact-SQL or the Microsoft .NET Framework. Alias data types are based on the system-supplied data types. For more information about alias data types, see CREATE TYPE (Transact-SQL). User-defined types obtain their characteristics from the methods and operators of a class that you create by using one of the programming languages supported by the .NET Framework. When two expressions that have different data types, collations, precision, scale, or length are combined by an operator, the characteristics of the result are determined by the following:

- The data type of the result is determined by applying the rules of data type precedence to the data types of the input expressions. For more information, see Data Type Precedence (Transact-SQL).
- The collation of the result is determined by the rules of collation precedence when the result data type is char, varchar, text, nchar, nvarchar, or ntext. For more information, see Collation Precedence (Transact-SQL).
- The precision, scale, and length of the result depend on the precision, scale, and length of the input expressions. For more information, see Precision, Scale, and Length (Transact-SQL).
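For example, int outranks varchar in the data type precedence rules, so a character value combined with an integer is implicitly converted to int first:

    SELECT 1 + '2';       -- the varchar '2' is converted to int; returns 3
    SELECT 1 + 'abc';     -- fails: 'abc' cannot be converted to int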

SQL Server provides data type synonyms for ISO compatibility. For more information, see Data Type Synonyms (Transact-SQL).

Data Type Categories

Data types in SQL Server are organized into the following categories:

- Exact numerics
- Approximate numerics
- Date and time
- Character strings
- Unicode character strings
- Binary strings
- Other data types

In SQL Server, based on their storage characteristics, some data types are designated as belonging to the following groups:

- Large value data types: varchar(max), nvarchar(max), and varbinary(max)
- Large object data types: text, ntext, image, varchar(max), nvarchar(max), varbinary(max), and xml

Exact Numerics: bigint, bit, decimal, int, money, numeric, smallint, smallmoney, tinyint

Approximate Numerics: float, real

Date and Time: date, datetime, datetime2, datetimeoffset, smalldatetime, time

Character Strings: char, varchar, text

Unicode Character Strings: nchar, nvarchar, ntext

Binary Strings: binary, varbinary, image

Other Data Types: cursor, timestamp, hierarchyid, uniqueidentifier, sql_variant, xml, table, Spatial Types
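As a hedged sketch, here is one table (the name and columns are illustrative) drawing a type from most of the categories above:

    CREATE TABLE TypeDemo (
        ID       int IDENTITY(1,1),    -- exact numeric
        Price    decimal(10, 2),       -- exact numeric with precision and scale
        Ratio    float,                -- approximate numeric
        SoldAt   datetime2,            -- date and time
        Code     char(4),              -- fixed-length character string
        Name     nvarchar(50),         -- Unicode character string
        Photo    varbinary(max),       -- binary string (large value type)
        RowGuid  uniqueidentifier,     -- other data types
        Details  xml                   -- other data types
    );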

QUES 5. Differences between clustered and non-clustered indexes


Here's a summary of the differences:

A clustered index determines the order in which the rows of the table are stored on disk, and it actually stores row-level data in the leaf nodes of the index itself. A non-clustered index has no effect on the order in which the rows are stored.

Using a clustered index is an advantage when groups of data that can be clustered are frequently accessed by some queries. This speeds up retrieval because the data lives close to each other on disk. Also, if data is accessed in the same order as the clustered index, the retrieval will be much faster because the physical data stored on disk is sorted in the same order as the index.

A clustered index can be a disadvantage because any time a change is made to a value of an indexed column, the subsequent possibility of re-sorting rows to maintain order is a definite performance hit.

A table can have multiple non-clustered indexes, but only one clustered index. Non-clustered indexes store both a value and a pointer to the actual row that holds that value. Clustered indexes don't need to store a pointer to the actual row, because the rows in the table are stored on disk in the exact same order as the clustered index, and the clustered index actually stores the row-level data in its leaf nodes.
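A brief T-SQL sketch (the table and index names are assumptions): one clustered index defines the table's storage order, while additional non-clustered indexes point back into it.

    -- Only one of these is allowed per table: the rows themselves are
    -- physically ordered by OrderID.
    CREATE CLUSTERED INDEX IX_Orders_OrderID
        ON Orders (OrderID);

    -- Any number of these is allowed: each stores the key value plus a
    -- pointer to the actual row.
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
        ON Orders (CustomerID);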

QUES 6. SQL constraints are used to specify rules for the data in a table.
If there is any violation between the constraint and the data action, the action is aborted by the constraint. Constraints can be specified when the table is created (inside the CREATE TABLE statement) or after the table is created (inside the ALTER TABLE statement). In SQL, we have the following constraints:

- NOT NULL - Indicates that a column cannot store a NULL value
- UNIQUE - Ensures that each row for a column must have a unique value
- PRIMARY KEY - A combination of NOT NULL and UNIQUE. Ensures that a column (or a combination of two or more columns) has a unique identity, which helps to find a particular record in a table more easily and quickly
- FOREIGN KEY - Ensures the referential integrity of the data in one table to match values in another table
- CHECK - Ensures that the value in a column meets a specific condition
- DEFAULT - Specifies a default value for a column when none is specified

SQL NOT NULL Constraint


The NOT NULL constraint enforces a column to NOT accept NULL values. The NOT NULL constraint enforces a field to always contain a value. This means that you cannot insert a new record, or update a record without adding a value to this field.

SQL UNIQUE Constraint


The UNIQUE constraint uniquely identifies each record in a database table. The UNIQUE and PRIMARY KEY constraints both provide a guarantee for uniqueness for a column or set of columns. A PRIMARY KEY constraint automatically has a UNIQUE constraint defined on it.

Note that you can have many UNIQUE constraints per table, but only one PRIMARY KEY constraint per table.

SQL PRIMARY KEY Constraint


The PRIMARY KEY constraint uniquely identifies each record in a database table. Primary keys must contain unique values. A primary key column cannot contain NULL values. Each table should have a primary key, and each table can have only ONE primary key.

SQL FOREIGN KEY Constraint


A FOREIGN KEY in one table points to a PRIMARY KEY in another table. The FOREIGN KEY constraint is used to prevent actions that would destroy links between tables. The FOREIGN KEY constraint also prevents invalid data from being inserted into the foreign key column, because it has to be one of the values contained in the table it points to.

SQL CHECK Constraint


The CHECK constraint is used to limit the value range that can be placed in a column. If you define a CHECK constraint on a single column it allows only certain values for this column. If you define a CHECK constraint on a table it can limit the values in certain columns based on values in other columns in the row.

SQL DEFAULT Constraint


The DEFAULT constraint is used to insert a default value into a column. The default value will be added to all new records, if no other value is specified.
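A single CREATE TABLE sketch showing all six constraints together (the table and column names are illustrative, and a Persons table with primary key P_Id is assumed to exist):

    CREATE TABLE Orders (
        O_Id     int NOT NULL PRIMARY KEY,        -- NOT NULL + PRIMARY KEY
        OrderNo  int UNIQUE,                      -- UNIQUE
        Quantity int CHECK (Quantity > 0),        -- CHECK
        City     varchar(50) DEFAULT 'Sandnes',   -- DEFAULT
        P_Id     int REFERENCES Persons (P_Id)    -- FOREIGN KEY
    );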

QUES 7. SQL JOIN


An SQL JOIN clause is used to combine rows from two or more tables, based on a common field between them. The most common type of join is the SQL INNER JOIN (simple join). An SQL INNER JOIN returns all rows from multiple tables where the join condition is met.

Different SQL JOINs


Before we continue with examples, we will list the different types of SQL JOINs you can use:

- INNER JOIN: Returns all rows when there is at least one match in BOTH tables
- LEFT JOIN: Returns all rows from the left table, and the matched rows from the right table
- RIGHT JOIN: Returns all rows from the right table, and the matched rows from the left table
- FULL JOIN: Returns all rows when there is a match in ONE of the tables
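As a quick sketch against hypothetical Customers and Orders tables sharing a CustomerID field:

    -- INNER JOIN: only customers that have at least one matching order.
    SELECT c.Name, o.OrderNo
    FROM Customers c
    INNER JOIN Orders o ON o.CustomerID = c.CustomerID;

    -- LEFT JOIN: every customer, with NULL order columns where no match exists.
    SELECT c.Name, o.OrderNo
    FROM Customers c
    LEFT JOIN Orders o ON o.CustomerID = c.CustomerID;

    -- FULL JOIN: all rows from both sides, matched where possible.
    SELECT c.Name, o.OrderNo
    FROM Customers c
    FULL JOIN Orders o ON o.CustomerID = c.CustomerID;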
