1. Explain SDLC phases and types of SDLC

SDLC phases:
1. Project Initiation - generally a TA/BA will do this (gathering requirements, BRTDD)
2. Analysis Phase - cost, time, what software/technology to use
3. Design Phase - specifications, etc.
4. Development Phase - goes along with the design phase
5. Testing Phase - Unit Test, Smoke Test, Integration Test, Load Test, UAT Test
6. Implementation/Production Phase: Dev Server -> Test Server -> Prod Server (USER ACCEPTANCE TEST is done here)
Most commonly known and used SDLC models:

Waterfall Model
The Waterfall Model is the oldest and most well-known SDLC model. Its distinctive feature is a sequential, step-by-step process from requirements analysis to maintenance. The major weakness of the Waterfall Model is that after project requirements are gathered in the first phase, there is no formal way to make changes to the project as requirements change or more information becomes available to the project team. What are good candidate software development projects for the Waterfall Model? Systems that have well-defined and understood requirements are a good fit.

Spiral Model
In the Spiral SDLC Model, the development team starts with a small set of requirements and goes through each development phase (except Installation and Maintenance) for that set of requirements. Based on lessons learned from the initial iteration (via a risk analysis process), the development team adds functionality for additional requirements in ever-increasing "spirals" until the application is ready for the Installation and Maintenance phase (production). Each of the iterations prior to the production version is a prototype of the application. The advantage of the Spiral Model over the Waterfall Model is that the iterative approach allows development to begin even when all the system requirements are not known or understood by the development team. As each prototype is tested, user feedback is used to make sure the project is on track. The risk analysis step provides a formal method to ensure the project stays on track even if requirements do change. If new techniques or business requirements make the project unnecessary, it can be canceled before too many resources are wasted. In today's business environment, the Spiral Model (or its other iterative-model cousins) is the most used SDLC model. An example application development project that would be a good candidate for the Spiral Model is an online customer support system where it is not well understood what services customers want or can accomplish online. Each iterative prototype helps answer the question, "Can and will customers use this system?" The Spiral Model combines elements of the Top-down and Bottom-up SDLC models that are discussed in the next sections.
Top-Down Model
The Top-down SDLC model was popularized by IBM in the 1970s, and its concepts are used in other SDLC models such as the Waterfall and Spiral Models previously discussed. In a pure Top-down model, high-level requirements are documented, and programs are built to meet those requirements. Then the next level is designed and built. A good way to picture the Top-down model is to think of a menu-driven application. The top-level menu items would be designed and coded first, and then each sublevel would be added after the top level was finished. Each menu item represents a subsystem of the total application. The Top-down model is a good fit when the application is a new one and there is no existing functionality that can be incorporated into the new system. A major problem with the Top-down model is that real system functionality is not added, and cannot be tested, until late in the development process. If problems are not detected early in the project, they can be costly to remedy later.

Bottom-Up Model
In the Bottom-up SDLC model, the lowest level of functionality is designed and programmed first, and finally all the pieces are integrated together into the finished application. This means that, generally, the most complex components are developed and tested first. The idea is that any project show-stoppers will surface early in the project. The Bottom-up model also encourages the development and use of reusable software components that can be used multiple times across many software development projects. Again, think of a menu-driven system where the development starts with the lowest-level menu items. The disadvantage of the Bottom-up model is that an extreme amount of coordination is required to be sure that the individual software components work together correctly in the finished system.

Hybrid Model
The Hybrid SDLC model combines the top-down and bottom-up models.

Rapid Prototyping
With the demand for faster software development, and because of many well-documented failures of traditional SDLC models, Rapid Application Development (RAD) was introduced as a better way to add functionality to an application. The main new tenet of RAD compared to older SDLC models is the use of prototypes. After a quick requirements-gathering phase, a prototype application is built and presented to the application users. Feedback from the users provides a loop to improve or add functionality to the application. Early RAD models did not involve the use of real data in the prototype, but newer RAD implementations do use real data. The advantage of Rapid Prototyping models is that time-to-market is greatly reduced. Rapid Prototyping skips many of the steps in traditional SDLC models in favor of fast and low-cost software development. The idea is that the application software is a "throw-away." If a new version of the software is needed, it is developed from scratch using the newest RAD techniques and tools. The big disadvantage of the Rapid Prototyping model is that the process can be too fast, and, therefore, proper testing (especially security testing) may not be done. The Rapid Prototyping model is used for graphical user interface (GUI) applications such as web-based applications. Extreme Programming (XP) is a modern incarnation of the Rapid Prototyping model.
Other SDLC models include: the Object-Oriented Model, Model-Driven Development, the Chaos Model, the Agile Programming Model, and many others.
2. Explain database normal forms

1. A table (relation) is in 1NF if:
   a. There are no duplicated rows in the table.
   b. Each cell is single-valued (i.e., there are no repeating groups or arrays).
   c. Entries in a column (attribute, field) are of the same kind.
2. A table is in 2NF if it is in 1NF and all non-key attributes are dependent on the entire key.
3. A table is in 3NF if it is in 2NF and it has no transitive dependencies; in other words, a table is in 3NF if it is in 2NF and it doesn't have any columns that are not dependent on the primary key.
4. A table is in Boyce-Codd normal form (BCNF) if it is in 3NF and every determinant is a candidate key.
5. A table is in 4NF if it is in BCNF and it has no multi-valued dependencies.
6. A table is in 5NF, also called "Projection-Join Normal Form" (PJNF), if it is in 4NF and every join dependency in the table is a consequence of the candidate keys of the table.
7. A table is in Domain/Key Normal Form (DKNF) if every constraint on the table is a logical consequence of the definition of keys and domains.
Ref: http://en.wikipedia.org/wiki/Database_normalization
3. Explain DDL / DML / DCL / TCL commands with examples and differences
DDL: Data Definition Language statements are used to define the database structure or schema.
1. CREATE - to create objects in the database
2. ALTER - to alter the structure of the database
3. DROP - to delete objects from the database
4. COMMENT - to add comments to the data dictionary
5. RENAME - to rename an object
6. DBCC - (Database Console Commands) statements check the physical and logical consistency of a database
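A minimal sketch of the most common DDL statements, using a hypothetical Employee table (SQL Server syntax; note that SQL Server uses sp_rename rather than a RENAME statement):

    -- Create a table
    CREATE TABLE Employee (
        EmployeeID INT PRIMARY KEY,
        FirstName  VARCHAR(50),
        LastName   VARCHAR(50)
    );

    -- Alter the table structure by adding a column
    ALTER TABLE Employee ADD HireDate DATE;

    -- Rename the table (SQL Server's equivalent of RENAME)
    EXEC sp_rename 'Employee', 'Employees';

    -- Drop the table
    DROP TABLE Employees;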
DML: Data Manipulation Language statements are used for managing data within schema objects.
1. SELECT - retrieve data from a database
2. INSERT - insert data into a table
3. UPDATE - update existing data within a table
4. DELETE - delete records from a table; the space allocated for the records remains
5. MERGE - UPSERT operation (insert or update)
6. CALL - call a PL/SQL or Java subprogram
7. EXPLAIN PLAN - explain the access path to data
8. LOCK TABLE - control concurrency
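A minimal sketch of the core DML statements against the hypothetical Employees table from the DDL example above:

    -- Insert a row
    INSERT INTO Employees (EmployeeID, FirstName, LastName)
    VALUES (1, 'Jane', 'Doe');

    -- Update an existing row
    UPDATE Employees SET LastName = 'Smith' WHERE EmployeeID = 1;

    -- Retrieve rows
    SELECT EmployeeID, FirstName, LastName
    FROM Employees
    WHERE LastName = 'Smith';

    -- MERGE (the UPSERT operation): update the row if it exists, insert it if not
    MERGE INTO Employees AS tgt
    USING (SELECT 2 AS EmployeeID, 'John' AS FirstName, 'Roe' AS LastName) AS src
        ON tgt.EmployeeID = src.EmployeeID
    WHEN MATCHED THEN
        UPDATE SET tgt.FirstName = src.FirstName, tgt.LastName = src.LastName
    WHEN NOT MATCHED THEN
        INSERT (EmployeeID, FirstName, LastName)
        VALUES (src.EmployeeID, src.FirstName, src.LastName);

    -- Delete rows
    DELETE FROM Employees WHERE EmployeeID = 1;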
DCL: Data Control Language statements are used to control the security and permissions of the objects or parts of the database(s).
1. GRANT - to allow specified users to perform specified tasks
2. DENY - to disallow specified users from performing specified tasks
3. REVOKE - to cancel previously granted or denied permissions
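A minimal sketch, assuming a hypothetical ReportingUser principal and the Employees table:

    GRANT SELECT ON Employees TO ReportingUser;    -- allow reading
    DENY DELETE ON Employees TO ReportingUser;     -- explicitly disallow deleting
    REVOKE SELECT ON Employees FROM ReportingUser; -- cancel the earlier GRANT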
TCL: Transaction Control statements are used to manage the changes made by DML statements. They allow statements to be grouped together into logical transactions.
1. COMMIT - save work done
2. SAVEPOINT - a point within a particular transaction to which you may roll back without rolling back the entire transaction
3. ROLLBACK - restore the database to its original state since the last COMMIT
4. SET TRANSACTION - change transaction options, like the isolation level and which rollback segment to use

Once we commit, we cannot roll back. Once we roll back, we cannot commit. Commit and Rollback are generally used to commit or revoke transactions involving DML commands.
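A minimal T-SQL sketch (SQL Server expresses savepoints with SAVE TRANSACTION):

    BEGIN TRANSACTION;

    UPDATE Employees SET LastName = 'Smith' WHERE EmployeeID = 1;

    SAVE TRANSACTION AfterUpdate;      -- savepoint

    DELETE FROM Employees WHERE EmployeeID = 2;

    ROLLBACK TRANSACTION AfterUpdate;  -- undo only the DELETE, keep the UPDATE

    COMMIT TRANSACTION;                -- persist the UPDATE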
7. Explain what you have done to performance tune SSIS packages

Data flow transformations in SSIS use memory/buffers in different ways. The way a transformation uses memory can dramatically impact the performance of your package. Transformation buffer usage can be classified into three categories: Non-Blocking (Conditional Split, Audit, Data Conversion, etc.), Partially Blocking (Pivot/Unpivot, Merge, Union All), and Fully Blocking (Aggregate, Sort, Fuzzy Lookup). Generally speaking, if you can avoid fully blocking and partially blocking transformations, your package will simply perform better. Sort is a fully blocking transformation. An easy way around needing the Sort is to sort your source data by using a SQL command in your OLE DB Source instead of just using the drop-down box and choosing Table or View, as shown in the sketch below. A Merge transform requires a Sort, but a Union All does not, so use a Union All when you can.
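For example, a hedged sketch of sorting at the source (table and column names are hypothetical): replace the Table or View selection in the OLE DB Source with a SQL command such as:

    SELECT CustomerID, OrderDate, Amount
    FROM   dbo.Orders
    ORDER BY CustomerID;   -- let the database engine do the sort

Then, in the source's Advanced Editor, set IsSorted = True on the output and SortKeyPosition = 1 on CustomerID so that downstream components such as Merge Join recognize the order.

8. How do you deploy the SSIS package?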
We can deploy an SSIS package using the Deployment Wizard or using Import Packages. Using the Deployment Wizard, we have two options: 1) File System Deployment, 2) SQL Server Deployment.

Deploy the package:
1. While in the package designer, choose Project > [Package Name] Properties. The Configuration Manager dialog will appear.
2. Choose Deployment Utility from the tree. Change the Create Deployment Utility option from False to True.
3. Open the Solution Explorer, right-click on the .dtsx file and choose Properties. Copy the Full Path variable and use it to find the bin\Deployment folder.
4. Locate the [Package Name].SSISDeploymentManifest file.
5. Double-click on the file and follow the steps outlined by the wizard to deploy the package.

Test the deployed package:
1. Open MS SQL Server Management Studio and choose Connect > Integration Services from the UI. Choose the server and connect.
2. The packages will be saved under the Stored Packages > MSDB folder. Right-click on the package to run it.
Problem Description: Perform an incremental load using an SSIS package. There is one source table with ID (may be the primary key), CreatedDate and ModifiedDate along with other columns. The requirement is to load the destination table with new records and update the existing records (if any updated records are available).

Solution: You can use a Lookup Transformation where you compare source and destination data based on some ID/code to get the new and updated records, and then use a Conditional Split to separate the new and updated rows before loading the table. However, I don't recommend this approach, especially when the destination table is very large and the volume of data is very high. You can do it in simple steps:
1. Find the maximum ID and last ModifiedDate from the destination and store them in package variables.
2. Pull the new and updated records from the source and load them into a staging table using the above variables.
3. Insert and update the records using an Execute SQL Task (see the sketch below).
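A hedged sketch of step 3, assuming hypothetical StagingCustomer and DimCustomer tables keyed on CustomerID:

    -- Update existing records that changed
    UPDATE d
    SET    d.CustomerName = s.CustomerName,
           d.ModifiedDate = s.ModifiedDate
    FROM   dbo.DimCustomer AS d
    JOIN   dbo.StagingCustomer AS s ON s.CustomerID = d.CustomerID;

    -- Insert records that are new
    INSERT INTO dbo.DimCustomer (CustomerID, CustomerName, CreatedDate, ModifiedDate)
    SELECT s.CustomerID, s.CustomerName, s.CreatedDate, s.ModifiedDate
    FROM   dbo.StagingCustomer AS s
    WHERE  NOT EXISTS (SELECT 1 FROM dbo.DimCustomer AS d
                       WHERE d.CustomerID = s.CustomerID);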
Ref: http://sql-bi-dev.blogspot.com/2010/11/incremental-load-using-ssis-package.html
http://www.sqlservercentral.com/articles/Integration+Services+(SSIS)/62063/
You could sum or average the sales by salesperson, but if you use that to compare the performance of salespeople, it might give misleading information. If the salesperson who was transferred used to work in a hot market where sales were easy, and now works in a market where sales are infrequent, her totals will look much stronger than those of the other salespeople in her new region, even if they are just as good. Or you could create a second salesperson record and treat the transferred person as a new salesperson, but that creates problems also. Dealing with these issues involves SCD management methodologies referred to as Type 0 through Type 6. Type 6 SCDs are also sometimes called Hybrid SCDs. TYPE 01 SCD EXAMPLE: In this example, Supplier_Code is the natural key and Supplier_Key is a surrogate key. The disadvantage of a Type 1 SCD is that no historical record is kept in the data warehouse; the advantage is that it is very easy to maintain.
Supplier_Key  Supplier_Code  Supplier_Name   Supplier_State
123           ABC            Acme Supply Co  CA

After the change, the existing record is simply overwritten:

Supplier_Key  Supplier_Code  Supplier_Name   Supplier_State
123           ABC            Acme Supply Co  IL
TYPE 02 SCD EXAMPLE: The Type 2 method tracks historical data by creating multiple records for a given natural key in the dimensional tables with separate surrogate keys and/or different version numbers. With Type 2, we have unlimited history preservation as a new record is inserted each time a change is made. Type 2 SCDs are not a good choice if the dimensional model is subject to change.
Supplier_Key  Supplier_Code  Supplier_Name   Supplier_State
123           ABC            Acme Supply Co  CA
124           ABC            Acme Supply Co  IL
Another popular method for tuple versioning is to add effective date columns:

Supplier_Key  Supplier_Code  Supplier_Name   Supplier_State  Start_Date   End_Date
123           ABC            Acme Supply Co  CA              01-Jan-2000  21-Dec-2004
124           ABC            Acme Supply Co  IL              22-Dec-2004  NULL
The null End_Date in row two indicates the current tuple version. In some cases, a standardized surrogate high date (e.g. 9999-12-31) may be used as an end date, so that the field can be included in an index, and so that null-value substitution is not required when querying. TYPE 03 SCD EXAMPLE: The Type 3 method tracks a change by keeping the original and current values in separate columns. The disadvantage of a Type 3 SCD is that it cannot track all historical changes, such as when a supplier moves twice:
Supplier_Key  Supplier_Code  Supplier_Name   Original_Supplier_State  Effective_Date  Current_Supplier_State
123           ABC            Acme Supply Co  CA                       22-Dec-2004     IL
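Returning to the Type 2 example above, a minimal T-SQL sketch of applying such a change, assuming a hypothetical dbo.DimSupplier table with the effective-date columns (Supplier_Key assumed to be an IDENTITY column):

    -- Expire the current version of the supplier
    UPDATE dbo.DimSupplier
    SET    End_Date = '2004-12-21'
    WHERE  Supplier_Code = 'ABC' AND End_Date IS NULL;

    -- Insert the new version with a fresh surrogate key
    INSERT INTO dbo.DimSupplier
           (Supplier_Code, Supplier_Name, Supplier_State, Start_Date, End_Date)
    VALUES ('ABC', 'Acme Supply Co', 'IL', '2004-12-22', NULL);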
Ref: http://www.bimonkey.com/2009/07/the-slowly-changing-dimension-transformation-part-1/
http://www.cozyroc.com/ssis/table-difference http://en.wikipedia.org/wiki/Slowly_changing_dimension
To avoid those situations and to have consistent, relevant and accurate data, data cleansing is required.
Before data cleansing, data quality testing needs to be done. After that, data cleansing is done by parsing, data transformation, duplicate elimination and some statistical analysis. The final output of the data cleansing process is accurate, consistent and relevant data. SQL Server Integration Services (SSIS) provides the facility to implement data cleansing processes. SSIS provides several components that can be used to perform data cleansing operations:
- Lookup
- Fuzzy Lookup
- Fuzzy Grouping
Ref: http://gopika-lasitha.blogspot.com/2010/03/data-cleansing-with-ssis.html
22. What are the types of reports you have worked with?
I have created simple as well as complex reports: table reports and matrix reports, drill-down, drill-through, parameterized and cascading reports. In one of my projects, I was given a task to work with parameterized reports where the user is given the chance of selecting values from a drop-down list. All of these parameters cascade from the parameter before them, more like a hierarchy. For this particular requirement I had to create a stored procedure and use it in the report.
25. Explain the steps you followed to tune reports using SSRS?
One thing I observed is too much filtering and data modification going on in the report itself. This causes reports to slow down. To overcome this, one should do as much of this in the T-SQL query (if you are using one). For example, if you are using filters on your dataset, try to put these filters in the T-SQL query (in the WHERE clause) instead of filtering within a table or a group, etc.
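For example, instead of a dataset filter inside the report, push the condition into the dataset query itself (table and parameter names are hypothetical):

    SELECT OrderID, Region, Amount
    FROM   dbo.Sales
    WHERE  Region = @Region;   -- @Region maps to the report parameter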
Validation commands: Once you've seen performance issues due to fragmentation or index problems, you normally run these commands next, since they will flush out the problems the various database objects (including the database itself) are having. DBCC CHECKDB: this is by far the most widely used command to check the status of your database. This command has two purposes: to check a database, and to correct it. Other examples: DBCC CHECKTABLE.
Maintenance commands: The maintenance commands are the final steps you normally run on a database when you're optimizing the database or fixing a problem. DBCC DBREINDEX: this command rebuilds the indexes on a database. The DBCC INDEXDEFRAG command defragments the index rather than rebuilding it.
Miscellaneous commands: These commands perform such tasks as enabling row-level locking or removing a dynamic-link library (DLL) from memory. The DBCC HELP command simply shows you the syntax of the other commands.
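A few hedged examples of the commands above (database, table, and index names are hypothetical):

    DBCC CHECKDB ('MyDatabase');       -- check the whole database
    DBCC CHECKTABLE ('dbo.Orders');    -- check a single table
    DBCC DBREINDEX ('dbo.Orders');     -- rebuild the table's indexes
    DBCC INDEXDEFRAG ('MyDatabase', 'dbo.Orders', 'IX_Orders_CustomerID');  -- defragment one index
    DBCC HELP ('CHECKDB');             -- show the syntax of another DBCC command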
31. Explain different backup and recovery models

Backup methods (total 5):
1. A full backup makes a complete backup of your database. Typically done every weekend.
2. A differential backup stores all changes that have occurred to the database since the last full backup. Typically done every night.
3. A filegroup backup is useful when your database is so large that a full backup would take too long.
4. A transaction log backup creates a copy of all changes made to the database that are currently stored in the transaction log file. Typically done every 30 minutes. If you use the simple recovery model, you will not have the option of transaction log backups.
5. A tail-log backup may or may not retrieve data. Typically done every 15 minutes.
Recovery models:
1. Full recovery model
2. Bulk-logged recovery model
3. Simple recovery model
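A hedged sketch of the backup types and of setting the recovery model (paths and database name are hypothetical):

    BACKUP DATABASE MyDatabase
        TO DISK = 'D:\Backups\MyDatabase_Full.bak';                    -- full

    BACKUP DATABASE MyDatabase
        TO DISK = 'D:\Backups\MyDatabase_Diff.bak' WITH DIFFERENTIAL;  -- differential

    BACKUP LOG MyDatabase
        TO DISK = 'D:\Backups\MyDatabase_Log.trn';                     -- transaction log

    BACKUP LOG MyDatabase
        TO DISK = 'D:\Backups\MyDatabase_Tail.trn' WITH NO_TRUNCATE;   -- tail-log (damaged database)

    ALTER DATABASE MyDatabase SET RECOVERY FULL;   -- or BULK_LOGGED / SIMPLE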
35. Explain the SQL Client Network Utility

The SQL Client Network Utility lets you change the way ADO connects to SQL Server and MSDE by changing the protocols used.

36. Explain Temporary Tables, Table Variables and CTEs
Temp tables behave just like normal tables, but are created in the TempDB database. They persist until dropped, or until the connection that created them disappears. They are visible in the procedure that created them. Just like normal tables, they can have primary keys, constraints and indexes, and column statistics are kept for the table. They are divided into two kinds: Local Temp Tables (#T) and Global Temp Tables (##T).

Table variables behave very much like other variables in their scoping rules. They are created when they are declared and are dropped when they go out of scope. They cannot be explicitly dropped. Just like temp tables, table variables also reside in TempDB. Table variables can have a primary key, but indexes cannot be created on them, nor are statistics maintained on the columns. This makes table variables less optimal for large numbers of rows, as the optimizer has no way of knowing the number of rows in the table variable.

A common table expression (CTE) can be thought of as a temporary result set that is defined within the execution scope of a single SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW statement. A CTE is similar to a derived table in that it is not stored as an object and lasts only for the duration of the query. Unlike a derived table, a CTE can be self-referencing and can be referenced multiple times in the same query. SQL Server supports two types of CTEs: recursive and nonrecursive. A nonrecursive CTE is one that does not reference itself within the CTE. A recursive CTE is one that references itself within that CTE. (See the sketch below.)
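A minimal sketch showing all three side by side (table and column names are hypothetical):

    -- Local temporary table (persists until dropped or the session ends)
    CREATE TABLE #Orders (OrderID INT PRIMARY KEY, Amount MONEY);
    INSERT INTO #Orders VALUES (1, 150.00), (2, 20.00);

    -- Table variable (scoped to the batch/procedure; cannot be dropped explicitly)
    DECLARE @Orders TABLE (OrderID INT PRIMARY KEY, Amount MONEY);
    INSERT INTO @Orders VALUES (1, 150.00), (2, 20.00);

    -- CTE (exists only for the duration of the statement that follows it)
    WITH BigOrders AS
    (
        SELECT OrderID, Amount FROM #Orders WHERE Amount > 100
    )
    SELECT * FROM BigOrders;

    DROP TABLE #Orders;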
37. Which one is better? Temporary Tables, Table Variable and CTE?
Depends
http://politechnosis.kataire.com/2008/06/ssis-unit-testing.html
2. At the Control Flow level, I will use the OnError event handler and log the error in a custom table.
3. At the Data Flow level, based on the business requirement:
   - If I have to redirect the error rows (to a flat file or a table), I will redirect them.
   - If I have to ignore the failure, I will ignore it.
   - If I have to fail the component, I will fail it.
If the package fails due to network errors, I will look for the reason for the failure in the job history.
When you create a clustered index, the data is physically sorted on that column.
- Data sorting is fast.
- You cannot use the INCLUDE option.
You cannot create index keys on columns configured with large object (LOB) data types, such as image, text, and varchar(max). An index key allows only up to 900 bytes (the sum of the sizes of all key columns). If the sum of the data types is greater than 900 bytes, you have to use the INCLUDE option to build an index on that table.
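A hedged example (table and column names are hypothetical): the key column counts toward the 900-byte limit, while a LOB column such as varchar(max) can only be carried in the INCLUDE list:

    CREATE NONCLUSTERED INDEX IX_Customers_LastName
    ON dbo.Customers (LastName)   -- key column, subject to the 900-byte limit
    INCLUDE (Notes);              -- varchar(max) column, allowed only as an included column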
42. Difference between Stored Procedure and Function

Stored Procedure                                   Function
An SP can perform error handling                   A function cannot perform error handling
An SP cannot be used as a table-valued object      A function can be table-valued
An SP may or may not return a value                A function must return a value
An SP cannot be called from a SELECT statement     A function can be called from a SELECT statement
We can call a function in a stored procedure       We cannot call an SP in a function (only extended SPs can be called)
View
- We can update a view.
- We can write SELECT * FROM a view.
- We can join two views.
Trigger
- You cannot pass parameters to a trigger.
- A trigger is implicitly fired when there is an INSERT/UPDATE/DELETE on a table or view.
- We cannot write a trigger within a stored procedure.
- A trigger is written on an individual table.
- A trigger will not return a value.
Unique Key
- Allows only one NULL.
- A table can have multiple unique keys.
- It ensures that a set of columns is unique.
Delete
- DELETE will not reset any identity columns to the default seed value.
- You can DELETE any row that will not violate a constraint, while leaving the foreign key or any other constraint in place.
- DELETE is a logged operation on a per-row basis.
- DELETE is a slower operation compared to TRUNCATE.
- You can use a WHERE clause.
- Triggers will get fired.
A view can be used to give access to:
- Specific rows of the tables.
- Specific columns of the tables.
- Specific rows and columns of the tables.
- Rows fetched by using joins.
- Statistical summaries of data in given tables.
- Subsets of another view or a subset of views and tables.
A view can consist of:
- A subset of rows or columns of a base table.
- A union of two or more tables.
- A join of two or more tables.
- A statistical summary of base tables.
- A subset of another view, or some combination of views and base tables.
Restrictions on creating views:
- A view can be created only in the current database.
- The name of a view must follow the rules for identifiers and must not be the same as that of the base table.
- A view can be created only if there is SELECT permission on its base table.
- A SELECT INTO statement cannot be used in a view declaration statement.
- A trigger or an index cannot be defined on a view.
- The CREATE VIEW statement cannot be combined with other SQL statements in a single batch.
SQL Server stores information on the view in the following system tables:
- SYSOBJECTS - stores the name of the view.
- SYSCOLUMNS - stores the names of the columns defined in the view.
- SYSDEPENDS - stores information on the view dependencies.
- SYSCOMMENTS - stores the text of the view definition.
There are also certain system stored procedures that help retrieve information on views. The sp_help system stored procedure displays view-related information: it displays the view definition, provided the name of the view is given as its parameter. The guidelines for renaming a view are as follows:
- The view must be in the current database.
- The new name for the view must follow the rules for identifiers.
- A view can be renamed only by its owner, or by the owner of the database.
- A view can be renamed by using the sp_rename system stored procedure.
You cannot modify the following kinds of columns through a view:
- Columns that are based on computed values.
- Columns that are based on built-in functions, such as numeric and string functions.
- Columns that are based on row aggregate functions (SUM, GROUP BY).
Consider a situation in which a table contains a few columns that have been defined as NOT NULL, but the view derived from the table does not contain any of these NOT NULL columns. During an INSERT operation, there may be situations where:
- All the NOT NULL columns in the base table are defined with default values.
- The NOT NULL columns are not defined with default values.
In the first case, the INSERT operation will be successful because the default values are supplied for the NOT NULL columns. In the second case, the INSERT operation will fail because default values are not supplied for the NOT NULL columns.
Difference between a view and a materialized view:
View - stores the SQL statement in the database and lets you use it as a table. Every time you access the view, the SQL statement executes.
Materialized view - stores the result of the SQL statement in table form in the database. The SQL statement executes only once, and after that, every time you run the query, the stored result set is used. Pros include quick query results.
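SQL Server's closest equivalent of a materialized view is an indexed view. A minimal sketch, assuming a hypothetical dbo.Sales table with a non-nullable Amount column:

    -- WITH SCHEMABINDING plus a unique clustered index materializes the result set
    CREATE VIEW dbo.vSalesTotals
    WITH SCHEMABINDING
    AS
    SELECT CustomerID,
           SUM(Amount)  AS TotalAmount,
           COUNT_BIG(*) AS RowCnt   -- required for indexed views with GROUP BY
    FROM   dbo.Sales
    GROUP BY CustomerID;
    GO

    CREATE UNIQUE CLUSTERED INDEX IX_vSalesTotals
    ON dbo.vSalesTotals (CustomerID);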
Cursor attributes:
1. READ ONLY: SQL Server will not lock the table if this attribute is used.
2. FAST_FORWARD: the cursor always goes in sequence while performing a row-by-row operation (row 1, row 2, row 3, ...). The FETCH NEXT option can be used only with this attribute.
3. SCROLL: you can move between rows in any order while performing a row operation (row 1, row 30, row 4, row 77, ...).

Here are some alternatives to using a cursor:
- Use WHILE loops.
- Use the ROW_NUMBER function and loop over every row.
- Use table variables, temp tables, or derived tables.
- Use correlated sub-queries or the CASE statement.
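A hedged sketch of the WHILE-loop alternative, assuming a hypothetical dbo.WorkQueue table:

    DECLARE @ID INT;

    WHILE EXISTS (SELECT 1 FROM dbo.WorkQueue WHERE Processed = 0)
    BEGIN
        -- pick the next unprocessed row
        SELECT TOP (1) @ID = QueueID
        FROM   dbo.WorkQueue
        WHERE  Processed = 0
        ORDER BY QueueID;

        -- ...per-row work goes here...

        UPDATE dbo.WorkQueue SET Processed = 1 WHERE QueueID = @ID;
    END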
57. Explain the steps to migrate SQL Server 2005 to SQL Server 2008

58. What is a checkpoint in SSIS?
SQL Server Integration Services (SSIS) offers the ability to restart failed packages from the point of failure without having to rerun the entire package. When checkpoints are configured, the values of package variables as well as a list of tasks that have completed successfully are written to the checkpoint file as XML. When the package is restarted, this file is used to restore the state of the package to what it was when the package failed.
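Checkpoints are enabled through three package-level properties (the values shown are illustrative):

    CheckpointFileName = C:\SSIS\MyPackage.chk   -- where the restart state is written
    SaveCheckpoints    = True                    -- record progress as tasks complete
    CheckpointUsage    = IfExists                -- restart from the file when one exists

In addition, FailPackageOnFailure must be set to True on each task that should act as a restart point.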
Star schema:
Definition: The star schema is the simplest data warehouse schema. It is called a star schema because the diagram resembles a star, with points radiating from a center: a single fact table (the center of the star) surrounded by multiple dimension tables (the points of the star).
Advantages:
- Simplest DW schema
- Easy to understand
- Easy to navigate between the tables due to the smaller number of joins
- Most suitable for query processing
Snowflake schema:
Definition: A snowflake schema is a data warehouse schema that consists of a single fact table and multiple dimension tables, where the dimension tables are normalized. It is a variant of the star schema in which each dimension can have its own dimensions.
Advantages:
- The normalized tables are easier to maintain
- Saves storage space
Starflake schema - Hybrid structure that contains a mixture of (denormalized) STAR and (normalized) SNOWFLAKE schemas.
9. If appropriate, use as low an isolation level as possible for the user connection running the transaction.
10. Consider using bound connections.
66.

- ERROR_NUMBER() returns the error number.
- ERROR_SEVERITY() returns the severity.
- ERROR_STATE() returns the error state number.
- ERROR_PROCEDURE() returns the name of the stored procedure or trigger where the error occurred.
- ERROR_LINE() returns the line number inside the routine that caused the error.
- ERROR_MESSAGE() returns the complete text of the error message. The text includes the values supplied for any substitutable parameters, such as lengths, object names, times, etc.
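A minimal sketch showing these functions inside a TRY...CATCH block:

    BEGIN TRY
        SELECT 1 / 0;   -- force a divide-by-zero error
    END TRY
    BEGIN CATCH
        SELECT ERROR_NUMBER()    AS ErrorNumber,
               ERROR_SEVERITY()  AS ErrorSeverity,
               ERROR_STATE()     AS ErrorState,
               ERROR_PROCEDURE() AS ErrorProcedure,  -- NULL outside a proc/trigger
               ERROR_LINE()      AS ErrorLine,
               ERROR_MESSAGE()   AS ErrorMessage;
    END CATCH;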
68. What Third Party tools have you used in your previous projects?
- CozyRoc's Table Difference component, to replace the SCD component
- SQL Prompt 5.0 - a Red Gate tool
69. What challenges have you faced in your previous projects? How did you overcome those challenges?
One of our business requirements was that they wanted output in XML format. Since there is no XML destination component in SSIS, I had to write a stored procedure using the FOR XML clause to get the XML output (see the sketch below).
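A hedged sketch of the FOR XML approach (table, attribute, and element names are hypothetical):

    SELECT  c.CustomerID   AS '@Id',   -- attribute
            c.CustomerName AS 'Name'   -- element
    FROM    dbo.Customers AS c
    FOR XML PATH('Customer'), ROOT('Customers');

The SSIS package can then write the stored procedure's output to a flat-file destination.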
70. Explain your previous project and your roles and responsibilities in it
My most recent project was with Security Health Plan, located in Marshfield, WI. It was a migration project from their legacy system to the QNXT application, with SQL Server as the back-end database. My role in this project was mainly as an SSIS developer / interface and extract developer. I designed SSIS packages to export data out of SQL Server using stored procedures. I also validated (using BIDS and the Execute Package Utility -> Validate) and deployed SSIS packages. I prepared the release notes document and also test case scenarios for all the interfaces and extracts. I optimized existing SQL queries and fine-tuned SSIS packages by eliminating fully blocking transformations and replacing them with either partially blocking or non-blocking transformations. I also used cache connection managers while performing lookup operations, which increased processing speed.
In my result set I ended up with three different age groups. The first age group goes from age 5 to age 11, the second age group goes from 11 to 23, and the last age group is 29 to 40. The NTILE function just evenly divides your record set into the number of groups the NTILE function requests. With the NTILE function, each record in a group is given the same ranking.
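A minimal sketch, assuming a hypothetical dbo.People table:

    SELECT Name,
           Age,
           NTILE(3) OVER (ORDER BY Age) AS AgeGroup   -- 3 evenly sized groups
    FROM   dbo.People;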
If we PIVOT a table and then UNPIVOT that table, do we get our original table back? This is a good question. The answer is: yes, you can, but not always. When we pivot a table we use aggregate functions; if data is aggregated by that function, it will not be possible to get the original data back. Ref: http://blog.sqlauthority.com/2008/06/07/sql-server-pivot-and-unpivot-table-examples/
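A hedged sketch, assuming hypothetical sales tables; the SUM in the PIVOT is where the original detail rows can be lost:

    -- Pivot: one output column per year
    SELECT Product, [2007], [2008]
    FROM   (SELECT Product, SaleYear, Amount FROM dbo.Sales) AS src
    PIVOT  (SUM(Amount) FOR SaleYear IN ([2007], [2008])) AS p;

    -- Unpivot: columns back to rows; rows collapsed by SUM cannot be recovered
    SELECT Product, SaleYear, Amount
    FROM   dbo.SalesPivoted
    UNPIVOT (Amount FOR SaleYear IN ([2007], [2008])) AS u;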