
Page 1 \ 8

RCA Name IT Customer Complaints RCA


Report Number 2012.67
Report Date 4/2/2012
RCA Owner Problem Manager

Root Cause Analysis Report


Problem Statement

Focal Point Customer Complaints

When
Start Date 3/5/2012 End Date 3/5/2012
Start Time 8:44am End Time 2:31pm
Unique Timing While the database administrator was on vacation

Where
Location Company website
System Company IT infrastructure

Actual Impact Cost


Revenue Estimated $1,500,000.00
Customer Service Negative impact
Other... Negative publicity
Actual Impact Total: $1,500,000.00

Frequency 2 times overall

Potential Impact
Revenue Potentially much higher than $1,500,000.00
Customer Service Higher negative impact

Report and chart generated by Sologic's Causelink software. www.sologic.com


Report Summaries

Cause and Effect Summary


On March 5, 2012, we received numerous complaints from customers that our website was down
while they were attempting to use it. The website was down from approximately 8:44am to 2:31pm
EST. Customers could not access the website because our web server was returning "500" errors,
which prevent users from accessing the site. The server returned "500" errors because the
application server, which processes requests, was timing out and we had an unusually high volume
of web traffic. The application server was timing out because it was receiving requests and the
associated database was not working. The database was not working because the SQL database
cluster was not processing queries. The SQL cluster could not process new queries because its
transaction log could not grow. The log could not grow because the T:Drive was full and we were
using only one database cluster. Only one database cluster was in use because the other cluster
was being used for UAT testing. The drive was full because its capacity is fixed, the transaction
log required more storage as it grew, and the logs were not truncated; truncation limits how much
storage the log needs. The logs were not truncated because the database administrator (DBA), who
truncates them manually, was on vacation. The backup DBA was not aware that the logs needed
truncating because there was no process in place to inform the backup DBA of critical tasks.
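The causal chain in this summary can be read as a tree that runs from the focal point down to the causes where each path stops. The minimal Python sketch below encodes a simplified version of that tree so the chain can be traced step by step. It is an illustration only, not Causelink's data model. The names `Cause`, `root_causes`, and `path_to` are hypothetical. The node labels are taken from the Evidence section, and causes the summary mentions without a matching label (such as the unusually high web traffic) are left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cause:
    """One node in a cause-and-effect chart: an effect plus the causes behind it."""
    label: str
    causes: list[Cause] = field(default_factory=list)


def root_causes(node: Cause) -> list[str]:
    """Collect labels of nodes with no recorded causes (where each path ends)."""
    if not node.causes:
        return [node.label]
    leaves: list[str] = []
    for cause in node.causes:
        leaves.extend(root_causes(cause))
    return leaves


def path_to(node: Cause, target: str) -> Optional[list[str]]:
    """Return the chain of labels from `node` down to `target`, or None if absent."""
    if node.label == target:
        return [node.label]
    for cause in node.causes:
        sub = path_to(cause, target)
        if sub is not None:
            return [node.label] + sub
    return None


# Simplified chain based on the Cause and Effect Summary (not the full chart).
logs_not_truncated = Cause("Logs were not truncated", [
    Cause("Logs are manually truncated by DBA"),
    Cause("Database Admin (DBA) was on vacation"),
    Cause("Backup DBA not aware logs needed truncating", [
        Cause("No process in place to inform backup DBA"),
    ]),
])
log_cannot_grow = Cause("Transaction log was unable to grow", [
    Cause("T:Drive at zero bytes free", [
        Cause("Storage file size fixed"),
        Cause("Storage required for log to grow"),
        logs_not_truncated,
    ]),
    Cause("Only one database cluster in use", [
        Cause("Other SQL cluster being used for UAT testing"),
    ]),
])
chart = Cause("Customer Complaints", [
    Cause("Customers not able to access our web site", [
        Cause('Web server returned error ("500"-type)', [
            Cause("The application server was timing out", [
                Cause("Requests made of application server"),
                Cause("Database not working", [
                    Cause("SQL server was not processing queries", [log_cannot_grow]),
                ]),
            ]),
        ]),
    ]),
])

if __name__ == "__main__":
    # Print the chain from the focal point to the process gap behind the outage.
    for step in path_to(chart, "No process in place to inform backup DBA") or []:
        print(step)
```

Running the script prints the chain from "Customer Complaints" down to "No process in place to inform backup DBA", which mirrors the order of the summary paragraph.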
Solutions
ID Label Description
1 Solution Implement a process to notify the backup DBA of critical tasks when they take over duties.
Cause No process in place to inform backup DBA
Note Barrier = 3 months until
Assigned Jennifer Elderberry Criteria Fail
Due Status Selected
Term Choose Cost $0.00

2 Solution Create a document outlining DBA duties for use in case of turnover or when a backup DBA is
appointed in an emergency
Cause
Note
Assigned Jennifer Elderberry Criteria Pass
Due Status Selected
Term Choose Cost $0.00

3 Solution Explore automating log truncation


Cause Logs are manually truncated by DBA
Note
Assigned Dave Flynn Criteria Fail
Due Status Selected
Term Choose Cost $0.00

5 Solution Use separate RD SQL clusters for UAT testing


Cause Other SQL cluster being used for UAT testing
Note
Assigned Ted Dezember Criteria Pass
Due Status Identified
Term Choose Cost $0.00

6 Solution Use multiple databases for application servers


Cause Only one database cluster in use
Note
Assigned Jennifer Elderberry Criteria Pass
Due Status Selected
Term Choose Cost $0.00

7 Solution Increase space on T:Drives


Cause T:Drive at zero bytes free
Note
Assigned Ted Dezember Criteria Pass
Due Status Selected
Term Choose Cost $0.00
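Solutions 3 and 7 both address the condition "T:Drive at zero bytes free". As a hedged illustration of how that condition could be caught before the transaction log stops growing, the Python sketch below reads free space on a volume and flags it when the free fraction drops below a threshold. It is not one of the selected solutions. The 15% threshold and the function names `evaluate_free_space` and `check_volume` are assumptions for this example. It relies only on the standard-library `shutil.disk_usage`, and it only reports free space; it does not truncate or shrink any log.

```python
import shutil
from typing import NamedTuple


class DiskCheck(NamedTuple):
    free_bytes: int
    free_fraction: float
    alert: bool


def evaluate_free_space(total: int, free: int, min_free_fraction: float = 0.15) -> DiskCheck:
    """Flag a volume whose free space is below `min_free_fraction` of its size (assumed threshold)."""
    if total <= 0:
        raise ValueError("total must be positive")
    fraction = free / total
    return DiskCheck(free_bytes=free, free_fraction=fraction, alert=fraction < min_free_fraction)


def check_volume(path: str, min_free_fraction: float = 0.15) -> DiskCheck:
    """Read real usage for the volume containing `path` and evaluate it."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return evaluate_free_space(usage.total, usage.free, min_free_fraction)


if __name__ == "__main__":
    # "." is used so the sketch runs anywhere; on the affected server this would
    # presumably be the drive holding the transaction log (hypothetically "T:\\").
    print(check_volume("."))
```

Keeping the threshold logic in `evaluate_free_space`, separate from the disk read, makes the rule easy to test with fixed numbers. Deciding who is notified when `alert` is true is left out of the sketch.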
Team
ID Label Description Label Description
1 First Name Cory Last Name Boisoneau
Phone (1) 425-225-5885 Phone (2)
Role Facilitator Group
Email cory.boisoneau@sologic.com

2 First Name Jennifer Last Name Elderberry


Phone (1) 206-985-4845 Phone (2)
Role Database Admin Group
Email jelder@xyz.com

3 First Name Dave Last Name Flynn


Phone (1) 206-254-8890 Phone (2)
Role Backup DBA Group
Email dflynn@xyz.com

4 First Name Ted Last Name Dezember


Phone (1) 206-795-4353 Phone (2)
Role IT Infrastructure Manager Group
Email tdezem@xyz.com

5 First Name Hannah Last Name Zweikwinden


Phone (1) 206-658-0098 Phone (2)
Role IT Analyst Group
Email hzweik@xyz.com
Evidence
ID Label Description
1 Evidence Log file
Cause(s) Requests made of application server
The application server was timing out
Transaction log was unable to grow
T:Drive at zero bytes free
Site was live
People visiting website
Logs were not truncated
Customers attempted to access site
Database not working

Location
Link
Contributor Jennifer Elderberry
Type Document
Quality

2 Evidence Statement from DBA


Cause(s) Application server processes requests
Only one database cluster in use
Only one application server exists
We only have two SQL clusters
SQL trans. log needs to grow to process queries
Storage required for log to grow
Storage file size fixed
Database Admin (DBA) was on vacation
Logs are manually truncated by DBA
No process in place to inform backup DBA
Transaction log located on T:Drive
Company only has one DBA

Location
Link
Contributor Jennifer Elderberry
Type Direct Statement
Quality
3 Evidence Statement from backup DBA
Cause(s) SQL server was not processing queries
Backup DBA not aware logs needed truncating
Application server relies on working database

Location
Link
Contributor Dave Flynn
Type Direct Statement
Quality

4 Evidence Statement from IT Infrastructure Manager


Cause(s) Other SQL cluster being used for UAT testing

Location
Link
Contributor Ted Dezember
Type Direct Statement
Quality

5 Evidence Statement from IT Analyst


Cause(s) "500" errors prevent access to website
Timeouts result in "500" errors
Functional database relies on working SQL server

Location
Link
Contributor Hannah Zweikwinden
Type Direct Statement
Quality

6 Evidence Client complaint log


Cause(s) Customers not able to access our web site
Web server returned error ("500"-type)
Chose to contact web support with complaint

Location
Link
Contributor Jennifer Elderberry
Type Document
Quality

7 Evidence Customer statement


Cause(s) Customers need/want to access site

Location
Link
Contributor Jennifer Elderberry
Type Direct Statement
Quality

[Cause and effect chart: focal point "Customer Complaints"; legend: Transitory, Non-transitory, Omission - Transitory, Omission - Non-transitory, Focal Point, Solution Implemented, Evidence]
