
HND202 Using NSD: A Practical Guide

Rob Gearhart, Domino Quality & Serviceability Engineer
Elliott Harden, Field Support Engineer

Agenda
What is NSD?
NSD Major Sections
  Call Stacks
  Memcheck
NSD Checklist
Case Studies


What is NSD?
NSD (Notes System Diagnostic) is one of the primary FFDC diagnostic tools for Lotus Domino products:
  Domino/Notes
  QuickPlace
  DomDoc
  Domino Workflow
  Sametime

FFDC = First Failure Data Capture
Used for troubleshooting Notes/Domino and companion products:
  Crashes
  Hangs
  Severe performance problems (not a good tool for mild or moderate performance problems)

NSD applies equally well to the Notes client and the Domino server

What is NSD?
Used on all platforms (except Mac)
  ND6 - iSeries NSD is a different animal (output-wise)
  ND7 - iSeries NSD matches the other platforms more closely
  In 6.5.4, NSD on iSeries includes Memcheck (must be configured via an environment variable)

On UNIX, NSD is a shell script (nsd.sh)
  Memcheck is a separate compiled binary

On W32, NSD is a compiled binary (nsd.exe)
  Memcheck is built into nsd.exe

NSD Update Strategy


NSD is one of the primary focuses of Serviceability and has undergone many continual improvements. IBM has implemented the NSD Update Strategy to periodically compile the newest NSD improvements for the most recent existing versions of Domino (ND6 & ND7). The strategy is supported by:
  Versioning information added to NSD
  A special hotfix installer for NSD (does not conflict with the regular hotfix installer)
  Periodic re-syncs of the NSD source from ND8 back into the MRs for ND6 & ND7

The current NSD build is 2382, equivalent to the ND 7.0.2 version, back-ported to numerous versions of Notes/Domino.

NSD Update Strategy


NSD updates are available to customers through two methods:
  Contact Support or your PSM to get the hotfix installer, OR
  Download from the website

For more information, see:
  Technote # 1233676 NSD Fix List and NSD Update Strategy (fix list here)
  http://www.ibm.com/support/docview.wss?uid=swg21233676
  Technote # 4013182 Updated NSD For Domino Releases (downloads here)
  http://www.ibm.com/support/docview.wss?uid=swg24013182


NSD Major Sections

Process Information (Call Stacks)
  Tells Support the code path involved in the problem

Memcheck (Domino Memory Objects)
  Tells Support the resources (databases, files, views, users) involved in the problem

System Information
  Tells Support the OS configuration (patches, etc.) - NOT IN THIS LAB

Environment Info
  Tells Support the execution environment (Notes INI, etc.) - NOT IN THIS LAB

NSD - Call Stacks

Dump of the thread stacks for all Domino processes,
  including external applications that call into the Notes API

Provides insight into the code path where a crash or hang occurs

NSD - Call Stacks

W32 - for the fatal thread, NSD makes 3 passes:
  1) Dumps the complete call stack (divided into "before" and "after" frames)
  2) Granular breakdown of the stack frames, showing arguments, return address, and basic register information
  3) Function parameters that are pointers are de-referenced

UNIX - provides one pass for the call stack:
  No breakdown of stack frames
  Register information only on a limited set of platforms (AIX, Linux & OS/390)
  On AIX, arguments may show as "???", meaning the code was not compiled with debug levels

W32 Call Stacks

NSD takes three passes of the fatal call stack:
  Pass ONE dumps a stack trace summary, but no frame info
  Pass TWO dumps the contents of the stack frames (along with the ASCII equivalent)
  Pass THREE de-references pointer parameters, meaning we can see the contents of pointer arguments passed to a function

W32 Call Stacks - Pass ONE

Pass ONE - there are two halves of the call stack
  The first half is everything that happened AFTER the fatal (i.e., what the thread did to handle the exception). You will often see JVM_FindSignal near the lower portion; ignore it, it is not significant.

############################################################
### thread 5/21: [ nIMAP:07b4:06cc]
### FP=07e4e208, PC=77f83786, SP=07e4e1e4, stkbase=07d50000, stksize=262144
############################################################
[ 1] 0x77f83786 ntdll.ZwWaitForSingleObject+11 (560,36ee80,0,601a7c06)
[ 2] 0x77e87837 KERNEL32.WaitForSingleObject+15 (7e4e5a0,77e8ae88,7e4ec0c,0)
@[ 3] 0x601a7046 nnotes._OSFaultCleanup@12+342 (0,0,0,7e4ec0c)
@[ 4] 0x601b07b1 nnotes._OSNTUnhandledExceptionFilter@4+145 (7e4ec0c,7e4ec0c,6ef1ab5,7e4ec0c)
[ 5] 0x1000e596 jvm._JVM_FindSignal@4+180 (7e4ec0c,77ea18a5,7e4ec14,0)
[ 6] 0x77ea8e90 KERNEL32.CloseProfileUserMapping+161 (0,0,0,0)

W32 Call Stacks - Pass ONE (cont)

The second half is what you are interested in (the real meat of the crash):
  Look for module names and function names
  Look for the FATAL thread (fatal, panic, halt, access violation)
  Pair this with the console output (e.g., the PANIC message)

W32 - NSD demangles C++ functions in the call stack, meaning it provides the class name and function name (in that order)
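As a rough illustration of "look for module names and function names," the sketch below splits Pass ONE frame lines, in the format of the nIMAP example above, into module, symbol, offset, and arguments. The regular expression is an assumption fitted to that one sample, not a documented NSD format. The slides do not explain the leading "@" on some frames, so the sketch only records it.

```python
import re
from typing import List, NamedTuple

# Fitted to sample frames such as:
#   @[ 3] 0x601a7046 nnotes._OSFaultCleanup@12+342 (0,0,0,7e4ec0c)
# An assumption based on one sample, not a documented NSD format.
FRAME_RE = re.compile(
    r"^(?P<marker>@?)\[\s*(?P<num>\d+)\]\s+"
    r"(?P<pc>0x[0-9a-fA-F]+)\s+"
    r"(?P<module>[^.\s]+)\.(?P<symbol>\S+?)\+(?P<offset>\d+)\s+"
    r"\((?P<args>[^)]*)\)"
)


class Frame(NamedTuple):
    marker: str      # "@" or "" (kept verbatim; meaning not covered here)
    num: int         # frame number as printed by NSD
    pc: str          # address printed after the frame number
    module: str      # e.g. "nnotes", "KERNEL32"
    symbol: str      # function name, e.g. "_OSFaultCleanup@12"
    offset: int      # number printed after "+"
    args: List[str]  # raw argument words, left as hex strings


def parse_frames(stack_text: str) -> List[Frame]:
    """Return one Frame per line that looks like a W32 call stack frame."""
    frames = []
    for line in stack_text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append(Frame(
                marker=m["marker"],
                num=int(m["num"]),
                pc=m["pc"],
                module=m["module"],
                symbol=m["symbol"],
                offset=int(m["offset"]),
                args=m["args"].split(","),
            ))
    return frames


def modules_on_stack(frames: List[Frame]) -> List[str]:
    """Distinct module names in the order they appear (top frame first)."""
    seen: List[str] = []
    for frame in frames:
        if frame.module not in seen:
            seen.append(frame.module)
    return seen


SAMPLE = """\
### thread 5/21: [ nIMAP:07b4:06cc]
[ 1] 0x77f83786 ntdll.ZwWaitForSingleObject+11 (560,36ee80,0,601a7c06)
[ 2] 0x77e87837 KERNEL32.WaitForSingleObject+15 (7e4e5a0,77e8ae88,7e4ec0c,0)
@[ 3] 0x601a7046 nnotes._OSFaultCleanup@12+342 (0,0,0,7e4ec0c)
@[ 4] 0x601b07b1 nnotes._OSNTUnhandledExceptionFilter@4+145 (7e4ec0c,7e4ec0c,6ef1ab5,7e4ec0c)
[ 5] 0x1000e596 jvm._JVM_FindSignal@4+180 (7e4ec0c,77ea18a5,7e4ec14,0)
[ 6] 0x77ea8e90 KERNEL32.CloseProfileUserMapping+161 (0,0,0,0)
"""

if __name__ == "__main__":
    print(modules_on_stack(parse_frames(SAMPLE)))  # ['ntdll', 'KERNEL32', 'nnotes', 'jvm']
```

Header lines starting with "###" simply do not match and are skipped, so the whole thread section can be passed in as-is.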

UNIX Call Stacks

On UNIX, there is only one pass (no dump of stack frame contents)
  The upper portion of the call stack is the part that deals with the fatal condition
  Look at the portion of the stack below the fatal, raise.raise, signal handler, abort, or terminate line
    Which of these appears depends on the platform and the nature of the fatal

On UNIX, C++ function names are mangled (except on zSeries)
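The UNIX rule above ("look below the fatal, raise.raise, signal handler, abort, or terminate line") can be sketched as a simple filter over frame names. The marker substrings and the sample stack are hypothetical: real frame names differ by platform, and a plain substring match can also hit application functions whose names happen to contain these words, so treat the result as a starting point only.

```python
from typing import List, Sequence

# Hypothetical marker substrings taken from the slide's wording; adjust per platform.
DEFAULT_MARKERS = ("fatal", "raise", "signal", "abort", "terminate")


def frames_below_fatal(frames: Sequence[str],
                       markers: Sequence[str] = DEFAULT_MARKERS) -> List[str]:
    """Return the frames below the LAST frame that mentions a fatal-handling marker.

    `frames` is ordered top of stack first, as NSD prints it. If no marker is
    found, the whole stack is returned unchanged.
    """
    last_marker = -1
    for i, frame in enumerate(frames):
        lowered = frame.lower()
        if any(marker in lowered for marker in markers):
            last_marker = i
    return list(frames[last_marker + 1:])


if __name__ == "__main__":
    # Made-up UNIX stack, top frame first; the names are placeholders.
    stack = ["pthread_kill", "raise", "abort", "example_signal_handler",
             "app_func_b", "app_func_a", "main"]
    print(frames_below_fatal(stack))  # ['app_func_b', 'app_func_a', 'main']
```

Using the last marker rather than the first matters because the fatal-handling frames usually appear as a run (raise, abort, handler) at the top of the stack.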


NSD - Memcheck
Analyzes Domino objects (in a nutshell):
  Steps through the shared and private pools allocated by the Domino Memory Manager
  Summarizes memory usage
  Dumps information about open databases, views, documents, and files
Memory usage does NOT include externally allocated memory, such as LotusScript, Java, or third-party code
  You will need OS diagnostics to determine the total memory usage

NSD - Memcheck
Memcheck can be thought of as 3 major sections:
  Shared Memory
  Private Memory
  Resource Usage Summary

Shared Memory Includes

Summary of Shared Pools
  KEYWORD "Shared Memory" - Total Shared Memory Usage should be around 1.1 GB
  KEYWORD "Top 10" - largest block type should be UBM (0x82cd) at 750 MB

OS Package Info
  ND6: KEYWORD "Shared OS Field"
  ND7: KEYWORD "MM/OS Structure Information"
  Indicates the thread ID of the crashing thread and the PANIC message (if any)

NSF Package Info
  KEYWORD "Open Databases" (lists db name, db handle)
  KEYWORD "Open Documents" (lists noteIDs and database handles)

NIF Package Info
  KEYWORD "NIF Collections" (lists open views)
  KEYWORD "NIF Collection Users" (lists [thread] users of those views)

Shared Memory Pool Summary

<@@ ---- Notes Memory Analyzer (memcheck) -> Shared Memory Stats (Time 17:45:55) ---- @@>
TYPE          : Count       SIZE      ALLOC     FREE  FRAG  OVERHEAD  %used  %free
Static-DPOOL  :    35  125829120  116479928  9334512     0     19408    92%     7%
Overall       :    35  125829120  116479928  9334512     0     19408    92%     7%

Note: SIZE shows the overall amount of memory allocated by the Domino Memory Manager; ALLOC shows what is actually in use (sub-allocated). You WANT %used to be high.
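A quick way to sanity-check the %used and %free columns is to recompute them from SIZE, ALLOC, and FREE. In this sample, and in the Process Heap Memory sample later in the deck, the printed figures match rounding down rather than rounding to nearest; that is an observation from these two samples, not documented NSD behavior.

```python
def pct_truncated(part: int, whole: int) -> int:
    """Integer percentage, rounded down (assumed to match how memcheck prints it)."""
    return (part * 100) // whole


# (TYPE, SIZE, ALLOC, FREE) copied from the Shared Memory Stats sample.
SHARED_ROWS = [
    ("Static-DPOOL", 125829120, 116479928, 9334512),
    ("Overall", 125829120, 116479928, 9334512),
]

if __name__ == "__main__":
    for name, size, alloc, free in SHARED_ROWS:
        # Prints the recomputed %used and %free, e.g. "Static-DPOOL 92 7".
        print(name, pct_truncated(alloc, size), pct_truncated(free, size))
```

Rounding to nearest would give 93% used here, which does not match the printed 92%; that is why the sketch rounds down. Because of the truncation and the separate OVERHEAD column, %used and %free need not add up to 100.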

Top 10 Shared
<@@ ------ Notes Memory Analyzer (memcheck)...-> Top 10 Shared Memory Block Usage ... ------ @@>
BY SIZE
Type      TotalSize  Handles  Typename
-----------------------------------------------------------
0x82cd    637673472      162  BLK_UBMBUFFER
0x8252     20971520       20  BLK_NSF_POOL
0x834a     18350080       18  BLK_GB_CACHE
0x82cc     10511340      161  BLK_UBMBCB
0x824b      9810466      160  BLK_OPENED_NOTE
0x8a03      6760070     1604  BLK_NETBUFFER
0x8311      5242880        5  BLK_NIF_POOL
0x890b      4578420       70  BLK_EXECPOOL
0x8a05      2460000        1  BLK_NET_SESSION_TABLE
0x8a01      2289210       35  BLK_NETPOOL
-----------------------------------------------------------
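To read the Top 10 table in more familiar units, its rows can be parsed and converted to megabytes. This sketch assumes the four-column layout shown above (Type, TotalSize, Handles, Typename) and treats 1 MB as 1,048,576 bytes. The per-handle figure is simply TotalSize divided by Handles.

```python
from typing import List, NamedTuple

MB = 1024 * 1024  # assumption: "MB" here means 2**20 bytes


class BlockUsage(NamedTuple):
    type_code: str
    total_size: int
    handles: int
    typename: str


def parse_top10(text: str) -> List[BlockUsage]:
    """Parse 'Type TotalSize Handles Typename' rows; headers and rulers are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[0].startswith("0x") and parts[1].isdigit():
            rows.append(BlockUsage(parts[0], int(parts[1]), int(parts[2]), parts[3]))
    return rows


SAMPLE = """\
Type      TotalSize  Handles  Typename
0x82cd    637673472      162  BLK_UBMBUFFER
0x8252     20971520       20  BLK_NSF_POOL
0x834a     18350080       18  BLK_GB_CACHE
"""

if __name__ == "__main__":
    for row in sorted(parse_top10(SAMPLE), key=lambda r: r.total_size, reverse=True):
        print(f"{row.typename:<16} {row.total_size / MB:8.1f} MB  "
              f"{row.total_size // row.handles} bytes/handle")
```

In this sample, BLK_UBMBUFFER (0x82cd) accounts for roughly 608 MB of shared memory, far ahead of the next block type.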

MM/OS Section
<@@ ------ Notes Memory Analyzer (memcheck) -> MM/OS Structure Information (Time 13:15:45) ------ @@>
Start Time = 12/13/2005 01:15:02 PM
Crash Time = 12/13/2005 01:15:32 PM
Error Message = PANIC: LookupHandle: handle out of range
SharedDPoolSize = 4194304
FaultRecovery = 0x00010013
Cleanup Script Timeout= 300
Crash Limits = 3 crashes in 5 minutes
StaticHang = [ nhttp: 2752: 10]/[ nhttp: 2752: 3500] (0xac0/0xa/0xdac)
ConfigFileSem = ( SEM:#0:0x010d) n=0, wcnt=-1, Users=-1, Owner=[0:0]
FDSem = ( RWSEM:#11:0x410f) rdcnt=-1, refcnt=0 Writer=[0:0], n=11, wcnt=-1
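The MM/OS fields are simple "Key = Value" lines, so it is easy to pull out the PANIC message and the gap between Start Time and Crash Time (30 seconds in this example). The sketch assumes the timestamp format shown in the sample (month/day/year, 12-hour clock with AM/PM) and splits each line at its first "=".

```python
from datetime import datetime
from typing import Dict

TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"  # matches "12/13/2005 01:15:02 PM"


def parse_fields(text: str) -> Dict[str, str]:
    """Split each 'Key = Value' line at the first '='; lines without '=' are ignored."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def seconds_until_crash(fields: Dict[str, str]) -> float:
    """Elapsed seconds between the Start Time and Crash Time fields."""
    start = datetime.strptime(fields["Start Time"], TIME_FORMAT)
    crash = datetime.strptime(fields["Crash Time"], TIME_FORMAT)
    return (crash - start).total_seconds()


SAMPLE = """\
Start Time = 12/13/2005 01:15:02 PM
Crash Time = 12/13/2005 01:15:32 PM
Error Message = PANIC: LookupHandle: handle out of range
Cleanup Script Timeout= 300
Crash Limits = 3 crashes in 5 minutes
"""

if __name__ == "__main__":
    fields = parse_fields(SAMPLE)
    print(fields["Error Message"])      # PANIC: LookupHandle: handle out of range
    print(seconds_until_crash(fields))  # 30.0
```

Splitting only at the first "=" keeps values such as "PANIC: LookupHandle: handle out of range" intact, and it still works for "Cleanup Script Timeout= 300", which has no space before the "=".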

Open Databases
<@@ ------ Notes Memory Analyzer (memcheck) -> Open Databases (Time 11:45:21) ------ @@>

D:\Lotus\DominoR65\Data\events4.nsf
  Version = 43.0
  SizeLimit = 0, WarningThreshold = 0
  ReplicaID = 86256ae0:02697903
  bContQueue = NSFPool [ 0: 48836]
  FDGHandle = 0xf01c0098, RefCnt = 10, Dirty = N
  DB Sem = (FRWSEM:0x0244) state=0, nlrdrs=0 Writer=[]
  SemContQueue = (RWSEM:#0:0x029d) rdcnt=-1, Writer=[] Owner=[]
  By: [ nevent:0e2c: 2] DBH= 3, User=CN=Sithlord/O=SET
  By: [ nevent:0e2c: 2] DBH= 16, User=CN=Sithlord/O=SET
  By: [ nevent:0e2c: 2] DBH= 18, User=CN=Sithlord/O=SET
  By: [ nevent:0e2c: 2] DBH= 20, User=CN=Sithlord/O=SET

Note: edited for clarity; some information is missing.
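When a database shows many "By:" entries, tallying them per thread and user shows who holds the handles. The sketch below is fitted to the "By: [ thread] DBH= n, User=..." lines in this example only; the regular expression is an assumption, not a documented format.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Fitted to lines like: By: [ nevent:0e2c: 2] DBH= 3, User=CN=Sithlord/O=SET
BY_RE = re.compile(
    r"By:\s*\[\s*(?P<thread>[^\]]+?)\s*\]\s*DBH=\s*(?P<dbh>\d+),\s*User=(?P<user>.+?)\s*$"
)


def handles_by_opener(text: str) -> Dict[Tuple[str, str], List[int]]:
    """Map (thread, user) to the database handles listed for that pair."""
    result: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for line in text.splitlines():
        m = BY_RE.search(line)
        if m:
            result[(m["thread"], m["user"])].append(int(m["dbh"]))
    return dict(result)


SAMPLE = r"""D:\Lotus\DominoR65\Data\events4.nsf
By: [ nevent:0e2c: 2] DBH= 3, User=CN=Sithlord/O=SET
By: [ nevent:0e2c: 2] DBH= 16, User=CN=Sithlord/O=SET
By: [ nevent:0e2c: 2] DBH= 18, User=CN=Sithlord/O=SET
By: [ nevent:0e2c: 2] DBH= 20, User=CN=Sithlord/O=SET
"""

if __name__ == "__main__":
    for (thread, user), handles in handles_by_opener(SAMPLE).items():
        print(f"{thread} {user}: {len(handles)} handles {handles}")
    # nevent:0e2c: 2 CN=Sithlord/O=SET: 4 handles [3, 16, 18, 20]
```

Here a single nevent thread holds four handles to events4.nsf as the same user. Whether that is significant depends on the problem being investigated.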

Open Documents
<@@ ------ Notes Memory Analyzer (memcheck) -> Open Documents (BLK_OPENED_NOTE): total=352 ...----- @@>
DBH  NOTEID  HANDLE  CLASS   FLAGS   IsProf  #Pools  #Items  Size  Database
531  7330    0x24ff  0x0001  0x0200  Yes     1       4       2984  d:\notedata\drmail\jsmith.nsf
  Open By: CN=John Smith/O=ACME/C=US
  Flags2 = 0x0404
  Flags3 = 0x0000
  OrigHDB = 531
  First Item = [ 9471: 836]
  Last Item = [ 9471: 1228]
  Non-pool size : 0
  Member Pool handle=0x24ff, size=2984

Note Classes
Note Class Value   Note Type
0x0001             Data Note - document
0x0004             Form Note
0x0008             View Note
0x0040             ACL Note
0x0200             Agent Note
0x0800             Replication Formula Note
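The CLASS column in Open Documents can be decoded with the table above. The lookup below covers only the six values listed on this slide; any other value is reported as unknown rather than guessed.

```python
NOTE_CLASSES = {
    0x0001: "Data Note - document",
    0x0004: "Form Note",
    0x0008: "View Note",
    0x0040: "ACL Note",
    0x0200: "Agent Note",
    0x0800: "Replication Formula Note",
}


def describe_note_class(value: int) -> str:
    """Return the note type for a CLASS value from the table, or 'unknown (0x....)'."""
    return NOTE_CLASSES.get(value, f"unknown (0x{value:04x})")


if __name__ == "__main__":
    # The CLASS column of the Open Documents example (NOTEID 7330) is 0x0001.
    print(describe_note_class(int("0x0001", 16)))  # Data Note - document
```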

Open Views - NIF Collections

<@@ ------ Notes Memory Analyzer (memcheck) -> NIF Collections (Time 12:48:35) ------ @@>
CollectionVB ViewNoteID UNID     OBJID  RefCnt Flags  Options  Corrupt Deleted Temp NS  Entries ViewTitle
------------ ---------- -------- ------ ------ ------ -------- ------- ------- ---- --- ------- -----------
[ 0020e005]  1518       1356a8   358710 1      0x0000 00000008 NO      NO      NO   NO  0       MyNotices
  CIDB = [ 0253cc05]
  CollSem (FRWSEM:0x030b) state=0, waiters=0, refcnt=0, nlrdrs=0 Writer=[ : 0000]
  NumCollations = 2
  bCollationBlocks = [ 001e72e5]
  bCollation[0] = [ 00117005]
  bCollation[1] = [ 001a2205]
  CollIndex = [ 00012a09]
  Collation 0: BufferSize 26, Items 1, Flags 0
    0: Ascending, by KEY, "StartDateTime", summary# 2
  CollIndex = [ 00012c09]
  Collation 1: BufferSize 26, Items 1, Flags 0
    0: Descending, by KEY, "StartDateTime", summary# 2
  ResponseIndex [ 0010e4b6]
  NoteIDIndex [ 0010e385]
  UNIDIndex [ 0010e5e7]

Open Views - NIF Collection Users

<@@ ------ Notes Memory Analyzer (memcheck) -> NIF Collection Users (hash) (Time 12:48:33) ------ @@>
CollUserVB   ... CollectionVB Remote OFlags ViewNoteID Data HDB/Full View HDB/Full ... Open By
------------ ... ------------ ------ ------ ---------- ------------- ------------- ... -------------
[ 00239805]  ... [ 0023d005]  NO     0x0082 786        1219/1874     1219/1874     ... [ nserver:09d8:04ca]
  CurrentCollation = 0
[ 0013a805]  ... [ 00136005]  NO     0x00c2 11122      886/785       886/785       ... [ nserver:09d8:0266]
  CurrentCollation = 0
[ 0028d805]  ... [ 0020e005]  NO     0x00c2 1518       551/1432      551/1432      ... [ nserver:09d8:03b0]
  CurrentCollation = 0
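NIF Collections and NIF Collection Users share the CollectionVB column, so joining them shows which thread has which view open. In this example, CollectionVB [ 0020e005] (view note 1518, "MyNotices") is used by thread [ nserver:09d8:03b0]. The row regular expression is an assumption fitted to the sample rows above; the other two users' collections are not in the trimmed NIF Collections sample, so they come back as unmatched.

```python
import re
from typing import Dict, List, Tuple

# CollectionVB, Remote, OFlags, ViewNoteID ... Open By, as in the sample rows.
USER_ROW_RE = re.compile(
    r"\[\s*(?P<vb>[0-9a-fA-F]+)\]\s+(?:YES|NO)\s+0x[0-9a-fA-F]+\s+(?P<note_id>\d+)"
    r".*\[\s*(?P<thread>[^\]]+?)\s*\]\s*$"
)


def views_by_thread(collections: Dict[str, str],
                    users_text: str) -> List[Tuple[str, str, str]]:
    """Return (thread, ViewNoteID, view title or '<not listed>') for each user row."""
    joined = []
    for line in users_text.splitlines():
        m = USER_ROW_RE.search(line)
        if m:
            title = collections.get(m["vb"].lower(), "<not listed>")
            joined.append((m["thread"], m["note_id"], title))
    return joined


# CollectionVB -> ViewTitle, taken from the NIF Collections sample.
COLLECTIONS = {"0020e005": "MyNotices"}

USERS = """\
[ 00239805] ... [ 0023d005] NO 0x0082 786 1219/1874 1219/1874 ... [ nserver:09d8:04ca]
[ 0013a805] ... [ 00136005] NO 0x00c2 11122 886/785 886/785 ... [ nserver:09d8:0266]
[ 0028d805] ... [ 0020e005] NO 0x00c2 1518 551/1432 551/1432 ... [ nserver:09d8:03b0]
"""

if __name__ == "__main__":
    for thread, note_id, title in views_by_thread(COLLECTIONS, USERS):
        print(thread, note_id, title)
```

The regex skips the leading CollUserVB because it is followed by "..." rather than a Remote flag, so the first bracketed ID it captures is the CollectionVB.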

Private Memory Includes

Info for each process
  KEYWORD "Attach to process [procname:PID]" - marks the beginning of the info for each process

TLS Mapping
  KEYWORD "TLS Mapping" - shows the map of physical threads to virtual threads (mainly for Support's use)

Open Documents
  KEYWORD "Open Documents" - lists documents opened in private memory (if any)

Private Pools Allocated through the Domino Memory Manager
  KEYWORD "Process Heap Memory" - total size across all private pools (should be below 100 MB, with a few exceptions such as the server and HTTP processes)
  KEYWORD "Top 10" - shows the block types using the most memory
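The 100 MB guideline for Process Heap Memory can be turned into a quick filter over per-process totals. The process sizes below are made-up numbers, and exempting nserver and nhttp is an assumption standing in for the slide's "a few exceptions like server & http"; those processes still deserve a look if their heaps are unusually large.

```python
from typing import Dict, Iterable, List

HEAP_LIMIT = 100 * 1024 * 1024            # "should be below 100 MB"
EXEMPT = frozenset({"nserver", "nhttp"})  # assumption: the server and HTTP processes


def heaps_over_limit(heap_by_process: Dict[str, int],
                     limit: int = HEAP_LIMIT,
                     exempt: Iterable[str] = EXEMPT) -> List[str]:
    """Names of non-exempt processes whose Process Heap Memory exceeds `limit` bytes."""
    exempt_lower = {name.lower() for name in exempt}
    return sorted(name for name, size in heap_by_process.items()
                  if size > limit and name.lower() not in exempt_lower)


if __name__ == "__main__":
    # Hypothetical totals in bytes, as if read from each process's Process Heap Memory section.
    sample = {"nSERVER": 300 * 1024 * 1024, "nhttp": 180 * 1024 * 1024,
              "namgr": 150 * 1024 * 1024, "nupdate": 20 * 1024 * 1024}
    print(heaps_over_limit(sample))  # ['namgr']
```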

TLS Mapping
------ TLS Mapping ------
NativeTID            VirtualTID           PrimalTID
[ nSERVER:0514:0510] [ nSERVER:0514:0002] [ nSERVER:0514:0002]
[ nSERVER:0514:0504] [ nSERVER:0514:0004] [ nSERVER:0514:0004]
[ nSERVER:0514:05d4] [ nSERVER:0514:0005] [ nSERVER:0514:0005]
[ nSERVER:0514:0600] [ nSERVER:0514:0006] [ nSERVER:0514:0006]
[ nSERVER:0514:0604] [ nSERVER:0514:0007] [ nSERVER:0514:0007]
[ nSERVER:0514:0608] [ nSERVER:0514:0008] [ nSERVER:0514:0008]

Memcheck prints the virtual thread ID in most places, but the call stacks identify threads by their physical thread ID, so we need to map one to the other. The TLS Mapping section does this quite nicely!
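Because Memcheck reports virtual thread IDs and the call stacks report physical (native) ones, a small lookup built from the TLS Mapping rows makes the translation mechanical. The sketch assumes each mapping row holds exactly three bracketed IDs in the order NativeTID, VirtualTID, PrimalTID.

```python
import re
from typing import Dict, Tuple

# Each bracketed thread ID, e.g. "[ nSERVER:0514:05d4]" -> "nSERVER:0514:05d4".
TID_RE = re.compile(r"\[\s*([^\]]+?)\s*\]")


def parse_tls_mapping(text: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Build (virtual -> native, native -> virtual) lookups from TLS Mapping rows.

    Assumes each row has exactly three bracketed IDs in the order
    NativeTID, VirtualTID, PrimalTID. If an ID appears in more than one
    row, the last row wins.
    """
    virtual_to_native: Dict[str, str] = {}
    native_to_virtual: Dict[str, str] = {}
    for line in text.splitlines():
        ids = TID_RE.findall(line)
        if len(ids) == 3:
            native, virtual, _primal = ids
            virtual_to_native[virtual] = native
            native_to_virtual[native] = virtual
    return virtual_to_native, native_to_virtual


SAMPLE = """\
NativeTID            VirtualTID           PrimalTID
[ nSERVER:0514:0510] [ nSERVER:0514:0002] [ nSERVER:0514:0002]
[ nSERVER:0514:0504] [ nSERVER:0514:0004] [ nSERVER:0514:0004]
[ nSERVER:0514:05d4] [ nSERVER:0514:0005] [ nSERVER:0514:0005]
"""

if __name__ == "__main__":
    v2n, n2v = parse_tls_mapping(SAMPLE)
    print(v2n["nSERVER:0514:0005"])  # nSERVER:0514:05d4
    print(n2v["nSERVER:0514:0510"])  # nSERVER:0514:0002
```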

Open Documents (Private)

The example on this slide is identical to the shared Open Documents example shown earlier: the same BLK_OPENED_NOTE header and columns (DBH, NOTEID, HANDLE, CLASS, FLAGS, IsProf, #Pools, #Items, Size, Database) and the same document (NOTEID 7330 in d:\notedata\drmail\jsmith.nsf, opened by CN=John Smith/O=ACME/C=US).

Top 10 Process Memory

<@@ ----- Notes Memory Analyzer (memcheck)...-> Top 10 [ nSERVER: 09d8] Memory Block Usage... ------ @@>
BY SIZE
Type      TotalSize  Handles  Typename
-----------------------------------------------------------
0x4129     20447232       39  BLK_LOCAL
0x0a04     10595772      162  BLK_NET
0x028b      3327954       53  BLK_FOLDERREPLOPS
0x0910      1999180     1126  BLK_SRV_NAMES_LIST
0x093c      1219526      242  BLK_SRV_HASH_TBL
0x024b      1131334       19  BLK_OPENED_NOTE
0x0221       930818       96  BLK_NEW_NOTE
0x0130       562418     1545  BLK_TLA
0x0149       548834      101  BLK_PHTCHUNK
0x030a       319190        1  BLK_LOOKUP_THREAD
-----------------------------------------------------------

Process Heap Memory

<@@ ------ Notes Memory Analyzer (memcheck) -> Process Heap Memory Stats (Time 17:46:00) ------ @@>
TYPE          : Count     SIZE    ALLOC     FREE  FRAG  OVERHEAD  %used  %free
Static-DPOOL  :    12  6291456  3795788  2489080     0      9486    60%    39%
VPOOL         :     2   130808     8994   117628     0      4210     6%    89%
POOL          :     3    86348    58790    24468     0      3114    68%    28%
Overall       :    12  6291456  3653692  2631176     0     16810    58%    41%

Resource Usage Summary

Provides great value (one-stop shopping)
  KEYWORD "Resource Usage" - easy-to-read summary of open resources, listed by process and thread (physical/virtual)

Lists resources in use by each thread:
  Open databases (name and handle)
  Open views (name and handle)
  Open documents (noteID)
  Open files (OS file descriptor)

Search on the physical thread ID in question
  KEYWORD "VThread [ ] Mapped To: PThread [ ]"

Resources Per Thread

** VThread [ ndiiop:0904: 14] Mapped To: PThread [ ndiiop:0904: 1508] using: Primal Thread [ ndiiop:0904: 7]
  SOBJ: addr=0x4cd7261c, h=0xf010404d t=f982 (PKG_NSF9+386)
  SOBJ: addr=0x51cd039c, h=0xf0104043 t=c275 (BLK_NSFT)
  SOBJ: addr=0x4d4aba64, h=0xf010404c t=c130 (BLK_TLA)
  Database: e:\notes\data\a_dir\archive.nsf
    DBH: 3401, By: CN=John Smith/OU=New York/O=ACME
    DBH: 3664, By: CN=John Smith/OU=New York/O=ACME
    view: hCol=3666, cg=N, noteID=798, (archiveLookup)|archiveLookup
      DBH: 3665, By: CN=John Smith/OU=New York/O=ACME
  file: fd: 2388, e:\notes\data\a_dir\software_functions.nsf
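Finding the Resource Usage entry for the physical thread from the call stack amounts to locating the matching "Mapped To: PThread" header and collecting the lines under it. This sketch assumes each entry starts with "** VThread" as in the example above, and it compares thread IDs with all whitespace removed because NSD pads the numbers with spaces. If several virtual threads are mapped to the same physical thread, the lines from all of their entries are returned.

```python
import re
from typing import List

HEADER_RE = re.compile(r"^\*\*\s*VThread\s*\[[^\]]*\].*?PThread\s*\[(?P<pthread>[^\]]*)\]")


def _normalize(thread_id: str) -> str:
    """Drop all whitespace so 'ndiiop:0904: 1508' compares equal to 'ndiiop:0904:1508'."""
    return re.sub(r"\s+", "", thread_id)


def resources_for_pthread(text: str, pthread: str) -> List[str]:
    """Lines listed under every VThread entry mapped to physical thread `pthread`."""
    wanted = _normalize(pthread)
    collecting = False
    found: List[str] = []
    for line in text.splitlines():
        stripped = line.strip()
        header = HEADER_RE.match(stripped)
        if header:
            # A new entry starts; collect only if it maps to the wanted thread.
            collecting = _normalize(header["pthread"]) == wanted
            continue
        if collecting and stripped:
            found.append(stripped)
    return found


SAMPLE = r"""** VThread [ ndiiop:0904: 14] Mapped To: PThread [ ndiiop:0904: 1508] using: Primal Thread [ ndiiop:0904: 7]
  SOBJ: addr=0x4d4aba64, h=0xf010404c t=c130 (BLK_TLA)
  Database: e:\notes\data\a_dir\archive.nsf
    DBH: 3401, By: CN=John Smith/OU=New York/O=ACME
  file: fd: 2388, e:\notes\data\a_dir\software_functions.nsf
"""

if __name__ == "__main__":
    for entry in resources_for_pthread(SAMPLE, "ndiiop:0904:1508"):
        print(entry)
```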

Resource Usage Summary

Allows Support to quickly isolate potential patterns involving databases and/or documents
CAVEATS
  The problem may not be directly attributable to a specific database/view/document
  Resource Usage is not guaranteed to be a silver bullet
  Just because a crash occurs on a database or document does NOT mean it is the database's or document's fault (don't assume database corruption)
  Never look at an NSD in a vacuum: know the nature of the problem first, then use NSD to fill in the gaps
  Use insight from the call stacks and other key factors to decide whether a real pattern exists


NSD Checklist - Call Stacks

Find the call stack for the crashing process/thread
  KEYWORDs: FATAL, CHILD_DIED, HALT, PANIC
  What is the physical thread ID? (you will need this later)
  What was the crash point? (you will need Support's assistance)
  What modules are on the stack? (NNOTES, NLSXBE, etc.)
  Is third-party code involved? (LSX, DSAPI, RDBMS kernel, etc.)
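The first checklist step, finding the crashing thread, usually starts with a text search for the keywords above. The sketch below reports each line that contains FATAL, CHILD_DIED, HALT, or PANIC as a whole word. It is case-sensitive on the assumption that NSD prints these keywords in upper case, and the file names in the usage comment are hypothetical.

```python
import re
from typing import List, Sequence, Tuple

CHECKLIST_KEYWORDS = ("FATAL", "CHILD_DIED", "HALT", "PANIC")


def find_keywords(text: str,
                  keywords: Sequence[str] = CHECKLIST_KEYWORDS) -> List[Tuple[int, str, str]]:
    """Return (line number, keyword, line) for each line containing a keyword as a whole word."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, keywords)) + r")\b")
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = pattern.search(line)
        if match:
            hits.append((lineno, match.group(1), line.strip()))
    return hits


if __name__ == "__main__":
    import sys
    # Usage: python find_fatal.py nsd_output.log   (hypothetical file names)
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
            for lineno, keyword, line in find_keywords(handle.read()):
                print(f"{lineno}: [{keyword}] {line}")
```

The whole-word match keeps words such as "HALTING" from producing false hits. From the hits, the thread header above the FATAL frame gives you the physical thread ID needed for the Memcheck steps.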

Symptoms
  What is the flow of events?
  When did the crashes start? What changed since they began?
  How does the crash manifest itself (access violation, PANIC, etc.)?
  What do users experience?
  How many servers/users are affected?
  What do OS diagnostics show (CPU, disk, memory, etc.)?
  You know the drill

NSD Checklist - Memcheck

Shared Memory
  Total Shared Memory Usage
  Top 10 Shared Block Usage
  Open Databases
  Open Documents

Private (look at the appropriate process)
  Process Heap Memory
  Top 10 Process Block Usage
  TLS Mapping (if needed)

Resource Usage (find the appropriate physical thread)
  Databases/Views/Documents


Agenda
Scenario 1: Agent Manager Crash (find the agent note)
Scenario 2: Domino Server Crash (find out why)
Scenario 3: HTTP Crash (examine the call stacks)
Scenario 4: HTTP Performance Problem (look at SEMDEBUG)
Scenario 5 (BONUS): Domino Server Crash (find out why)

Discussion/Questions
rob_gearhart@us.ibm.com
elliott_harden@us.ibm.com

NSD Knowledge Collection
  Technote # 7007508 Knowledge Collection: NSD for ND 6 & 7
  http://www.ibm.com/support/docview.wss?uid=swg27007508

