DCD (dead connection detection), or Terminated Connection Detection as it is called in 12c

------------------------------------------------------------------------------------------------------------

Update 2010-12:

* ASM complication, or rather, lack thereof. Tests in 10gR2 and 11gR2 show that you should always set DCD (and even the SQL*Net trace parameters for server side tracing) in sqlnet.ora under the DB ORACLE_HOME/network/admin, regardless of whether you have ASM, RAC, 11g or 10g. (Ref: Doc 1136945.1) 11gR2 RAC has SCAN listeners. You don't need to restart or reload them; restarting (or reloading) the regular listener is enough.

------------------------------------------------------------------------------------------------------------

* How to check if DCD is set up right

On UNIX, the easiest way to check for DCD is to trace the shadow process for your SQL*Net connection. For example, on a Solaris Oracle server where a 1-minute sqlnet.expire_time is set:

  $ truss -tsetitimer -vsetitimer -p 9266
      Received signal #14, SIGALRM, in read() [caught]
  setitimer(ITIMER_REAL, 0x08044E30, 0x00000000)          = 0
          value:  interval:   0.000000 sec
                  value:     60.000000 sec
  setitimer(ITIMER_REAL, 0x08044F00, 0x00000000)          = 0
          value:  interval:   0.000000 sec
                  value:     60.000000 sec
  setitimer(ITIMER_REAL, 0x08045054, 0x00000000)          = 0
          value:  interval:   0.000000 sec
                  value:     60.000000 sec

After waiting at most the DCD expire time, you should see output like the above, where the timer value matches the expire time. If not, DCD is not enabled, at least not for your connection.
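If you check many connections, comparing the timer values by eye gets tedious. Below is a minimal Python sketch of one way to automate the comparison on captured text; the function names are hypothetical and the script does not attach to any process. It pulls every it_value out of truss -v output (the "value: NN.NNNNNN sec" field) or strace output (the it_value={60, 0} form; the newer {tv_sec=60, tv_usec=0} form is an assumption about your strace version), and reports whether any timer equals sqlnet.expire_time converted to seconds.

```python
import re

# truss -v prints the it_value field as "value:     60.000000 sec".
# The it_interval field is printed as "value:  interval: ...", so "value:"
# is followed by "interval:" rather than a number and this pattern skips it.
TRUSS_VALUE = re.compile(r"value:\s+(\d+(?:\.\d+)?) sec")

# strace prints "it_value={60, 0}"; newer versions (assumption) print
# "it_value={tv_sec=60, tv_usec=0}". Groups: seconds, microseconds.
STRACE_VALUE = re.compile(r"it_value=\{(?:tv_sec=)?(\d+), (?:tv_usec=)?(\d+)\}")


def timer_values(trace_text: str) -> list[float]:
    """Return every setitimer it_value, in seconds, found in the text."""
    values = [float(m.group(1)) for m in TRUSS_VALUE.finditer(trace_text)]
    values += [
        int(m.group(1)) + int(m.group(2)) / 1_000_000
        for m in STRACE_VALUE.finditer(trace_text)
    ]
    return values


def dcd_timer_seen(trace_text: str, expire_time_minutes: int) -> bool:
    """True if any timer was armed to sqlnet.expire_time (minutes -> seconds)."""
    expected = expire_time_minutes * 60
    return any(abs(v - expected) < 0.01 for v in timer_values(trace_text))


if __name__ == "__main__":
    truss_sample = """\
setitimer(ITIMER_REAL, 0x08044E30, 0x00000000)          = 0
        value:  interval:   0.000000 sec
                value:     60.000000 sec
"""
    strace_sample = (
        "setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={60, 0}}, NULL) = 0"
    )
    print(dcd_timer_seen(truss_sample, 1))   # True
    print(dcd_timer_seen(strace_sample, 1))  # True
    print(dcd_timer_seen(truss_sample, 2))   # False
```

This simple equality check is meant for a dedicated server shadow process. A shared server dispatcher re-arms its timer with fractions of the interval (for example 42.38 and 17.61 seconds), so an exact match on the expire time is not expected there.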
On Linux the output is similar, and the 60-second timer is visible even without the -v option:

  $ strace -e trace=setitimer -p 13193
  Process 13193 attached - interrupt to quit
  --- SIGALRM (Alarm clock) @ 0 (0) ---
  setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={60, 0}}, NULL) = 0
  setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={60, 0}}, NULL) = 0
  setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={60, 0}}, NULL) = 0

The documented way to check if DCD is set up right involves SQL*Net tracing. Suppose the server side sqlnet.ora has

  sqlnet.expire_time = 1
  trace_level_server=16
  trace_file_server=svr
  trace_directory_server=/some/path/on/server
  trace_unique_server=true     # Add '_pid' to trace filename
  trace_timestamp_server=ON    # Only in Oracle8i onwards

and you run `lsnrctl reload' (stop and restart also works). When the client connects to the server, the server side trace file will have:

  [04-AUG-2007 22:44:03:441] niotns: Enabling CTO, value=60000 (milliseconds)
  [04-AUG-2007 22:44:03:441] niotns: Enabling dead connection detection (1 min)

Note the CTO (probably "connection timeout") value of 60000 milliseconds, i.e. 60 seconds, matching our expire_time setting.

Note:395505.1 says that to check DCD, you should enable client tracing, stay idle for more than twice the DCD timeout, and then run any query on the client side. My client side trace has:

  [04-AUG-2007 23:07:00:942] nsprecv: tlen=20, plen=10, type=6
  [04-AUG-2007 23:07:00:942] nsprecv: 10 bytes to leftover
  [04-AUG-2007 23:07:00:942] nsprecv: packet dump
  [04-AUG-2007 23:07:00:942] nsprecv: 00 0A 00 00 06 00 00 00  |........|
  [04-AUG-2007 23:07:00:942] nsprecv: 00 00                    |..      |
  [04-AUG-2007 23:07:00:942] nsprecv: normal exit
  [04-AUG-2007 23:07:00:942] nsrdr: got NSPTDA packet
  [04-AUG-2007 23:07:00:942] nsrdr: NSPTDA flags: 0x0
  [04-AUG-2007 23:07:00:942] nsrdr: normal exit
  [04-AUG-2007 23:07:00:942] nsdo: got "null" packet

That 'got "null" packet' is the tell-tale sign that DCD is working. (Note: the client side trace has the line "niotns: Not trying to enable dead connection detection." Ignore that. It probably means the client machine is not a DB server with DCD enabled.)

Note that enabling or disabling DCD will not affect existing connections. For instance, if you remove "sqlnet.expire_time" from sqlnet.ora on the server and reload (or restart) the listener, existing connections are still watched by DCD. Ref: Note:438923.1 "How To Track Dead Connection Detection(DCD) Mechanism Without Enabling Any Client/Server Network Tracing".

------------------------------------------------------------------------------------------------------------

* Shared server complication (tested with a 9.2.0.1 client connecting to a 10.2.0.2 server)

Note:191209.1 says on VMS, "For a shared server (MTS) connection, a dispatcher will need one timer for each client process connection." In fact, this is also true for a dispatcher on Solaris (and possibly on any UNIX or Linux). In the following, 9355 is our only dispatcher, D000 (all others were shut down with alter system, or prevented from starting by the dispatchers parameter).
When the first client comes in, we see

  $ truss -tsetitimer -vsetitimer -p 9355
  setitimer(ITIMER_REAL, 0x08044A60, 0x00000000)          = 0
          value:  interval:   0.000000 sec
                  value:     60.000000 sec
  setitimer(ITIMER_REAL, 0x08046730, 0x00000000)          = 0
          value:  interval:   0.000000 sec
                  value:      0.000000 sec
  setitimer(ITIMER_REAL, 0x08045370, 0x00000000)          = 0
          value:  interval:   0.000000 sec
                  value:     60.000000 sec

About 18 seconds later, a second client connects to the same dispatcher (you can verify this with the query below):

  select spid, program from v$process where addr in (select dispatcher from v$circuit);

and we see

      Received signal #14, SIGALRM, in pollsys() [caught]
  setitimer(ITIMER_REAL, 0x08045EE0, 0x00000000)          = 0
          value:  interval:   0.000000 sec
                  value:     42.380000 sec
  setitimer(ITIMER_REAL, 0x08046034, 0x00000000)          = 0
          value:  interval:   0.000000 sec
                  value:     42.380000 sec

From that point on, we see

      Received signal #14, SIGALRM, in pollsys() [caught]
  setitimer(ITIMER_REAL, 0x08045EE0, 0x00000000)          = 0
          value:  interval:   0.000000 sec
                  value:     17.610000 sec
  setitimer(ITIMER_REAL, 0x08046034, 0x00000000)          = 0
          value:  interval:   0.000000 sec
                  value:     17.610000 sec
      Received signal #14, SIGALRM, in pollsys() [caught]
  setitimer(ITIMER_REAL, 0x08045EE0, 0x00000000)          = 0
          value:  interval:   0.000000 sec
                  value:     42.380000 sec
  setitimer(ITIMER_REAL, 0x08046034, 0x00000000)          = 0
          value:  interval:   0.000000 sec
                  value:     42.380000 sec
      Received signal #14, SIGALRM, in pollsys() [caught]
  setitimer(ITIMER_REAL, 0x08045EE0, 0x00000000)          = 0
          value:  interval:   0.000000 sec
                  value:     17.610000 sec
  setitimer(ITIMER_REAL, 0x08046034, 0x00000000)          = 0
          value:  interval:   0.000000 sec
                  value:     17.610000 sec
      Received signal #14, SIGALRM, in pollsys() [caught]
  setitimer(ITIMER_REAL, 0x08045EE0, 0x00000000)          = 0
  ......

This indicates that the dispatcher process effectively keeps two alarms, one for each client, on its single ITIMER_REAL timer. Initially, the 60-second alarm reminds it to do a DCD check in 1 minute (our expire_time=1).
But since a second client comes in at about 17.6 seconds, the dispatcher re-arms its alarm for 42.4 seconds (60 - 17.6) to keep the original schedule for the first client. Of course it does not forget the new client: when the original 1-minute alarm goes off, it sets a 17.6-second alarm for the second client. When this 17.6-second alarm expires, it immediately schedules the first client's next DCD check in 42.4 seconds, and so on.

Another shared server complication is Bug 4018031 "CLIENT SESSION WHICH CONNECT THROUGH MTS SERVER REMAIN EVEN THOUGH CLIENT DEATH". The workaround is to set disable_oob=on in sqlnet.ora if you are on UNIX.

------------------------------------------------------------------------------------------------------------

* The minimum expire_time is 1 minute. If you set it to, say, 0.1 in sqlnet.ora, DCD will *not* be enabled.

------------------------------------------------------------------------------------------------------------

* TCPView (a Windows utility from sysinternals.com) can close a specific connection with its "Close Connection" command. DCD plays no role here, and the trace file is the same as it would be without DCD:
  [04-AUG-2007 21:15:03:533] ntt2err: entry
  [04-AUG-2007 21:15:03:533] ntt2err: soc 15 error - operation=5, ntresnt[0]=517, ntresnt[1]=131, ntresnt[2]=0
  [04-AUG-2007 21:15:03:533] ntt2err: exit
  [04-AUG-2007 21:15:03:533] nttrd: exit
  [04-AUG-2007 21:15:03:533] nsprecv: error exit
  [04-AUG-2007 21:15:03:533] nserror: entry
  [04-AUG-2007 21:15:03:533] nserror: nsres: id=0, op=68, ns=12547, ns2=12560; nt[0]=517, nt[1]=131, nt[2]=0; ora[0]=0, ora[1]=0, ora[2]=0
  [04-AUG-2007 21:15:03:533] nsrdr: error exit
  [04-AUG-2007 21:15:03:533] nsdo: nsctxrnk=0
  [04-AUG-2007 21:15:03:533] nsdo: error exit
  [04-AUG-2007 21:15:03:533] nioqrc: wanted 1 got 0, type 0
  [04-AUG-2007 21:15:03:533] nioqper: error from nioqrc
  [04-AUG-2007 21:15:03:533] nioqper: ns main err code: 12547
  [04-AUG-2007 21:15:03:533] nioqper: ns (2) err code: 12560
  [04-AUG-2007 21:15:03:533] nioqper: nt main err code: 517
  [04-AUG-2007 21:15:03:533] nioqper: nt (2) err code: 131
  [04-AUG-2007 21:15:03:533] nioqper: nt OS err code: 0
  [04-AUG-2007 21:15:03:533] nioqer: entry
  [04-AUG-2007 21:15:03:533] nioqer: incoming err = 12151
  [04-AUG-2007 21:15:03:533] nioqce: entry
  [04-AUG-2007 21:15:03:533] nioqce: exit
  [04-AUG-2007 21:15:03:533] nioqer: returning err = 3135
  [04-AUG-2007 21:15:03:533] nioqer: exit
  [04-AUG-2007 21:15:03:533] nioqrc: exit
  [04-AUG-2007 21:15:03:534] nioqds: entry
  [04-AUG-2007 21:15:03:534] nioqds: disconnecting...
  ......

The words "ntt2err: soc 15 error - operation=5" definitively tell us the connection was already severed. ntt2err is a function called when a TNS transport layer error occurs. What Oracle calls a socket (e.g. "soc 15") is not the client side port number (as shown by `netstat -an'), but the client process's file descriptor (file handle on Windows), viewable with lsof, pfiles, /proc//fd, Process Explorer, etc. Once this error is identified as the root cause, other errors such as 12547 (TNS:lost contact) and 12560 (TNS:protocol adapter error) can be ignored.
(Ref: http://www.itpub.net/thread-1358266-1-1.html)
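In a long trace, the useful step is locating the first transport layer error and treating later NS errors as its consequences. Below is a minimal Python sketch of that idea, assuming the trace line format shown above; the function name transport_root_cause and the dictionary keys are hypothetical. It finds the first ntt2err line, extracts the "soc" number (per the explanation above, a process file descriptor, not a port), the operation and the ntresnt codes, and collects the ns/ns2 codes from later nserror lines as follow-on errors.

```python
import re
from typing import Optional

# First transport-layer error, e.g.
# "ntt2err: soc 15 error - operation=5, ntresnt[0]=517, ntresnt[1]=131, ntresnt[2]=0"
NTT2ERR = re.compile(
    r"ntt2err: soc (\d+) error - operation=(\d+), "
    r"ntresnt\[0\]=(\d+), ntresnt\[1\]=(\d+), ntresnt\[2\]=(\d+)"
)
# NS-layer codes reported by nserror, e.g. "ns=12547, ns2=12560".
NSERROR = re.compile(r"\bns=(\d+), ns2=(\d+)")


def transport_root_cause(trace_text: str) -> Optional[dict]:
    """Summarize the first ntt2err line in a SQL*Net trace, or return None."""
    m = NTT2ERR.search(trace_text)
    if m is None:
        return None
    soc, operation, nt0, nt1, nt2 = (int(g) for g in m.groups())
    # NS errors logged after the transport error are treated as consequences
    # of it (e.g. 12547 TNS:lost contact, 12560 TNS:protocol adapter error).
    later = trace_text[m.end():]
    follow_on = [int(code) for pair in NSERROR.findall(later) for code in pair]
    return {
        "soc": soc,  # a process file descriptor, not a TCP port number
        "operation": operation,
        "ntresnt": (nt0, nt1, nt2),
        "follow_on_ns_errors": follow_on,
    }


if __name__ == "__main__":
    sample = (
        "[04-AUG-2007 21:15:03:533] ntt2err: soc 15 error - operation=5, "
        "ntresnt[0]=517, ntresnt[1]=131, ntresnt[2]=0\n"
        "[04-AUG-2007 21:15:03:533] nserror: nsres: id=0, op=68, "
        "ns=12547, ns2=12560; nt[0]=517, nt[1]=131, nt[2]=0\n"
    )
    print(transport_root_cause(sample))
```

The returned "soc" value can then be matched against the client process's open file descriptors (for example with lsof or pfiles) rather than against netstat port numbers.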