----------------------------------------------------------------------------------------------------
Problem: GoldenGate on a 2-node cluster crashes and cannot be started:

$ agctl start goldengate ogg_chcmprd_inst
CRS-2672: Attempting to start 'xag.ogg_chcmprd_inst.goldengate' on ''
CRS-2674: Start of 'xag.ogg_chcmprd_inst.goldengate' on 'aoprlhcmdb3b' failed
CRS-2679: Attempting to clean 'xag.ogg_chcmprd_inst.goldengate' on ''
CRS-2681: Clean of 'xag.ogg_chcmprd_inst.goldengate' on 'aoprlhcmdb3b' succeeded
CRS-2527: Unable to start 'xag.ogg_chcmprd_inst.goldengate' because it has a 'hard' dependency on 'ogg.vip'
CRS-0245: User doesn't have enough privilege to perform the operation
CRS-4000: Command Start failed, or completed with errors.

Nothing obvious in /ogg//var/log/*log. In /ogg/sm/var/log/ServiceManager.log:

2025-08-15T09:11:01.944-0500 ERROR| Failed to open secure store for get. (Thread 9)
2025-08-15T09:11:01.944-0500 ERROR| Failed retrieving user session entry for '/ogg/sm/var/run'. (Thread 9)
2025-08-15T09:11:01.944-0500 WARN | Failed updating persisted user data during Authorization Cookie creation. (Thread 9)
2025-08-15T09:11:08.958-0500 ERROR| Failed to acquire lock owned by pid 0 for UserRoleManager after 12 retries in 6992 ms. (Thread 11)
2025-08-15T09:11:08.958-0500 ERROR| Failed to open secure store for userName. (Thread 11)

Solution: ServiceManager.pid and session.dat exist in /ogg/sm/var/run even when GG is down. Rename (or delete) them and restart GG.

Thought process:

None of the ServiceManager.log messages shown above are found by Google or on the Oracle Support website, so we troubleshoot on our own.

The agctl command takes much longer than usual (and eventually fails). While it runs, the output of `ps ...' shows that the command ultimately runs
crsctl.bin start resource xag.ogg_chcmprd_inst.goldengate -f
`strace' or `lsof' on this process doesn't reveal anything interesting.

Since it's a command to control CRS (GI), let's check the GI alert.log, which says the details are in crsd_scriptagent_oracle.trc. That file has these lines:

[xag.ogg_chcmprd_inst.goldengate]{1:13971:22272} [start] Executing action script: /u01/app/grid/bin/aggoldengatescaas[start]
[xag.ogg_chcmprd_inst.goldengate]{1:13971:22272} [start] GG agent running command 'start' on xag.ogg_chcmprd_inst.goldengate
[xag.ogg_chcmprd_inst.goldengate]{1:13971:22272} [start] Starting OGG SCA instance
[xag.ogg_chcmprd_inst.goldengate]{1:13971:22272} [start] ServiceManager pid file exists /ogg/sm/var/run/ServiceManager.pid
[xag.ogg_chcmprd_inst.goldengate]{1:13971:22272} [start] Checking if SM with PID# 2375135 is running
[xag.ogg_chcmprd_inst.goldengate]{1:13971:22272} [start] ServiceManager not found, proceeding with start
[xag.ogg_chcmprd_inst.goldengate]{1:13971:22272} [start] ServiceManager fork pid = 2382988
[xag.ogg_chcmprd_inst.goldengate]{1:13971:22272} [start] Waiting for /ogg/sm/var/run/ServiceManager.pid
[xag.ogg_chcmprd_inst.goldengate]{1:13971:22272} [start] ServiceManager PID = 2382991
[xag.ogg_chcmprd_inst.goldengate]{1:13971:22272} [start] execute XAGTask HealthCheck
[xag.ogg_chcmprd_inst.goldengate]{1:13971:22272} [start] XAGTask retcode = 3
[xag.ogg_chcmprd_inst.goldengate]{1:13971:22272} [start] XAG HealthCheck after start returned 3

The message "ServiceManager pid file exists" sounds unusual. On another, totally unrelated server where GG is running fine, this message does not appear in crsd_scriptagent_oracle.trc.

Tentatively, rename ServiceManager.pid and session.dat in /ogg/sm/var/run. Try starting GG again with `agctl start ...'. It works!

So, apparently, when GG last crashed, it didn't clean up the Service Manager pid file and session data file. We have to clean them up manually.

We didn't find the root cause of the GG crash early in the morning. crsd.log has messages about starting GG, not stopping it. But we know it had been running before the crash.
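
The manual cleanup above can be sketched as a small shell helper. This is only a sketch under assumptions: the function name rename_stale_sm_files and the .stale.<timestamp> suffix are made up for illustration, and the pid file is assumed to hold a plain numeric PID on its first line. The only facts taken from the notes are the file names ServiceManager.pid and session.dat and the directory /ogg/sm/var/run. The helper renames the two files only when the recorded PID is not a running process, so it should leave a live Service Manager alone.

```shell
#!/bin/sh
# Hypothetical helper (not part of GoldenGate or XAG): rename leftover
# Service Manager files when the PID recorded in the pid file is not running.
rename_stale_sm_files() {
    run_dir=$1
    pid_file="$run_dir/ServiceManager.pid"

    # No pid file: nothing to clean up.
    [ -f "$pid_file" ] || { echo "no pid file in $run_dir"; return 0; }

    # Assumption: the first token on the first line is the PID.
    pid=$(awk 'NR==1 {print $1}' "$pid_file")
    # Treat anything that is not purely numeric as "no usable PID".
    case $pid in
        ''|*[!0-9]*) pid= ;;
    esac

    # kill -0 sends no signal and only tests that the process exists; the
    # /proc check covers a process owned by another user on Linux.
    if [ -n "$pid" ] && { kill -0 "$pid" 2>/dev/null || [ -d "/proc/$pid" ]; }; then
        echo "PID $pid is running; leaving files alone"
        return 1
    fi

    # Rename rather than delete, so the files can be inspected later.
    suffix="stale.$(date +%Y%m%d%H%M%S)"
    for f in ServiceManager.pid session.dat; do
        if [ -f "$run_dir/$f" ]; then
            mv "$run_dir/$f" "$run_dir/$f.$suffix" && echo "renamed $f"
        fi
    done
    return 0
}

# Example call (run as the GG software owner while GG is down):
#   rename_stale_sm_files /ogg/sm/var/run
```

Renaming instead of deleting follows the "Rename (or delete)" advice in the Solution and keeps the old files around in case they are needed for an SR.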
----------------------------------------------------------------------------------------------------
Problem: "ERROR OGG-08502 Oracle GoldenGate Receiver Service for Oracle: Path fscprdPath not found." in recvsrvr.log and ggserr.log

Solution? No solution, in spite of "SR 3-40133679521 : Unknown exception caught at outer HttpServer context".

The path does exist:

OGG (http://:9100 ma_epmp as oraepmp_cdb@CEPMP/CDB$ROOT) 4> info recvpath all
fscprdPath_aoprlfscdb_04080 running

OGG (http://:9100 ma_epmp as oraepmp_cdb@CEPMP/CDB$ROOT) 5> info recvpath fscprdPath_aoprlfscdb_04080
Path Name: fscprdPath
Status: running
Source URI: trail://:9102/services/v2/sources?trail=et
Target URI: ogg://:9103/services/v2/targets?trail=ft

Interestingly, the above command cannot be followed by "detail", even though the documentation (https://docs.oracle.com/en/middleware/goldengate/core/23/gclir/info-recvpath.html) says it can:

OGG (http://:9100 ma_epmp as oraepmp_cdb@CEPMP/CDB$ROOT) 6> info recvpath fscprdPath_aoprlfscdb_04080 detail
2025-08-15T19:28:45Z ERROR OGG-10386 Info request on path fscprdPath_aoprlfscdb_04080 is not supported.

Another oddity is that the path cannot be given as the short string (the "Path Name" shown in the output of "info recvpath fscprdPath_aoprlfscdb_04080"):

OGG (http://:9100 ma_epmp as oraepmp_cdb@CEPMP/CDB$ROOT) 7> info recvpath fscprdPath
2025-08-15T19:29:10Z ERROR OGG-08502 Path fscprdPath not found

If you shorten it, you get OGG-08502, which is the same error seen in recvsrvr.log and ggserr.log.

Fortunately, GG runs fine. The error appears to be doing no harm.
----------------------------------------------------------------------------------------------------
Problem: Unit of MAX_SGA_SIZE for "TRANLOGOPTIONS INTEGRATEDPARAMS"

Solution: According to
https://support.oracle.com/epmos/main/downloadattachmentprocessor?attachid=1456176.1%3A105&action=inline
the unit of max_sga_size is MB. This is not stated in the documentation.

Note: The default max SGA for GG is 1 GB, i.e. 1024 MB.
If you increase it, make sure you have enough streams pool.
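
As a rough sizing aid for the note above, the following sketch estimates a Streams pool size from the number of integrated Extracts on the instance and their max_sga_size in MB. The function name streams_pool_mb is hypothetical, and the 25 percent headroom is an assumption taken from Oracle's general guidance for integrated capture, so verify it against the documentation for your release.

```shell
# Hypothetical helper: rough Streams pool estimate in MB.
#   $1 = number of integrated Extracts on the database instance
#   $2 = max_sga_size per Extract in MB (defaults to 1024, i.e. 1 GB)
streams_pool_mb() {
    extracts=$1
    max_sga_mb=${2:-1024}
    # Sum of the per-Extract SGA plus an assumed 25% headroom, in whole MB.
    echo $(( extracts * max_sga_mb * 125 / 100 ))
}
```

For example, `streams_pool_mb 3 1024` prints 3840, i.e. 3.75 GB for three Extracts at the default. Raising one Extract to 2 GB would mean a parameter line along the lines of TRANLOGOPTIONS INTEGRATEDPARAMS (max_sga_size 2048), with the value given in MB.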