I'm already at the point where I generally don't trust any SysV-Init scripts written by Novell. They are mostly broken and not properly tested.

The latest problem with the GroupWise 8.0.2 SysV-Init script /etc/init.d/grpwise arose during a restructuring of our POA cluster. We decided to move the SOAP handling for our new DataSync server to a separate POA because the SOAP threads were unstable: they repeatedly stalled while in use by the DataSync server. A separate POA would at least spare us the trouble of kicking regular users out of their clients during the inevitable restart of the POA process in the cluster.

During our first test I came across the problem that /etc/init.d/grpwise relies on the agent processes (the POA in our case) to write their own PID to a file. The POA writes it to /var/run/novell/groupwise/<postoffice>.<domain>.pid.

This works as long as there is only one POA per postoffice, but it fails badly with two or more, because the POA started last overwrites the PID file of the other one. And the other way around: the first POA in a postoffice to shut down deletes the PID file, regardless of whether it contains its own PID or not, leaving all other POAs in a state in which the cluster software is unable to shut them down properly. Summed up, a complete mess. Who the heck writes and tests such stuff over at Novell?
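The collision described above can be replayed in a small simulation. This is only a sketch: the temporary directory stands in for /var/run/novell/groupwise, the PIDs 4711 and 4712 are made up, and remove_own_pid_file is a hypothetical guard that shows what a well-behaved agent could do on shutdown. It is not what the POA actually does.

```shell
#!/bin/sh
# Simulation only: a temporary directory stands in for /var/run/novell/groupwise.
run_dir=$(mktemp -d)
pid_file="$run_dir/pers.dom_pers.pid"   # both POAs derive the same <postoffice>.<domain> name

# Made-up PIDs; the real values would come from the running agents.
main_pid=4711
soap_pid=4712

echo "$main_pid" > "$pid_file"   # first POA records its PID
echo "$soap_pid" > "$pid_file"   # second POA overwrites it
recorded_pid=$(cat "$pid_file")
echo "PID on record: $recorded_pid"

# Hypothetical guard: remove the file only if it still holds the caller's PID.
remove_own_pid_file() {
    file=$1
    own_pid=$2
    [ -f "$file" ] || return 0
    if [ "$(cat "$file")" = "$own_pid" ]; then
        rm -f "$file"
    fi
}

# Main POA stops first: the file belongs to the SOAP POA and is left alone.
remove_own_pid_file "$pid_file" "$main_pid"
survived=no
[ -f "$pid_file" ] && survived=yes

# SOAP POA stops: the file holds its own PID and is removed.
remove_own_pid_file "$pid_file" "$soap_pid"
removed=no
[ -f "$pid_file" ] || removed=yes

echo "file survived foreign shutdown: $survived, removed by owner: $removed"
rm -rf "$run_dir"
```

Even with such a guard the main POA would still be left without any PID on record, which is why a separate file per agent instance is the cleaner way out.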

So I delegated the PID file creation to the startproc utility and renamed all POA instances in /etc/opt/novell/groupwise/gwha.conf so that they no longer use <postoffice>.<domain> as their identifier. This is how our gwha.conf looks for our two POAs, the main one handling all client connections and the one handling all the SOAP stuff:

[main.pers.dom_pers]
server    = /opt/novell/groupwise/agents/bin/gwpoa
command   = /etc/init.d/grpwise
startup   = /media/nss/VMPERS/postoffice/pers.poa
delay     = 2
wait      = 10

[soap.pers.dom_pers]
server    = /opt/novell/groupwise/agents/bin/gwpoa
command   = /etc/init.d/grpwise
startup   = /media/nss/VMPERS/postoffice/pers-soap.poa
delay     = 2
wait      = 10
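To check that the section names really are unique after such a change, they can be pulled out of the file with a few lines of shell. The following is a minimal sketch: list_gwha_identifiers is a hypothetical helper, not part of GroupWise, and it is run against a temporary copy of the two sections above rather than the live gwha.conf. It prints every bracketed section name it finds, so any non-agent section in a real file would show up as well.

```shell
#!/bin/sh
# Hypothetical helper: print every [section] name found in a gwha.conf-style file.
list_gwha_identifiers() {
    sed -n 's/^[[:space:]]*\[\([^]]*\)][[:space:]]*$/\1/p' "$1"
}

# Temporary sample file with the two sections shown above.
sample_conf=$(mktemp)
cat > "$sample_conf" <<'EOF'
[main.pers.dom_pers]
server    = /opt/novell/groupwise/agents/bin/gwpoa
command   = /etc/init.d/grpwise
startup   = /media/nss/VMPERS/postoffice/pers.poa
delay     = 2
wait      = 10

[soap.pers.dom_pers]
server    = /opt/novell/groupwise/agents/bin/gwpoa
command   = /etc/init.d/grpwise
startup   = /media/nss/VMPERS/postoffice/pers-soap.poa
delay     = 2
wait      = 10
EOF

ids=$(list_gwha_identifiers "$sample_conf")
echo "$ids"

# Any line printed here is an identifier that occurs more than once.
duplicates=$(echo "$ids" | sort | uniq -d)
echo "duplicates: ${duplicates:-none}"

rm -f "$sample_conf"
```

Against the real file the call would be list_gwha_identifiers /etc/opt/novell/groupwise/gwha.conf.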

I also patched /etc/init.d/grpwise. The patch can be downloaded and applied once gwha.conf has been modified.

After starting the agents with the patched grpwise SysV-Init script, the PID is stored in a file named /var/run/novell/groupwise/<gwha-identifier>.pid. This prevents collisions with the now useless PID files created by the agents themselves.
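With one PID file per gwha identifier, a status check per agent becomes straightforward. The sketch below is not taken from the patch: pid_file_for and agent_is_running are hypothetical helpers, and GW_RUN_DIR is an assumed variable that defaults to the directory named above. It is meant to run as root, like the init script, because kill -0 also fails for processes owned by another user.

```shell
#!/bin/sh
# Assumed variable: directory holding the per-identifier PID files.
GW_RUN_DIR=${GW_RUN_DIR:-/var/run/novell/groupwise}

# Hypothetical helper: map a gwha identifier to its PID file path.
pid_file_for() {
    printf '%s/%s.pid\n' "$GW_RUN_DIR" "$1"
}

# Hypothetical helper: succeed only if the PID file exists, holds a number,
# and a process with that PID can be signalled.
agent_is_running() {
    file=$(pid_file_for "$1")
    [ -r "$file" ] || return 1
    pid=$(cat "$file")
    case $pid in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric content
    esac
    kill -0 "$pid" 2>/dev/null
}

# Report the state of both POAs from the gwha.conf shown earlier.
for id in main.pers.dom_pers soap.pers.dom_pers; do
    if agent_is_running "$id"; then
        echo "$id: running"
    else
        echo "$id: not running"
    fi
done
```

Note that kill -0 only proves that some process with that PID exists. It does not verify that the process is actually a gwpoa instance.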

After modifying our cluster configuration to use the new identifiers for resource start and stop, everything works fine. At least until we deploy another service that utilizes a SysV-Init script written by Novell ... grml