SAS 9.4® ActiveMQ JMS Broker services randomly fail and generate errors


The SAS 9.4 ActiveMQ JMS Broker services randomly fail. When this happens, you receive messages in the broker-services log file (activemq.log) showing that the . . ./kahadb/lock directory is locked, as shown below.

2014-07-15 14:13:59,005 | INFO  | Database /softwares/sas/config/Lev1/Web/activemq/data/kahadb/lock is locked... waiting 10 seconds for the database to be unlocked. Reason: java.io.IOException: File '/softwares/sas/config/Lev1/Web/activemq/data/kahadb/lock' could not be locked. | org.apache.activemq.store.SharedFileLocker | main 
2014-07-15 14:14:09,006 | INFO  | Database /softwares/sas/config/Lev1/Web/activemq/data/kahadb/lock is locked... waiting 10 seconds for the database to be unlocked. Reason: java.io.IOException: File '/softwares/sas/config/Lev1/Web/activemq/data/kahadb/lock' could not be locked. | org.apache.activemq.store.SharedFileLocker | main 
2014-07-15 14:14:19,007 | INFO  | Database /softwares/sas/config/Lev1/Web/activemq/data/kahadb/lock is locked... waiting 10 seconds for the database to be unlocked. Reason: java.io.IOException: File '/softwares/sas/config/Lev1/Web/activemq/data/kahadb/lock' could not be locked. | org.apache.activemq.store.SharedFileLocker | main 
2014-07-15 14:14:29,007 | INFO  | Database /softwares/sas/config/Lev1/Web/activemq/data/kahadb/lock is locked... waiting 10 seconds for the database to be unlocked. Reason: java.io.IOException: File '/softwares/sas/config/Lev1/Web/activemq/data/kahadb/lock' could not be locked. | org.apache.activemq.store.SharedFileLocker | main 
2014-07-15 14:14:39,008 | INFO  | Database /softwares/sas/config/Lev1/Web/activemq/data/kahadb/lock is locked... waiting 10 seconds for the database to be unlocked. Reason: java.io.IOException: File '/softwares/sas/config/Lev1/Web/activemq/data/kahadb/lock' could not be locked. | org.apache.activemq.store.SharedFileLocker | main 
2014-07-15 14:14:49,009 | INFO  | Database /softwares/sas/config/Lev1/Web/activemq/data/kahadb/lock is locked... waiting 10 seconds for the database to be unlocked. Reason: java.io.IOException: File '/softwares/sas/config/Lev1/Web/activemq/data/kahadb/lock' could not be locked. | org.apache.activemq.store.SharedFileLocker | main 
2014-07-15 14:14:59,009 | INFO  | Database /softwares/sas/config/Lev1/Web/activemq/data/kahadb/lock is locked... waiting 10 seconds for the database to be unlocked. Reason: java.io.IOException: File '/softwares/sas/config/Lev1/Web/activemq/data/kahadb/lock' could not be locked. | org.apache.activemq.store.SharedFileLocker | main 
2014-07-15 17:44:59,650 | WARN  | Transport Connection to: tcp://203.211.129.192:54085 failed: java.io.EOFException | org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ Transport: tcp:///203.211.129.192:54085@61616 
2014-07-15 17:44:59,650 | WARN  | Transport Connection to: tcp://203.211.129.192:54106 failed: java.io.EOFException | org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ Transport: tcp:///203.211.129.192:54106@61616 
2014-07-15 17:44:59,650 | WARN  | Transport Connection to: tcp://203.211.129.192:55533 failed: java.io.EOFException | org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ Transport: tcp:///203.211.129.192:55533@61616 
2014-07-15 17:45:01,527 | WARN  | Transport Connection to: tcp://203.211.129.192:53689 failed: java.io.EOFException | org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ Transport: tcp:///203.211.129.192:53689@61616 
2014-07-15 17:45:01,528 | WARN  | Transport Connection to: tcp://203.211.129.192:53720 failed: java.io.EOFException | org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ Transport: tcp:///203.211.129.192:53720@61616 

In addition, the following transport-connection EOF exceptions are generated in the log file (server.log) for the SASServer1_1 instance of SAS 9.4® Web Application Server:   

2014-07-10 11:06:35,441 [Timer-48] WARN  [unknown] com.sas.svcs.scheduling.server.impl.ScheduleManager - SAS Distributed In-Process Scheduling exception occurred. 
com.atomikos.jms.AtomikosJMSException: Failed to grow the connection pool 
    at

com.atomikos.jms.AtomikosJMSException.throwAtomikosJMSException

(AtomikosJMSException.java:55) 
    at

com.atomikos.jms.AtomikosConnectionFactoryBean.throwAtomikosJMSException

(AtomikosConnectionFactoryBean.java:174) 
    at

com.atomikos.jms.AtomikosConnectionFactoryBean.createConnection

(AtomikosConnectionFactoryBean.java:593) 
    at

com.sas.scheduler.api.servers.ip.engine.mq.JMSClusterSupport.initJMSTopic

(JMSClusterSupport.java:240) 
    at

com.sas.scheduler.api.servers.ip.engine.mq.JMSClusterSupport.

safeCreateMapMessage

(JMSClusterSupport.java:1613) 
    at

com.sas.scheduler.api.servers.ip.engine.mq.JMSClusterSupport.

startMasterDetermination

(JMSClusterSupport.java:398) 
    at com.sas.scheduler.api.servers.ip.engine.mq.JMSClusterSupport$2.run

(JMSClusterSupport.java:544) 
    at java.util.TimerThread.mainLoop(Timer.java:555) 
    at java.util.TimerThread.run(Timer.java:505) 

To solve this problem, follow these steps on the middle-tier machine:

  1. Stop all of the SASServer instances (for example, SASServer1_1, SASServerN_1, and so on).
  2. Stop the SAS 9.4 ActiveMQ JMS Broker services.
  3. Implement the suggested changes in SAS KB0036267, "Increasing numbers of Apache ActiveMQ journal log files in certain SAS® 9.4 Enterprise BI and SAS® Visual Analytics environments fill up disk space."
  4. Restart the SAS 9.4 ActiveMQ JMS Broker services. Check the activemq.log file to make sure that the services start properly.
  5. Restart all of the SASServer instances (SASServer1_1, SASServerN_1, and so on).