Resolving SAS® Viya® 3.5 service start-up problems on Linux


This SAS Knowledge Base article describes techniques for troubleshooting and resolving SAS Viya 3.5 service start-up problems on Linux. It expands upon General Servers and Services: Operate (Linux) in the SAS® Viya® 3.5 Administration Guide. As noted in that documentation, the sas-viya-all-services script does not span multiple machines. In a multi-machine environment, always start the core services outlined here before running /etc/init.d/sas-viya-all-services.

Basic Concepts

Two key services must be healthy before any other SAS Viya services are started. These services, in the order in which they must be started, are as follows:

1. sas-viya-consul-default: This service is deployed on all machines in the SAS Viya deployment. Machines assigned to the consul host group in inventory.ini during SAS Viya deployment run Consul in server mode, while other machines run as Consul clients.

2. sas-viya-vault-default: This service is deployed on all machines assigned to the Consul host group in inventory.ini during SAS Viya deployment, so it is co-located with machines that act as Consul servers.

After those two services are started and healthy on all nodes on which they are deployed, the following services should be started. The start-up order for these services is not important, but they must be started after Consul and Vault and before the rest of the SAS Viya services.

sas-viya-sasdatasvrc-* (multiple services for SAS® Infrastructure Data Server/PostgreSQL)

sas-viya-rabbitmq-server-default

sas-viya-httpproxy-default (this service starts the HTTP server, httpd on RHEL or apache2 on SUSE, and dynamically reloads crucial configuration for it)

After all of these services are started and healthy, the remaining SAS Viya services can be started.

If there is a problem with any of these services, stop all SAS Viya services, verify that all SAS Viya processes have stopped, and then start and test each of the services above in order, resolving any problem before you proceed to the next service.

Steps for Resolving Start-up Issues

Steps for resolving start-up problems are shown below. If you encounter an error at any step, do not proceed until that problem is resolved. Review the logs noted in that step and contact SAS Technical Support if you need further assistance.

The Ansible command examples below should all be executed in the sas_viya_playbook directory, so that Ansible can find the correct config and inventory information via settings in sas_viya_playbook/ansible.cfg and sas_viya_playbook/inventory.ini. If you are not familiar with Ansible, do not have access to the sas_viya_playbook directory, or encounter problems with Ansible commands, you can use the manual commands instead.  
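
If you want to confirm that Ansible can reach every machine before you run the ad-hoc commands below, a quick check similar to the following can help (a sketch; the path is a placeholder for the actual location of your sas_viya_playbook directory):

cd /path/to/sas_viya_playbook    # hypothetical path; use your actual playbook directory
ansible all -m ping

Every host should report SUCCESS with a "pong" response before you rely on the Ansible variants of the commands in this article.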

Stop All Services and Verify That the Services Are Stopped

1. Stop all SAS Viya services by running the following command on each SAS Viya machine:

sudo /etc/init.d/sas-viya-all-services stop

Or you can run the optional Viya-ARK viya-services-stop.yml playbook.
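
If you use the Viya-ARK option, the invocation looks roughly like the following (a sketch; it assumes that the Viya-ARK repository was cloned into the sas_viya_playbook directory, so adjust the path to wherever your copy of Viya-ARK resides):

ansible-playbook viya-ark/playbooks/viya-mmsu/viya-services-stop.yml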

2. Verify that all SAS Viya processes are stopped by running the following command on each SAS Viya machine:

ps -ef | grep -E 'viya|erlang'

Or you can use Ansible:

ansible all -m shell -a "ps -ef | grep -E 'viya|erlang'"

If this command returns any SAS Viya processes (other than the grep command itself), kill them.
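
If you need to terminate lingering processes in bulk, something like the following can be used (a sketch that reuses the same pattern as the grep above; preview what it matches in your environment before escalating to SIGKILL):

sudo pgrep -af 'viya|erlang'    # preview which processes the pattern matches
sudo pkill -f 'viya|erlang'     # send SIGTERM to those processes; use pkill -9 only as a last resort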

3. Ensure that systemd reports all SAS Viya services as stopped by running the following on all SAS Viya machines:

for i in $(cd /etc/init.d && ls sas-viya*); do if [[ $i != *"all-services"* ]]; then sudo systemctl stop $i; fi; done

Or you can use Ansible:

ansible all -m shell -b -a "for i in \$(cd /etc/init.d && ls sas-viya*); do if [[ \$i != *"all-services"* ]]; then sudo systemctl stop \$i; fi; done"

This step might not be necessary in every case, but it ensures that the systemctl status for all SAS Viya services is in the stopped state. It prevents problems that can occur when abnormal process termination leaves the reported systemctl status out of sync with the true state of a service.
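
To confirm that systemd agrees that everything is down, you can list each SAS Viya service and its state (a sketch that follows the same /etc/init.d loop convention used above):

for i in $(cd /etc/init.d && ls sas-viya*); do echo "$i: $(systemctl is-active $i)"; done    # each service should report inactive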

Start SAS Configuration Server (Consul)

4. On all SAS Viya machines in the deployment, start Consul:

sudo systemctl start sas-viya-consul-default

Or you can use Ansible:

ansible all -b -m service -a "service=sas-viya-consul-default state=started"

5. Verify that Consul is healthy by running the following on all SAS Viya machines:

curl -sk https://localhost:8501/v1/status/leader

Or you can use Ansible:

ansible all -m shell -a "curl -sk https://localhost:8501/v1/status/leader"

The response should be the address of the current Consul leader, which is one of the machines in the Consul host group from inventory.ini, for example: "10.104.85.157:8300"

You can also check the newest /opt/sas/viya/config/var/log/consul/default/sas-consul_*.log file for messages such as the following:

[INFO] consul: New leader elected: server.example.com

There are often WARN and ERR messages in the Consul log at start-up that can be ignored. The key indications to look for are that all nodes find each other and that a leader is elected.
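
As a shortcut, a sketch like the following checks the newest Consul log for the leader-election message (it assumes the default log location cited above and a user with read access to that directory):

sudo bash -c 'grep "New leader elected" $(ls -t /opt/sas/viya/config/var/log/consul/default/sas-consul_*.log | head -1)'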

Start SAS Secrets Manager (Vault)

6. On all SAS Viya machines in the Consul host group in sas_viya_playbook/inventory.ini, start Vault:

sudo systemctl start sas-viya-vault-default

If you do not have access to sas_viya_playbook/inventory.ini for reference, the command can be executed on all machines that have the file /etc/init.d/sas-viya-vault-default.

Or you can use Ansible:

ansible consul -b -m service -a "service=sas-viya-vault-default state=started"

7. Verify that Vault is healthy by running the following on all machines assigned to the consul host group in inventory.ini:

curl -sk https://127.0.0.1:8200/v1/sys/health

Or you can use Ansible:

ansible consul -m shell -a "curl -sk https://127.0.0.1:8200/v1/sys/health"

Good output looks similar to the following:

{"initialized":true,"sealed":false,"standby":false,"performance_standby":false,
"replication_performance_mode":"disabled","replication_dr_mode":"disabled",
"server_time_utc":1650915307,"version":"1.2.2","cluster_name":"vault-cluster-f68d004a","cluster_id":"42f9d9a4-e284-7718-7d45-a32a0dcfd4b9"}

You can also check the newest /opt/sas/viya/config/var/log/vault/default/sas-vault_<date>_<time>.log file (not sas-vault-audit_<date>_<time>.log). The log should end with output similar to this:

(INFO) Waiting for local Vault to unseal and pass Consul health checks.
(INFO) Vault Address: https://127.0.0.1:8200
(INFO) Enabling auditing to file: /opt/sas/viya/config/var/log/vault/default/sas-vault-audit_2022-04-25_11-55-20.log

[INFO]  core: disabled audit backend: path=file/
[INFO]  core: enabled audit backend: path=file/ type=file
Success! Enabled the file audit device at: file/
(INFO) Running check_vault_cert_role.sh
(INFO) Updating role to match current consul values.
(INFO) Updating token default ttl to keep changes in sync.
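
A sketch like the following displays the end of the newest Vault server log so that you can compare it with the output above (the sas-vault_*.log pattern does not match the audit log, and the path is the default log location):

sudo bash -c 'tail -15 $(ls -t /opt/sas/viya/config/var/log/vault/default/sas-vault_*.log | head -1)'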

Start SAS Infrastructure Data Server (PostgreSQL)

8. Start the template services for sasdatasvrc by running the following on all machines in the pgpoolc and sasdatasvrc host groups in inventory.ini:

for i in $(cd /etc/init.d && ls *sasdatasvrc*template*); do sudo systemctl start $i; done

If you do not have access to sas_viya_playbook/inventory.ini for reference, the command can be executed on all machines that have /etc/init.d/sas-viya-sasdatasvrc-*template* files.

Or you can use Ansible:

ansible sasdatasvrc,pgpoolc -b -m shell -a "for i in \$(cd /etc/init.d && ls *sasdatasvrc*template*); do systemctl start \$i; done"

9. Start the remaining sasdatasvrc services by running the following on the machines assigned to the sasdatasvrc host group in inventory.ini:

sudo su - sas -c "/opt/sas/viya/config/data/sasdatasvrc/postgres/node0/startall"

Or you can use Ansible:

ansible sasdatasvrc -b --become-user=sas -m shell -a "/opt/sas/viya/config/data/sasdatasvrc/postgres/node0/startall"

If you have an HA deployment of PostgreSQL and therefore more nodes than just node0, you can start the other nodes by adjusting the command for node1, node2, and so on.
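
For example, a sketch that starts whichever node directories exist on each sasdatasvrc machine (it assumes that each machine contains only the nodeN directories that belong to it, which matches the layout described above):

ansible sasdatasvrc -b --become-user=sas -m shell -a "for n in /opt/sas/viya/config/data/sasdatasvrc/postgres/node*/startall; do \$n; done"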

10. Verify that the sasdatasvrc services are up by running the following:

sudo /etc/init.d/sas-viya-sasdatasvrc-postgres-pgpool0 status

Or you can use Ansible:

ansible pgpoolc -b -m shell -a "/etc/init.d/sas-viya-sasdatasvrc-postgres-pgpool0 status"

The output should indicate a status of up.

If you have an HA deployment of PostgreSQL and therefore more pgpool instances than just pgpool0, you can get the status of the other instances by adjusting the command for pgpool1, pgpool2, and so on.
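
For example, a sketch that checks every pgpool instance present on a machine (it assumes the init script naming convention shown above):

for i in /etc/init.d/sas-viya-sasdatasvrc-postgres-pgpool*; do sudo $i status; done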

If there are any problems with starting sasdatasvrc or getting a status of "up", review the logs in /opt/sas/viya/config/var/log/sasdatasvrc/postgres.

Start SAS Message Broker (RabbitMQ)

11. On all SAS Viya machines in the RabbitMQ host group in inventory.ini, start RabbitMQ via this command:

sudo systemctl start sas-viya-rabbitmq-server-default

If you do not have access to sas_viya_playbook/inventory.ini for reference, the command can be executed on all machines that have the file /etc/init.d/sas-viya-rabbitmq-server-default.

Or you can use Ansible:

ansible rabbitmq -b -m service -a "service=sas-viya-rabbitmq-server-default state=started"

12. Verify that RabbitMQ is healthy by running the following on all machines assigned to the RabbitMQ host group in inventory.ini:

sudo /opt/sas/viya/home/sbin/rabbitmqctl cluster_status
sudo /opt/sas/viya/home/sbin/rabbitmqctl status

Or you can use Ansible:

ansible rabbitmq -b -m shell -a "/opt/sas/viya/home/sbin/rabbitmqctl cluster_status && /opt/sas/viya/home/sbin/rabbitmqctl status"

The output should list all the expected nodes in the RabbitMQ cluster and should not contain any errors.
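
To spot-check the node list without reading the full output, a sketch such as the following can help (the exact formatting of the cluster_status output varies by RabbitMQ version):

sudo /opt/sas/viya/home/sbin/rabbitmqctl cluster_status | grep -i running_nodes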

If there are any problems with starting RabbitMQ or getting good status output, review the logs in /opt/sas/viya/config/var/log/rabbitmq-server/default.

Start HTTP Server (via sas-viya-httpproxy-default)

13. On all SAS Viya machines in the HTTPproxy host group in inventory.ini, start the HTTPproxy via this command:

sudo systemctl start sas-viya-httpproxy-default

If you do not have access to sas_viya_playbook/inventory.ini for reference, the command can be executed on all machines that have the file /etc/init.d/sas-viya-httpproxy-default.

Or you can use Ansible:

ansible httpproxy -b -m service -a "service=sas-viya-httpproxy-default state=started"

14. Verify that the sas-viya-httpproxy-default service started the httpd (RHEL) or apache2 (SUSE) service by running curl on the host(s) assigned to the HTTPproxy host group in inventory.ini:

curl -sk https://localhost

Or you can use Ansible:

ansible httpproxy -m shell -a "curl -sk https://localhost"

The output should be an HTTP 302 redirect similar to this:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a href="https://localhost/SASDrive/">here</a>.</p>
</body></html>
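
If you prefer to see only the response code, a sketch using curl's write-out option is:

curl -sk -o /dev/null -w '%{http_code}\n' https://localhost    # 302 confirms the redirect shown above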
 

In case of problems, review the output of "sudo systemctl status httpd" and logs in /var/log/httpd (for RHEL) or "sudo systemctl status apache2" and logs in /var/log/apache2 (for SUSE).

Start the Remaining SAS Viya Services

If all of these services start without problems, you can start the remaining services using "sudo /etc/init.d/sas-viya-all-services start" or the Viya-ARK viya-services-start.yml playbook. The /etc/init.d/sas-viya-all-services script starts only services on the local machine, so the script needs to be executed manually on each SAS Viya machine if you use this method in a multi-machine deployment.
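
If you have Ansible access, a sketch that runs the script across all machines from the sas_viya_playbook directory is:

ansible all -b -m shell -a "/etc/init.d/sas-viya-all-services start"

Note that the script can take several minutes per machine to complete, and the Viya-ARK viya-services-start.yml playbook remains an alternative for multi-machine deployments.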

Differences in SAS Viya 3.4

The same strategy applies to SAS Viya 3.4, except that the services for SAS Infrastructure Data Server (PostgreSQL) differ in that version. In SAS Viya 3.4, the sas-viya-sasdatasvrc-postgres service can be used to start the sasdatasvrc service and to obtain its status.

Here is an example:

sudo systemctl start sas-viya-sasdatasvrc-postgres
sudo systemctl status sas-viya-sasdatasvrc-postgres