The SAS High-Performance Analytics Environment provides the framework for the distributed SAS® LASR™ Analytic Server, as well as SAS® High-Performance procedures running in distributed mode, such as PROC HPDS2, PROC HPSUMMARY, and PROC HPSAMPLE. The SAS High-Performance Analytics Environment software is contained in the TKGrid directory on each node in the distributed environment. Some SAS administrators might be more familiar with the directory name TKGrid than with the proper name SAS High-Performance Analytics Environment.
The processes that make up a distributed SAS LASR Analytic Server or high-performance procedure are started on each node using Secure Shell (SSH). The SSH connections must be established without the need for the user to enter a password. This is often referred to as passwordless SSH. There are many ways to facilitate secure passwordless SSH logins, such as using public-private key pairs or GSSAPIAuthentication (based on Kerberos).
If SAS LASR Analytic Server or high-performance procedures are executed by users who cannot perform passwordless SSH logins from SAS to the SAS High-Performance Analytics Environment and between all nodes in the SAS High-Performance Analytics Environment, errors are written to the log files. The examples below show the situations in which these errors occur.
Example 1: Problem occurs when starting LASR with PROC LASR CREATE or when executing a high-performance procedure such as PROC HPDS2
Example 2: Problem occurs when attempting to load data to an already started SAS LASR Analytic Server with PROC LASR ADD
Example 3: Problem occurs when starting the SAS® Visual Analytics LASR Monitor
Example 4: Problem occurs when using PROC IMSTAT or LIBNAME SASIOLA
When a SAS procedure (such as PROC LASR or high-performance procedures like PROC HPDS2) runs in distributed mode, SAS uses SSH to connect to the SAS High-Performance Analytics/TKGrid head node. The SAS High-Performance Analytics/TKGrid head node then uses SSH to connect to the worker nodes. Finally, one random node uses SSH to connect to the other nodes to start processes needed for the SAS procedure.
Because that node is chosen at random, passwordless SSH authentication must work between all SAS High-Performance Analytics/TKGrid nodes.
It is often assumed that passwordless SSH authentication from the head node to all worker nodes is enough to run distributed SAS procedures. It is not. Every node must be able to make a passwordless connection to every other node.
Use the steps below to help you determine whether passwordless SSH authentication is working as needed between all nodes.
ssh node0.example.com
You should authenticate without any prompt for a password.
/opt/TKGrid/bin/simsh /opt/TKGrid/bin/simsh hostname
This also needs to complete without any errors or password prompts.
Successful output looks similar to this:
The node in the first column connected to the node in the second column, and the third column is the output of the hostname command on the node that was connected to. This is a quick way to confirm that SSH can be used to connect from each node to all other nodes without supplying a password.
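Where a scripted check is preferred, the same all-to-all test can be sketched with plain OpenSSH. This is a minimal sketch, not part of the SAS tooling: the function name check_all_pairs and the SSH_BIN override are hypothetical names introduced here, and the sketch assumes that /etc/gridhosts lists one node per line. The BatchMode=yes option makes ssh fail instead of prompting for a password, so a password prompt shows up as a FAIL line.

```shell
# Check passwordless SSH for every ordered pair of nodes in a hosts file.
# check_all_pairs and SSH_BIN are illustrative names (assumptions), not SAS settings.
check_all_pairs() {
    hosts_file=${1:-/etc/gridhosts}
    ssh_bin=${SSH_BIN:-ssh}
    if [ ! -r "$hosts_file" ]; then
        echo "cannot read hosts file: $hosts_file" >&2
        return 2
    fi
    failures=0
    for src in $(cat "$hosts_file"); do
        for dst in $(cat "$hosts_file"); do
            # Log in to $src, and from there log in to $dst and run hostname.
            # BatchMode=yes disables password prompts, so a missing key is an error.
            if "$ssh_bin" -n -o BatchMode=yes -o ConnectTimeout=5 "$src" \
                ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$dst" hostname \
                >/dev/null 2>&1; then
                echo "OK   $src -> $dst"
            else
                echo "FAIL $src -> $dst"
                failures=$((failures + 1))
            fi
        done
    done
    # Succeed only if every pair connected without a prompt or error.
    [ "$failures" -eq 0 ]
}
```

After the function is defined, running check_all_pairs /etc/gridhosts as the affected user prints one line per node pair and returns a nonzero status if any pair fails.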
If you cannot complete these steps without a password prompt or other errors, proceed to the steps below.
As noted above, passwordless SSH authentication can be performed in various ways. However, the most common way is by using SSH key pairs. An example of setting up keys to facilitate passwordless SSH login is below.
ssh-keygen -q -t rsa -N "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
for i in $(cat /etc/gridhosts); do scp -r ~/.ssh "$i:/home/$USER"; done
Enter your password each time scp connects to a node listed in /etc/gridhosts.
After these steps are complete, the steps in the "Identifying the Problem" section, above, should succeed without any password prompts or errors. Once this is confirmed, retry the previously failing SAS LASR Analytic Server or SAS High-Performance Analytics code or task.
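If key-based logins still prompt for a password after the keys are copied, file permissions are a common cause: with its default StrictModes setting, the OpenSSH server ignores an authorized_keys file when the file or the .ssh directory is writable by group or others, and the ssh client refuses a private key that other users can read. The following is a minimal sketch that tightens those permissions. The function name secure_ssh_dir is hypothetical, and the file names assume the RSA key pair created in the steps above.

```shell
# Tighten permissions on an SSH directory and the key files created above.
# secure_ssh_dir is an illustrative name (assumption), not a SAS or OpenSSH command.
secure_ssh_dir() {
    dir=${1:-"$HOME/.ssh"}
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    # Only the owner may read, write, or enter the directory.
    chmod 700 "$dir"
    for f in "$dir/authorized_keys" "$dir/id_rsa"; do
        if [ -f "$f" ]; then
            # Only the owner may read or write the key files.
            chmod 600 "$f"
        fi
    done
    return 0
}
```

Running secure_ssh_dir with no argument applies the change to $HOME/.ssh for the current user; it would need to be run on each node where the permissions are in doubt.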
In some environments, errors might continue even after you have verified successful passwordless SSH authentication between all nodes. This is most likely to occur in environments in which non-RSA SSH key types are used, key pairs are not stored in $HOME/.ssh, or SSH authentication is performed using GSSAPIAuthentication (based on Kerberos). In these cases, the environment variable GRIDRSHCOMMAND might be needed to specify an external SSH executable, instead of using the SAS built-in SSH module. The environment variable should point to the SSH executable that you want to use and pass options that suppress banner messages and warnings about host key validation, as shown below.
Example of setting GRIDRSHCOMMAND with an OPTIONS statement in SAS code:
Example of setting GRIDRSHCOMMAND in SASFoundation/9.4/sasv9_local.cfg (useful for applying the variable globally and for environments where SAS code is auto-generated, such as SAS Visual Analytics):
If GRIDRSHCOMMAND is needed with the SAS Visual Analytics LASR Monitor, export the variable in LevX/Applications/SASVisualAnalytics/HighPerformanceConfiguration/LASRMonitor.sh:
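A minimal sketch of such an export in a shell script follows. The path /usr/bin/ssh is an assumption and should be replaced with the location of the SSH executable on your system. The options shown are standard OpenSSH client options chosen to match the description above: -q suppresses most warning and diagnostic messages, and StrictHostKeyChecking=no accepts new host keys without prompting. Treat the exact value as illustrative rather than as the required setting.

```shell
# Assumed location of the external OpenSSH client; adjust for your system.
# -q                          suppress most warning and diagnostic messages
# -o StrictHostKeyChecking=no accept new host keys without prompting
GRIDRSHCOMMAND="/usr/bin/ssh -q -o StrictHostKeyChecking=no"
export GRIDRSHCOMMAND
```

Because StrictHostKeyChecking=no skips the host key confirmation for hosts that are not yet known, it trades some protection against host spoofing for unattended operation, which is a choice each site should review.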