"Fetch error: java.lang.OutOfMemoryError: Java heap space" occurs when using the CData JDBC driver for Databricks with SAS/ACCESS® Interface to Spark


When unloading a large amount of data from Databricks to SAS, the Java Virtual Machine (JVM) can run out of memory, producing an error message similar to the following:

81   data _null_;
82     set dbricks.large_table;
84   run;
ERROR: Fetch error: java.lang.OutOfMemoryError: Java heap space
NOTE: The DATA step has been abnormally terminated.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 2023337 observations read from the data set DBRICKS.LARGE_TABLE.

This error is caused by a memory allocation issue in the CData JDBC driver for Databricks (cdata-jdbc-databricks-23.0.8806.0.jar). It can occur when you use driver version 23.0.8806.0, which is available on the SAS® Viya® platform.

You can reference this driver in a SAS/ACCESS Interface to Spark LIBNAME statement either by specifying platform=databricks without a driverClass option or by specifying the CData Java class directly (driverClass='cdata.jdbc.databricks.DatabricksDriver').
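For example, a LIBNAME statement that resolves to the CData driver might look like the following sketch. The server name, HTTP path, and credentials are placeholders, and connection option names other than platform= and driverClass= can vary by release, so check the SAS/ACCESS Interface to Spark documentation for your environment:

```sas
/* Implicit: platform=databricks without driverClass selects the bundled CData driver */
libname dbricks spark platform=databricks
   server="example.cloud.databricks.com"   /* placeholder workspace host */
   httpPath="/sql/1.0/warehouses/abc123"   /* placeholder SQL warehouse path */
   user="token"
   password="XXXXXXXX";                    /* placeholder personal access token */

/* Explicit: name the CData Java class directly */
libname dbricks2 spark platform=databricks
   driverClass='cdata.jdbc.databricks.DatabricksDriver'
   server="example.cloud.databricks.com"
   httpPath="/sql/1.0/warehouses/abc123"
   user="token"
   password="XXXXXXXX";
```

Either form routes the connection through the affected CData driver, so both can encounter the OutOfMemoryError described above.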

Workaround

To work around this issue, use the Simba Spark JDBC driver for Databricks by specifying the driverClass='com.databricks.client.jdbc.Driver' LIBNAME option. This driver is available from the databricks.com website.
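A sketch of the workaround LIBNAME statement follows. The JAR location, server name, HTTP path, and credentials are placeholders for your own values, and the classPath= option name is an assumption based on common SAS/ACCESS JDBC-style syntax; verify the option names in the SAS/ACCESS Interface to Spark documentation for your release:

```sas
/* Point driverClass at the downloaded Simba Spark JDBC driver instead of the CData driver */
libname dbricks spark
   driverClass='com.databricks.client.jdbc.Driver'
   classPath='/opt/drivers/DatabricksJDBC42.jar'  /* placeholder path to the Simba jar */
   server="example.cloud.databricks.com"          /* placeholder workspace host */
   httpPath="/sql/1.0/warehouses/abc123"          /* placeholder SQL warehouse path */
   user="token"
   password="XXXXXXXX";                           /* placeholder personal access token */
```

After the LIBNAME statement connects through the Simba driver, the same DATA step shown in the log above can read the large table without routing fetches through the affected CData driver.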