A containers.Map
object containing Spark configuration
properties as key-value pairs.
When deploying to a Hadoop YARN cluster, set the value
of prop with the appropriate Spark configuration
properties as key-value pairs. The precise set of Spark configuration
properties varies from one deployment scenario to another, depending on
the deployment cluster environment. Verify the Spark setup
with your system administrator to determine the appropriate configuration properties.
See the tables below for commonly used Spark properties. For the full
set of properties, see the latest Spark documentation.
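As an illustrative sketch, a set of Spark properties can be collected into a containers.Map like this. The property values shown are hypothetical; confirm the correct settings for your cluster with your system administrator.

```matlab
% Build a containers.Map of Spark configuration key-value pairs.
% The values below are illustrative only; they must match your cluster.
sparkProp = containers.Map( ...
    {'spark.executor.cores', ...
     'spark.executor.instances', ...
     'spark.executor.memory'}, ...
    {'1', '4', '2g'});

% Look up a property value by its key.
execMem = sparkProp('spark.executor.memory');
```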
Running Spark on YARN
| Property Name (Key) | Default (Value) | Description |
| --- | --- | --- |
| `spark.executor.cores` | 1 | The number of cores to use on each executor. Applies to YARN and Spark standalone mode only. In Spark standalone mode, setting this parameter allows an application to run multiple executors on the same worker, provided that there are enough cores on that worker. Otherwise, only one executor per application runs on each worker. |
| `spark.executor.instances` | 2 | The number of executors. **Note:** This property is incompatible with `spark.dynamicAllocation.enabled`. If both `spark.dynamicAllocation.enabled` and `spark.executor.instances` are specified, dynamic allocation is turned off and the specified number of `spark.executor.instances` is used. |
| `spark.driver.memory` | | Amount of memory to use for the driver process. If you get out-of-memory errors while using `tall`/`gather`, consider increasing this value. |
| `spark.executor.memory` | | Amount of memory to use per executor process. If you get out-of-memory errors while using `tall`/`gather`, consider increasing this value. |
| `spark.yarn.executor.memoryOverhead` | | The amount of off-heap memory (in MB) to allocate per executor. If you get out-of-memory errors while using `tall`/`gather`, consider increasing this value. |
| `spark.dynamicAllocation.enabled` | false | Setting this property to true enables dynamic resource allocation, which integrates Spark with YARN resource management and scales the number of executors registered with the application up and down based on the workload. Spark starts as many executors as possible given the executor memory requirement and number of cores. This property requires the cluster to be set up accordingly, and requires `spark.shuffle.service.enabled` to be set to true. The following configurations are also relevant: `spark.dynamicAllocation.minExecutors`, `spark.dynamicAllocation.maxExecutors`, and `spark.dynamicAllocation.initialExecutors`. |
| `spark.shuffle.service.enabled` | false | Enables the external shuffle service, which preserves the shuffle files written by executors so the executors can be safely removed. This property must be set to true if `spark.dynamicAllocation.enabled` is true. The external shuffle service must be set up on the cluster before it can be enabled. |
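Because `spark.dynamicAllocation.enabled` depends on `spark.shuffle.service.enabled`, the two properties are typically set together. A minimal sketch, assuming the external shuffle service is already set up on the cluster (the executor counts are illustrative):

```matlab
% Dynamic allocation requires the external shuffle service,
% so enable both properties together (illustrative values).
sparkProp = containers.Map( ...
    {'spark.dynamicAllocation.enabled', ...
     'spark.shuffle.service.enabled', ...
     'spark.dynamicAllocation.minExecutors', ...
     'spark.dynamicAllocation.maxExecutors'}, ...
    {'true', 'true', '1', '16'});
```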
MATLAB Specific Properties
| Property Name (Key) | Default (Value) | Description |
| --- | --- | --- |
| `spark.matlab.worker.debug` | false | For use in standalone/interactive mode only. If set to true, a Spark-deployable MATLAB application executed within the MATLAB desktop environment starts another MATLAB session as the worker and enters the debugger. Logging information is directed to `log_<nbr>.txt`. |
| `spark.matlab.worker.reuse` | true | When set to true, a Spark executor pools workers and reuses them from one stage to the next. Workers terminate when the executor under which they are running terminates. |
| `spark.matlab.worker.profile` | false | Valid only when using a MATLAB session as a worker. When set to true, this property turns on the MATLAB Profiler and generates a profile report that is saved to the file `profworker_<split_index>_<socket>_<worker pass>.mat`. |
| `spark.matlab.worker.numberOfKeys` | 10000 | The number of unique keys that a containers.Map object can hold while performing `*ByKey` operations before map data is spilled to a file. |
| `spark.matlab.executor.timeout` | 600000 | Spark executor timeout in milliseconds. Not applicable when deploying tall arrays. |
Monitoring and Logging
| Property Name (Key) | Default (Value) | Description |
| --- | --- | --- |
| `spark.history.fs.logDirectory` | file:/tmp/spark-events | Directory that contains the application event logs to be loaded by the history server. |
| `spark.eventLog.dir` | file:///tmp/spark-events | Base directory in which Spark events are logged, if `spark.eventLog.enabled` is true. Within this base directory, Spark creates a subdirectory for each application and logs the events specific to that application in the subdirectory. You can set this to a unified location such as an HDFS™ directory so that history files can be read by the history server. |
| `spark.eventLog.enabled` | false | Whether to log Spark events. This is useful for reconstructing the web UI after the application has finished. |
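Event logging is likewise enabled by setting the related properties together. A hedged sketch, assuming a shared HDFS location so that the history server can read the logs (the HDFS path is hypothetical):

```matlab
% Enable Spark event logging so the history server can
% reconstruct the web UI after the application finishes.
% The HDFS path below is illustrative; use your own cluster's location.
sparkProp = containers.Map( ...
    {'spark.eventLog.enabled', ...
     'spark.eventLog.dir', ...
     'spark.history.fs.logDirectory'}, ...
    {'true', ...
     'hdfs://myhost:54310/spark-events', ...
     'hdfs://myhost:54310/spark-events'});
```

Pointing `spark.history.fs.logDirectory` at the same location as `spark.eventLog.dir` lets the history server load the logs the application writes.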