executorlib.executor.slurm.create_slurm_executor

executorlib.executor.slurm.create_slurm_executor(max_workers: int | None = None, max_cores: int | None = None, cache_directory: str | None = None, executor_kwargs: dict | None = None, pmi_mode: str | None = None, hostname_localhost: bool | None = None, block_allocation: bool = False, init_function: typing.Callable | None = None, log_obj_size: bool = False, wait: bool = True, validator: typing.Callable = <function validate_resource_dict>, restart_limit: int = 0) -> OneProcessTaskScheduler | BlockAllocationTaskScheduler

Create a SLURM executor

Parameters:
  • max_workers (int) – for backwards compatibility with the standard library, max_workers also defines the number of cores which can be used in parallel - just like the max_cores parameter. Using max_cores is recommended, as computers have a limited number of compute cores.

  • max_cores (int) – defines the number of cores which can be used in parallel

  • cache_directory (str, optional) – The directory to store cache files. Defaults to “executorlib_cache”.

  • executor_kwargs (dict) –

    A dictionary of arguments required by the executor, with the following keys:

    • cores (int): number of MPI cores to be used for each function call

    • threads_per_core (int): number of OpenMP threads to be used for each function call

    • gpus_per_core (int): number of GPUs per worker - defaults to 0

    • cwd (str): current working directory where the parallel python task is executed

    • cache_key (str): Rather than using the internal hashing of executorlib the user can provide an external cache_key to identify tasks on the file system.

    • cache_directory (str): The directory to store cache files.

    • num_nodes (int): number of compute nodes used for the evaluation of the Python function.

    • exclusive (bool): boolean flag to reserve exclusive access to selected compute nodes - do not allow other tasks to use the same compute node.

    • error_log_file (str): path to the error log file, primarily used to merge the log of multiple tasks in one file.

    • run_time_max (int): the maximum time the execution of the submitted Python function is allowed to take in seconds.

    • priority (int): the queuing system priority assigned to a given Python function to influence the scheduling.

    • slurm_cmd_args (list): Additional command line arguments for the srun call (SLURM only)

  • pmi_mode (str) – PMI interface to use (OpenMPI v5 requires pmix). Defaults to None.

  • hostname_localhost (boolean) – use localhost instead of the hostname to establish the zmq connection. In the context of an HPC cluster, using the hostname is essential to be able to communicate with an Executor running on a different compute node within the same allocation. In principle, any computer should be able to resolve its own hostname to the same address as localhost, but macOS >= 12 seems to disable this lookup for security reasons, so on macOS this option has to be set to true.

  • block_allocation (boolean) – To accelerate the submission of a series of python functions with the same resource requirements, executorlib supports block allocation. In this case all resources have to be defined on the executor, rather than during the submission of the individual function.

  • init_function (callable, optional) – optional function to preset arguments for functions which are submitted later. Defaults to None.

  • log_obj_size (bool) – Enable debug mode which reports the size of the communicated objects.

  • wait (bool) – Whether to wait for the completion of all tasks before shutting down the executor.

  • validator (callable) – A function to validate the resource_dict.

  • restart_limit (int) – The maximum number of times a worker process is restarted.
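
As an illustration of how the parameters above fit together, the following minimal sketch assembles the keyword arguments as plain Python data. Only the key and parameter names come from the list above; every value (core counts, the working directory, the partition name passed through slurm_cmd_args) is a hypothetical placeholder, and the macOS check is just one possible way to apply the hostname_localhost advice.

```python
import sys

# Per-call resources, using only keys documented for executor_kwargs.
# All values are illustrative placeholders, not recommended settings.
executor_kwargs = {
    "cores": 2,                 # MPI cores per function call
    "threads_per_core": 1,      # OpenMP threads per function call
    "gpus_per_core": 0,         # documented default
    "cwd": "/tmp/slurm_job",    # hypothetical working directory
    "num_nodes": 1,
    "exclusive": False,
    "run_time_max": 600,        # seconds
    "slurm_cmd_args": ["--partition=debug"],  # hypothetical partition name
}

# Top-level arguments for create_slurm_executor.
factory_kwargs = {
    "max_cores": 4,
    "executor_kwargs": executor_kwargs,
    # With block allocation, all resources are defined on the executor.
    "block_allocation": True,
    # Force localhost only on macOS; elsewhere keep the default (None).
    "hostname_localhost": True if sys.platform == "darwin" else None,
}
```

The resulting dictionary could then be unpacked into the call as `create_slurm_executor(**factory_kwargs)` on a machine where executorlib and SLURM are available.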

Returns:

OneProcessTaskScheduler | BlockAllocationTaskScheduler
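
A rough usage sketch follows. It assumes that the returned object offers the `submit()` and `shutdown()` methods familiar from the standard library's `concurrent.futures` executors; this page does not confirm that, so treat it as an assumption. The helper names `submit_sum` and `slurm_sum` are hypothetical. Because a SLURM allocation may not be at hand, the same helper is exercised with a local `ThreadPoolExecutor` as a stand-in.

```python
from concurrent.futures import ThreadPoolExecutor


def submit_sum(executor, a, b):
    """Submit sum([a, b]) through submit() and wait for the result."""
    future = executor.submit(sum, [a, b])
    return future.result()


def slurm_sum(a, b):
    """Hypothetical helper: run the same task through a SLURM executor.

    Assumes the returned scheduler exposes submit() and shutdown().
    """
    # Imported lazily so the module loads on machines without executorlib.
    from executorlib.executor.slurm import create_slurm_executor

    executor = create_slurm_executor(max_cores=2, executor_kwargs={"cores": 1})
    try:
        return submit_sum(executor, a, b)
    finally:
        executor.shutdown(wait=True)


# Local stand-in: exercises the submit/result pattern without SLURM.
with ThreadPoolExecutor(max_workers=1) as pool:
    local_result = submit_sum(pool, 1, 2)
```

Calling `slurm_sum(1, 2)` inside a SLURM allocation would be the cluster counterpart of the local run, under the assumptions stated above.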