ASE driver

Contributed by George Trenins

Warning

The procedure outlined below has only been tested with ase-3.25.0.

The python script for driving the simulation and all accompanying input files can be downloaded here. Below I explain the key parts of the main() section in the python script.

Band setup

The nudged elastic band (NEB) method locates the minimum-energy path between two configurations. We assume that these have already been generated and can be read from the files reactant.in and product.in. The initial setup is standard and follows ASE documentation.

from ase.io import read
from ase import Atoms
from ase.mep import NEB

reactant = "reactant.in"
product = "product.in"
num_images = 4   # number of images between the end-points
initial: Atoms = read(reactant, format="aims")
final: Atoms = read(product, format="aims")
configs: list[Atoms] = [initial] + [initial.copy() for i in range(num_images)] + [final]
band: NEB = NEB(configs, parallel=True, method="improvedtangent")
band.interpolate(method='linear', mic=True, apply_constraint=True)
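
To make the interpolation step concrete, here is a minimal sketch in plain Python of what a linear interpolation between two end-points amounts to. It is not ASE's implementation: it ignores the minimum-image convention and any constraints (the mic=True and apply_constraint=True options above), and the function name linear_band and the toy coordinates are made up for illustration.

```python
def linear_band(start, end, num_images):
    """Return num_images + 2 configurations evenly spaced from start to end.

    start and end are lists of (x, y, z) tuples, one per atom.
    Assumption: no periodic wrapping and no constraints are applied.
    """
    if len(start) != len(end):
        raise ValueError("end-points must contain the same number of atoms")
    band = []
    for k in range(num_images + 2):
        t = k / (num_images + 1)  # 0 for the reactant, 1 for the product
        band.append([
            tuple(a + t * (b - a) for a, b in zip(p, q))
            for p, q in zip(start, end)
        ])
    return band


# Toy example: one atom moving 5 units along x, with 4 interior images
toy_band = linear_band([(0.0, 0.0, 0.0)], [(5.0, 0.0, 0.0)], num_images=4)
```

The band has num_images + 2 entries because the two end-points are included alongside the interior images, which matches the layout of configs above.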

Socket interface

By far the most efficient way for ase and FHI-aims to communicate is via sockets. Setting this up requires some care, especially if using multiple nodes.

Driver hostname

In what follows, we will need to advertise the actual hostname of the driver node (the node running ase) to the clients. When using the Slurm workload manager, this can be obtained as follows:

import os, subprocess
nodelist = os.environ['SLURM_NODELIST']
nodes = subprocess.check_output(
    ['scontrol', 'show', 'hostnames', nodelist],
    text=True
).split()
driver_host = nodes[0]

Hint

If everything is running on the same node you may simply use driver_host = "localhost".
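
The two cases (Slurm allocation versus a single local node) can be folded into one helper. The following is a minimal sketch, not part of the original script: the function name get_nodes and its env parameter are hypothetical, and it assumes that the absence of SLURM_NODELIST means a single-node run.

```python
import os
import subprocess


def get_nodes(env=None):
    """Return the list of allocated hostnames, or ["localhost"] outside Slurm."""
    env = os.environ if env is None else env
    nodelist = env.get("SLURM_NODELIST")
    if nodelist is None:
        # Assumption: not inside a Slurm allocation means a single-node run
        return ["localhost"]
    out = subprocess.check_output(
        ["scontrol", "show", "hostnames", nodelist],
        text=True,
    )
    return out.split()


# Outside a Slurm job (simulated here with an empty environment) we get localhost
nodes = get_nodes(env={})
driver_host = nodes[0]
```

In an actual job script one would call get_nodes() without arguments, so that the real environment is inspected.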

Launcher scripts

There are several ways to configure how the calculator gets launched. I recommend manually instantiating a “profile”:

from ase.calculators.aims import Aims, AimsProfile
profile = AimsProfile("/path/to/launcher_script.sh")
aims = Aims(profile=profile, ...)

Here, launcher_script.sh should be an executable file that launches the calculator for the image (bead) to which it is attached. We will generate the launcher scripts from within the python script driving the calculation and tailor their contents image by image, as I explain below.

Port assignment

For a typical NEB optimization, I recommend using TCP/IP sockets (as opposed to UNIX sockets), since the communication overhead is certain to be negligible compared to the time needed for the ab initio calculation, and since this is the more flexible option. To open such a socket, you need to assign an integer port number that is not in use by any other applications. See the get_port() function implemented in the FHI-aims software package in utilities/get_free_port.py for an example of how to generate a suitable port number.
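
For illustration only, one common way of obtaining an unused port is to let the operating system choose it. The sketch below is not the get_port() function shipped with FHI-aims (whose implementation is not reproduced here); the name find_free_port is hypothetical.

```python
import socket


def find_free_port(host="localhost"):
    """Ask the operating system for a TCP port that is currently unused on host."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))  # port 0 lets the OS pick a free port
        return s.getsockname()[1]


port = find_free_port()
```

Note that the port is released again as soon as the function returns, so another process could in principle claim it before the server socket is opened. Opening the server promptly after choosing the port reduces that window.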

Calculator initialization

The exterior images (reactant and product) are treated differently to the interior images in NEB optimization, so we consider the two separately.

Reactant and product

These structures do not change over the course of the optimization. For the most basic NEB optimization method (method="aseneb"), these images do not need a calculator attached at all. All other methods require their potential energies, but not the forces. Since the structures do not change, it is sufficient to compute the energies once and then close the calculator, which is best done using a context manager. The computed energies are cached and persist after the calculator is closed. At this stage, the contents of launcher_script.sh for these images can be something like

srun /path/to/aims.VERSION.scalapack.mpi.x < /dev/null > aims.out

This allows aims to utilise all the available CPUs: the energies are computed first for the reactant and then for the product, so there is no competition for resources.

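import sys
from pathlib import Path
from ase.calculators.socketio import SocketIOCalculator

# `wd` (a Path to the working directory) and get_port() are not defined in this excerpt
for i in [0, num_images + 1]:
    image: Atoms = band.images[i]
    target: Path = Path(f"image{i:02d}")
    port = get_port(host = driver_host)
    cmd = command_for_exterior_images      # e.g., 'srun /path/to/aims.VERSION.scalapack.mpi.x < /dev/null > aims.out'
    launcher = wd / f"_launcher{i:02d}.sh" # separate launcher for every image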
    write_launcher(launcher, cmd)          # see below
    profile = AimsProfile(str(launcher))
    aims = Aims(
        profile=profile,
        directory=target,
        ...,                                # species_dir, kgrid, etc.
        use_pimd_wrapper=(driver_host, port),
    )
    # use context manager to open and close socket
    with SocketIOCalculator(aims, log=sys.stdout, port=port) as calc:
        image.calc = calc
        # no need to assign the energy, it is cached by the image
        image.get_potential_energy()

Interior images

The energies and forces on the interior images are recomputed at every step of the NEB optimization. We therefore need to:

  1. Launch several calculators in parallel and keep them running until the optimization is done.

  2. Ensure that the available resources are evenly distributed.

The second point in particular presented some unexpected difficulties (at least on ADA). In my tests using 2 nodes (72 cores each), when launching four MPI processes (36 tasks each), Slurm would routinely assign three processes to one node and only one process to the other. The following did not fix the issue:

  • waiting a few seconds between launching different client processes

  • using the Slurm --exact --exclusive flag combination

  • using the Slurm option --distribution=cyclic

A robust approach is to manually assign nodes to the different images. The example I give below assumes that the calculator for a single image runs on only one node (not necessarily utilising all the CPUs), but can be readily extended to multi-node jobs. In this case, the launcher command is something like

srun -N 1 -n 36 --exact --exclusive /path/to/aims.VERSION.scalapack.mpi.x < /dev/null > aims.out

and the python script goes as follows:

from itertools import cycle
import time
node_cycle = cycle(nodes)  # `nodes` defined in the 'Driver hostname' section
for i,(image,node) in enumerate(zip(band.images[:-1], node_cycle)):
    if i == 0: continue    # reactant already taken care of
    target: Path = Path(f"image{i:02d}")
    launcher = wd / f"_launcher{i:02d}.sh"
    port = get_port(host = driver_host)
    cmd = command_for_interior_images  # e.g., srun -N 1 -n 36 --exact --exclusive /path/to/aims.VERSION.scalapack.mpi.x < /dev/null > aims.out
    write_launcher(launcher, cmd, extra=f"--nodelist={node}")  # force round-robin
    profile = AimsProfile(str(launcher))
    aims = Aims(...)  # same as before
    calc = SocketIOCalculator(calc=aims, port=port, log=sys.stdout)
    # Manually launch the client and the server so that get_port()
    # has up-to-date info on what ports are available in the next
    # iteration of the for loop
    calc.server = calc.launch_server()
    proc = calc.launch_client(image, properties=["energy", "forces"],
                              port=calc._port,
                              unixsocket=calc._unixsocket)
    time.sleep(1.0) # optional
    calc.server.proc = proc
    image.calc = calc
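
To see which node each interior image ends up on, the zip/cycle pattern used in the loop above can be isolated in a few lines of plain Python. This is only an illustrative sketch: the helper assign_nodes and the node names are made up, and no calculators are launched.

```python
from itertools import cycle


def assign_nodes(num_images, nodes):
    """Map interior image index -> node, mimicking the zip/cycle loop.

    Indices run over 0..num_images (the product, index num_images + 1,
    is excluded); index 0 (the reactant) is skipped but still consumes
    one entry from the cycle.
    """
    assignment = {}
    for i, node in zip(range(num_images + 1), cycle(nodes)):
        if i == 0:
            continue
        assignment[i] = node
    return assignment


# Hypothetical node names, four interior images
mapping = assign_nodes(4, ["node01", "node02"])
# mapping == {1: "node02", 2: "node01", 3: "node02", 4: "node01"}
```

With four interior images and two nodes, each node hosts two images. If the number of interior images is not a multiple of the number of nodes, some nodes necessarily host one more image than others.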

The function write_launcher() used in this and the preceding section is

from pathlib import Path
from typing import Optional
import stat

def write_launcher(
        filepath: Path,
        cmd: str,
        extra: Optional[str] = None) -> None:
    """Write an executable bash script at filepath that runs cmd."""
    if extra is not None:
        # Insert the extra option right after the first word (e.g. srun)
        cmd_lst: list[str] = cmd.split()
        cmd_lst.insert(1, extra)
        cmd = ' '.join(cmd_lst)
    with open(filepath, 'w') as f:
        f.write(f"#!/bin/bash\n\n{cmd}\n")
    # Get current permissions
    mode = filepath.stat().st_mode
    # Add execute permission for user, group, and others
    filepath.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
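
As a quick check of what ends up in the launcher file when an extra option is passed, the string manipulation can be reproduced on its own. The helper name insert_srun_option and the hostname node01 are hypothetical and serve only this illustration.

```python
def insert_srun_option(cmd, extra):
    """Insert an extra option right after the first word of cmd."""
    words = cmd.split()
    words.insert(1, extra)
    return " ".join(words)


base = ("srun -N 1 -n 36 --exact --exclusive "
        "/path/to/aims.VERSION.scalapack.mpi.x < /dev/null > aims.out")
line = insert_srun_option(base, "--nodelist=node01")  # node01 is a made-up hostname
# line == "srun --nodelist=node01 -N 1 -n 36 --exact --exclusive "
#         "/path/to/aims.VERSION.scalapack.mpi.x < /dev/null > aims.out"
```

Because the command is split on whitespace and re-joined with single spaces, this approach assumes that no argument contains quoted spaces.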

Note

Include #SBATCH --hint=nomultithread in the Slurm submission script to disable hyperthreading, ensuring each MPI task runs on a full physical core and achieves full CPU utilization.