Sparse Real-Space Matrix in FHI-aims

Overview

FHI-aims stores sparse matrices (Hamiltonian and overlap) using a compressed format similar to CSR (Compressed Sparse Row) due to the excellent locality of the real-space representation. The sparse storage significantly reduces memory usage by only storing non-zero elements.

Basic CSR Format Background

Standard CSR Components

V: Array containing all non-zero elements
col_index: Column indices of non-zero elements
row_index: Array indicating where each row starts in the V array

CSR Example

For a 4×4 matrix with 4 non-zero elements (array index starts from 0 in this section):

[5 0 0 0]
[0 8 0 0]
[0 0 3 0]
[0 6 0 0]

CSR representation:

V = [5, 8, 3, 6] (non-zero values)
col_index = [0, 1, 2, 1] (column indices of non-zero elements)
row_index = [0, 1, 2, 3, 4] (cumulative count: row i has row_index[i+1] - row_index[i] elements)

FHI-aims Specific Implementation

Key Differences from Standard CSR

PBC Consideration: Due to periodic boundary conditions, the row_index definition is slightly modified
3D Index Structure: Uses 3D arrays to handle cell and basis indexing simultaneously
Hamiltonian-Specific Storage: index_hamiltonian array structure

Data Structure Components

For Hamiltonian Matrix Storage

Here are the FORTRAN variables used in FHI-aims for storing the Hamiltonian:

hamiltonian: Array containing non-zero matrix elements (equivalent to V in CSR)
column_index_hamiltonian: Column indices of stored elements
index_hamiltonian: 3D array with dimensions (2, n_cells, n_basis) * First dimension (2): stores start and end indices for each row * Second dimension: cell index * Third dimension: basis function index

Index Mapping

Cell and Basis Organization

\(m\) cells, and \(p\) basis functions per cell
Full Hamiltonian would be \((m \times p) \times (m \times p)\) dimensions
Due to PBC, only need to store \(m \times p^2\) elements (cells are equivalent under translation)

Index Structure Example

Array index starts from 1 in this section to follow FORTRAN standard.

For period=3 (so 3 cells in total), and 2 basis functions per cell (think of graphene but in 1D):

| b1 b2 | b1 b2 | b1 b2 |  (b -> basis)
|   c3  |   c1  |   c2  |  (c -> cell)

The variable cell_index in machine storage is:

cell_index =
 0  0  0  # cell 1: central
 1  0  0  # cell 2: on the right
-1  0  0  # cell 3: on the left

The full Hamiltonian is (index format (cell, basis)):

       (1,1) (1,2)
(1,1)     v1    v2  # index_hamiltonian(:,1,1) = [1,2]
(1,2)    v2*     0  # index_hamiltonian(:,1,2) = [0,-1]
(2,1)      0    v3  # index_hamiltonian(:,2,1) = [3,3]
(2,2)      0    v4  # index_hamiltonian(:,2,2) = [4,4]
(3,1)      0     0  # index_hamiltonian(:,3,1) = [0,-1]
(3,2)    v3*   v4'  # index_hamiltonian(:,3,2) = [5,5]

So the variable hamiltonian in machine storage is:

v2* is the complex conjugation of v2
v2* and v3* are ignored because of symmetry, and for v4' although it is equal to v4*, it should still be stored because of being in the upper triangle part

hamiltonian = [v1, v2, v3, v4, v4']

And the variable column_index_hamiltonian in machine storage is (it represents the column index of hamiltonian):

column_index_hamiltonian = [1, 2, 2, 2, 2]

And the variable index_hamiltonian is:

index_hamiltonian(1,:,:) =
        basis_1 basis_2
cell_1        1       0
cell_2        3       4
cell_3        0       5

index_hamiltonian(2,:,:) =
        basis_1 basis_2
cell_1        2      -1
cell_2        3       4
cell_3       -1       5

The following form makes it easier:

index_hamiltonian(:,1,1) = [1,2]    # row 1 cell_1 basis_1, elements from index 1 to 2
index_hamiltonian(:,1,2) = [0,-1]   # row 2 cell_1 basis_2, no elements (empty)
index_hamiltonian(:,2,1) = [3,3]    # row 3 cell_2 basis_1, element at index 3
index_hamiltonian(:,2,2) = [4,4]    # row 4 cell_2 basis_2, element at index 4
index_hamiltonian(:,3,1) = [0,-1]   # row 5 cell_3 basis_1, no elements (empty)
index_hamiltonian(:,3,2) = [5,5]    # row 6 cell_3 basis_2, element at index 5

Special Conventions

Empty Rows

If a row has no matrix elements to store: index_hamiltonian(:,m,n) = (0,-1)
This indicates an empty row with no non-zero elements

Symmetry Considerations

In the main cell (0, 0, 0)
- due to symmetry: \(\langle c_0 b_i | A | c_0 b_j \rangle = \langle c_0 b_j | A | c_0 b_i \rangle^*\)
- we only store the upper half of the matrix, that’s why v2* is ignored
For two “conjugated” cell I and J which cell_index(I,:) = - cell_index(J,:)
- because of translation symmetry: \(\langle c_0 b_i | A | c_I b_j \rangle = \langle c_J b_i | A | c_0 b_j \rangle\)
- only the upper part, like \(\langle c_0 b_i | A | c_I b_j \rangle\) where \(i<j\)
  - (so this is U in each sub-matrix between cell 0 and cell \(I\))
- and the lower part is the complex conjugation, like the case for v3*
- The reason for storing v4' is that we have \(i=j\) and the diagonal part will be stored since it locates in the upper triangle

Data Access Pattern

To extract elements for a specific row: use the index range specified in index_hamiltonian
Extract hamiltonian[start:end+1] and corresponding column_index_hamiltonian[start:end+1]

Special row

The last row in cell_index is not used
- actually it is 999999999 999999999 999999999
Just skip it, as you can see in the following example code

Use Case in FORTRAN

do i_cell_row = 1, n_cells_in_hamiltonian-1   ! yes, "-1".
    do i_basis_row = 1, n_basis
    i_index_first = index_hamiltonian(1, i_cell_row, i_basis_row)
    i_index_last = index_hamiltonian(2, i_cell_row, i_basis_row)
    do i_index = i_index_first, i_index_last
        i_basis_col = column_index_hamiltonian(i_index)
        ! Use:
        !    hamiltonian(i_index, i_spin)
        !    density_matrix_sparse(i_index)
        !    and i_basis_row, i_cell_row, i_basis_col
        ! or any combination of
        !    (i_basis_row, i_loc_cell_row), &
        !    & (i_basis_col, i_loc_cell_col),
        ! with
        !    i_cell_row == &
        !    & position_in_hamiltonian(i_loc_cell_row, i_loc_cell_col)
    end do
    end do
end do

Key Points for Parser Implementation

Index Convention: Check if FHI-aims uses 0-based or 1-based indexing
Empty Row Handling: Properly handle (0,-1) markers for empty rows
PBC Structure: Account for the cell-basis double indexing system
Symmetry: Determine if full or half-storage is used for symmetric matrices
- AND: upper matrix in FORTRAN is lower matrix in python, because of array indexing sequence
Data Types: Verify floating-point precision and integer types used
FORTRAN / Python indexing: FORTRAN starting from 1 and include both start and end, but python start from 0 and only include the start