Data formats

This page documents the data formats consumed and produced by PETGEM: the text and HDF5 inputs read by utils/preprocess.py, the unified input bundle it writes, and the responses file written by the kernel. All HDF5 files use the PETSc/h5py layout; the Python helpers petgem.readBundle and petgem.readResponses provide convenient readers.

Conductivity table (sigmas.txt)

A whitespace-delimited text table of per-material conductivity, passed via -sigma_file (relative to -case_dir). The row index is the 0-based material id. For a Gmsh mesh this maps to the physical group as material_id = gmsh:physical - 1; for a VTK mesh the distinct cell-data region codes are mapped to rows in ascending order (row 0 is the smallest code, row 1 the next smallest, and so on). See Mesh generation for the per-format mapping. Comments starting with # and blank lines are allowed. Commas are also accepted as separators, so legacy comma-separated tables still load.

# sigma_x sigma_y sigma_z [fixed]
0.1 0.1 0.1 1     # material 0 - held fixed during inversion (e.g. air)
1.0 1.0 1.0 0     # material 1 - invertible
2.0 2.0 2.0       # material 2 - 'fixed' column omitted -> defaults to 0
  • sigma_x sigma_y sigma_z: per-axis conductivity (S/m). Use equal values for an isotropic material.

  • fixed (optional 4th column): 1 marks the material as held fixed during inversion (its gradient is zeroed). Forward modeling ignores it.
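The table rules above can be sketched as a small parser. This is a minimal illustration under the rules stated on this page, not the code in utils/preprocess.py; the names parse_sigmas, vtk_code_to_row, and gmsh_physical_to_row are hypothetical, and the sketch assumes the fixed column is written as an integer 0 or 1.

```python
def parse_sigmas(text):
    """Parse a conductivity table into (sigmas, fixed) lists.

    Hypothetical helper for illustration; not part of PETGEM.
    """
    sigmas, fixed = [], []
    for raw in text.splitlines():
        # Drop a trailing comment, then treat commas as whitespace.
        line = raw.split("#", 1)[0].replace(",", " ").strip()
        if not line:
            continue  # blank or comment-only line
        fields = line.split()
        if len(fields) not in (3, 4):
            raise ValueError(f"expected 3 or 4 columns, got {len(fields)}: {raw!r}")
        sigmas.append(tuple(float(v) for v in fields[:3]))
        # The optional 4th column defaults to 0 (not fixed) when omitted.
        fixed.append(int(fields[3]) if len(fields) == 4 else 0)
    return sigmas, fixed


def vtk_code_to_row(region_codes):
    """Map distinct VTK region codes to 0-based rows in ascending order."""
    return {code: row for row, code in enumerate(sorted(set(region_codes)))}


def gmsh_physical_to_row(physical_id):
    """Map a Gmsh physical group id to its 0-based material id."""
    return physical_id - 1


table = """\
# sigma_x sigma_y sigma_z [fixed]
0.1 0.1 0.1 1     # material 0
1.0, 1.0, 1.0, 0  # material 1 (comma-separated rows also load)

2.0 2.0 2.0       # material 2, 'fixed' omitted
"""
sigmas, fixed = parse_sigmas(table)
# Row indices of the materials held fixed during inversion.
fixed_ids = [i for i, flag in enumerate(fixed) if flag == 1]
```

The row index in sigmas is the material id, so fixed_ids here corresponds to the kind of list the inverse bundle stores as 0-based fixed material ids.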

Sources

Both modes use a single transmitter format: one row per transmitter, 8 fields (frequency on every row). Forward modeling (-source_filename) repeats one frequency across its transmitters; inverse modeling (-inv_source_filename) carries one row per (frequency, dipole) pair:

freq x_pos y_pos z_pos current length dip_angle azimuth_angle
...

The fields are: frequency (Hz), position x y z (m), current (A), length (m), dip_angle and azimuth_angle (degrees). The legacy forward layout (a lone freq line followed by 7-field rows) is still accepted and converted to this form on read.
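As a rough sketch of how the legacy layout relates to the unified one, the reader below accepts either form and always returns 8-field rows. The function name parse_sources is hypothetical and the logic is an assumption based on the description above, not the converter that PETGEM ships.

```python
FIELDS = ("freq", "x_pos", "y_pos", "z_pos",
          "current", "length", "dip_angle", "azimuth_angle")


def parse_sources(text):
    """Return a list of 8-field transmitter rows (tuples of floats).

    Accepts the unified layout (8 fields per row) or the legacy forward
    layout (a lone frequency line followed by 7-field rows).
    """
    rows = []
    legacy_freq = None
    for raw in text.splitlines():
        values = [float(v) for v in raw.split()]
        if not values:
            continue  # skip blank lines
        if len(values) == len(FIELDS):
            rows.append(tuple(values))
        elif len(values) == 1 and legacy_freq is None and not rows:
            # Legacy header: one frequency shared by all following rows.
            legacy_freq = values[0]
        elif len(values) == len(FIELDS) - 1 and legacy_freq is not None:
            # Legacy row: prepend the shared frequency.
            rows.append((legacy_freq, *values))
        else:
            raise ValueError(f"unexpected source row: {raw!r}")
    return rows


unified = "2.0 0 0 -10 1 1 0 0\n2.0 100 0 -10 1 1 0 0\n"
legacy = "2.0\n0 0 -10 1 1 0 0\n100 0 -10 1 1 0 0\n"
unified_rows = parse_sources(unified)
legacy_rows = parse_sources(legacy)
```

Both inputs yield the same two rows, each starting with the frequency.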

Receivers (-receiver_filename)

A text file of receiver positions, one Cartesian point per row. Columns may be separated by whitespace or commas (# comments and blank lines allowed), so receiver lists exported from MATLAB or other tools load without conversion:

x y z
...

Observed data (-observed_filename, inverse only)

The observed electric field used as the inversion target, in either of two forms (preprocess.py detects the format by extension):

HDF5 (.h5 / .hdf5):

  • /Ex: complex [N_freq, N_recv] dataset (complex128, stored as an {r, i} compound). Row order matches the inverse source frequencies; column order matches the receiver list.

  • /frequencies: the frequency of each row (Hz).

  • optional root attribute error_level (the relative noise level).

Raw text (invEx.dat, MATLAB-style) - parsed inline by preprocess.py, so no separate conversion step is needed:

freq_label  Re(Ex_1) Im(Ex_1)  Re(Ex_2) Im(Ex_2)  ...  Re(Ex_N) Im(Ex_N)

Each row holds one frequency. The leading label is dropped (frequencies come from the inverse sources), and the noise level is supplied with -error_level (or left to the kernel default). The standalone utils/convert_invex_to_hdf5.py can still produce a reusable HDF5 file from such a text file, but it is no longer required.
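The raw-text layout can be turned into a complex [N_freq, N_recv] table along the following lines. This is an illustrative sketch only: parse_invex is a hypothetical name, and the sample labels and values are made up for the example.

```python
def parse_invex(text):
    """Return one list of complex Ex values per frequency row."""
    rows = []
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        values = fields[1:]  # drop the leading frequency label
        if len(values) % 2:
            raise ValueError("expected Re/Im pairs after the label")
        # Pair up consecutive (real, imaginary) columns.
        row = [complex(float(re_part), float(im_part))
               for re_part, im_part in zip(values[0::2], values[1::2])]
        if rows and len(row) != len(rows[0]):
            raise ValueError("every row must have the same receiver count")
        rows.append(row)
    return rows


sample = "f1 1.0 -2.0 3.5 0.25\nf2 0.0 1.0 -1.0 -1.0\n"
ex = parse_invex(sample)
n_freq, n_recv = len(ex), len(ex[0])
```

Here ex[i][j] is the observed Ex for the i-th frequency row and the j-th receiver, matching the row and column order described for /Ex.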

Input bundle (-input_filename, default input.h5)

utils/preprocess.py assembles a single unified HDF5 bundle that the kernel reads in full. Both modes share the same skeleton - mesh, model, order, receivers, and /sources - and inverse runs simply add the two groups that have no forward counterpart (/observed and /inv_meta). Each mode populates only the fields it needs.

Group / dataset | Mode | Contents
--- | --- | ---
DMPlex topology + model_data | both | Mesh (PETSc DMPlex layout) and per-cell conductivity, written by the preprocess
/nord | both | Polynomial order (length-1 vector)
/receivers | both | Receiver positions, flattened [N_recv * 3]
/sources/* | both | Transmitters, one entry per row: freq, position, current, length, dipAngle, azimuthAngle (per entry). Forward repeats one frequency; inverse carries one row per (frequency, dipole).
/observed/Ex | inverse | Observed field; the @error_level attribute holds the relative noise level
/inv_meta/fixed_materials | inverse | 0-based material ids held fixed (int32)

petgem.readBundle(path) returns a dict with receivers ([N_recv, 3]), nord, frequency (first transmitter), and sources ([N_src, 8]: freq x y z current length dipAngle azimuthAngle). The DMPlex mesh and model_data are kept in the file but not returned.
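To make the shapes concrete, the sketch below shows how a flattened [N_recv * 3] receiver vector relates to the [N_recv, 3] form, and how one 8-column source row can be labeled. Both helpers are hypothetical, and the sketch assumes the flattened vector is stored receiver by receiver (x, y, z, x, y, z, ...), which this page does not state explicitly.

```python
SOURCE_COLUMNS = ("freq", "x", "y", "z",
                  "current", "length", "dipAngle", "azimuthAngle")


def unflatten_receivers(flat):
    """Group a flat [N_recv * 3] sequence into N_recv (x, y, z) tuples.

    Assumes receiver-major ordering: x, y, z of receiver 0, then receiver 1, ...
    """
    if len(flat) % 3:
        raise ValueError("flattened receiver vector length must be a multiple of 3")
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def label_source_row(row):
    """Attach column names to one 8-field source row."""
    if len(row) != len(SOURCE_COLUMNS):
        raise ValueError("a source row must have 8 fields")
    return dict(zip(SOURCE_COLUMNS, row))


receivers = unflatten_receivers([0.0, 0.0, -1.0, 10.0, 0.0, -1.0])
source = label_source_row((2.0, 0.0, 0.0, -10.0, 1.0, 1.0, 0.0, 0.0))
```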

Responses file

The forward kernel writes a single unified HDF5 file containing every transmitter, named <output_filename>.h5 (e.g. responses_p1.h5). All six field components are produced through PETSc’s native HDF5 viewer; the output Vecs are parallel on the kernel communicator, so writes are collective MPI-IO when PETSc is linked against a parallel HDF5 build (no rank-0 gather).

Layout:

/                              root attrs (provenance + run-wide values)
/sources/src1/                 per-source attrs
/sources/src1/fields/Ex        complex PETSc Vec, length N_recv
/sources/src1/fields/Ey
/sources/src1/fields/Ez
/sources/src1/fields/Hx
/sources/src1/fields/Hy
/sources/src1/fields/Hz
/sources/src2/                 (one such group per transmitter)
...

Root attributes:

  • petgem_version, input_filename, date, nord, mpi_tasks (provenance).

  • num_sources: number of transmitter groups stored under /sources/.

  • frequency: operating frequency (Hz); single-frequency forward runs carry one shared value here in addition to the per-source attribute.

Per-source group attributes (/sources/src{k}): frequency, x_pos, y_pos, z_pos, current, length, dip_angle, azimuth_angle.
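For callers that open the file directly rather than through the Python readers, the dataset paths follow mechanically from the layout above. The helper below is a hypothetical sketch that only builds the path strings for a given num_sources; it does not open the file.

```python
COMPONENTS = ("Ex", "Ey", "Ez", "Hx", "Hy", "Hz")


def field_paths(num_sources):
    """Map each 1-based source index to its six field dataset paths."""
    return {
        k: {comp: f"/sources/src{k}/fields/{comp}" for comp in COMPONENTS}
        for k in range(1, num_sources + 1)
    }


paths = field_paths(2)
```

With an HDF5 library such as h5py, each of these strings could be used as a key on the opened file, and the num_sources root attribute gives the argument.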

Python readers:

  • petgem.readResponses(path, source=1) returns a per-source dict (Ex..Hz arrays + source and provenance attribute dicts), preserving the shape used by pre-refactor callers.

  • petgem.readAllResponses(path) returns {'provenance': ..., 'num_sources': N, 'sources': {1: {...}, 2: {...}, ...}} with each per-source entry shaped like readResponses.

The per-case postprocess.py scripts use these readers; see tests/cases/csem_model/postprocess.py (single-source reference compare) and cicero_models/model_1/postprocess.py (multi-source plotting).