Data formats
This page documents the data formats consumed and produced by PETGEM: the
text and HDF5 inputs read by utils/preprocess.py, the unified input bundle
it writes, and the responses file written by the kernel. All HDF5 files use the
PETSc/h5py layout; the Python helpers
petgem.readBundle and petgem.readResponses provide convenient readers.
Conductivity table (sigmas.txt)
A whitespace-delimited text table of per-material conductivity, passed via
-sigma_file (relative to -case_dir). The row index is the 0-based
material id. For a Gmsh mesh this maps to the physical group as
material_id = gmsh:physical - 1; for a VTK mesh the distinct cell-data
region codes are mapped to rows in ascending order (row i = i-th
smallest code). See Mesh generation for the per-format mapping. Comments introduced
by # and blank lines are allowed, and commas are also accepted as separators, so
legacy comma-separated tables still load.
# sigma_x sigma_y sigma_z [fixed]
0.1 0.1 0.1 1 # material 0 - held fixed during inversion (e.g. air)
1.0 1.0 1.0 0 # material 1 - invertible
2.0 2.0 2.0 # material 2 - 'fixed' column omitted -> defaults to 0
sigma_x sigma_y sigma_z: per-axis conductivity (S/m). Use equal values for an isotropic material.
fixed (optional 4th column): 1 marks the material as held fixed during inversion (its gradient is zeroed); 0, or an omitted column, leaves it invertible. Forward modeling ignores this column.
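The rules above (comment stripping, comma or whitespace separators, an optional fixed column defaulting to 0, and the per-format material-id mappings) can be sketched in a few lines of standard-library Python. This is a minimal illustration under those stated rules only; read_sigma_table, gmsh_physical_to_material_id and vtk_codes_to_material_ids are hypothetical names, not PETGEM's own reader in utils/preprocess.py.

```python
def read_sigma_table(text):
    """Parse sigmas.txt content into (sigmas, fixed_ids).

    Hypothetical helper for illustration; not PETGEM's own reader.
    """
    sigmas = []     # row i -> (sigma_x, sigma_y, sigma_z) for material id i
    fixed_ids = []  # 0-based material ids flagged with fixed = 1
    for raw in text.splitlines():
        line = raw.split("#", 1)[0]               # strip whole-line and trailing comments
        fields = line.replace(",", " ").split()   # accept whitespace or commas
        if not fields:
            continue                              # blank or comment-only line
        if len(fields) not in (3, 4):
            raise ValueError(f"expected 3 or 4 fields, got {len(fields)}: {raw!r}")
        sx, sy, sz = (float(v) for v in fields[:3])
        fixed = int(fields[3]) if len(fields) == 4 else 0  # omitted column -> 0
        if fixed == 1:
            fixed_ids.append(len(sigmas))         # id of the row being added
        sigmas.append((sx, sy, sz))
    return sigmas, fixed_ids


def gmsh_physical_to_material_id(physical):
    """Gmsh: material_id = gmsh:physical - 1."""
    return physical - 1


def vtk_codes_to_material_ids(codes):
    """VTK: the i-th smallest distinct region code maps to row i."""
    return {code: i for i, code in enumerate(sorted(set(codes)))}
```

For example, VTK region codes [7, 3, 7, 12] map to {3: 0, 7: 1, 12: 2}, so code 3 reads row 0 of the table.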
Sources
Both modes use a single transmitter format: one row per transmitter, 8
fields (frequency on every row). Forward modeling (-source_filename)
repeats one frequency across its transmitters; inverse modeling
(-inv_source_filename) carries one row per (frequency, dipole) pair:
freq x_pos y_pos z_pos current length dip_angle azimuth_angle
...
The fields are: frequency (Hz), position x y z (m), current (A),
length (m), dip_angle and azimuth_angle (degrees). The legacy
forward layout (a lone freq line followed by 7-field rows) is still
accepted and converted to this form on read.
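The conversion from the legacy forward layout can be sketched as follows: when the first non-blank line holds a single value, it is taken as the shared frequency and prepended to each following 7-field row. This is a minimal sketch assuming whitespace-separated fields; read_sources is a hypothetical name, not PETGEM's API.

```python
# Column order of one transmitter row, as documented above.
FIELDS = ("freq", "x_pos", "y_pos", "z_pos", "current", "length",
          "dip_angle", "azimuth_angle")


def read_sources(text):
    """Return 8-field transmitter rows, accepting the legacy forward layout.

    Hypothetical helper for illustration only.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if rows and len(rows[0]) == 1:
        # Legacy layout: a lone frequency line, then 7-field rows.
        freq = float(rows[0][0])
        body = rows[1:]
        if any(len(r) != len(FIELDS) - 1 for r in body):
            raise ValueError("legacy layout expects 7 fields per transmitter row")
        return [[freq] + [float(v) for v in r] for r in body]
    if any(len(r) != len(FIELDS) for r in rows):
        raise ValueError("expected 8 fields per transmitter row")
    return [[float(v) for v in r] for r in rows]
```

Both layouts therefore yield the same rows, which is what lets forward and inverse runs share one transmitter format downstream.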
Receivers (-receiver_filename)
A text file of receiver positions, one Cartesian point per row. Columns may be
separated by whitespace or commas (# comments and blank lines allowed),
so receiver lists exported from MATLAB or other tools load without conversion:
x y z
...
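A tolerant receiver reader that follows the rules above (whitespace or comma separators, # comments, blank lines) might look like this. It is a hypothetical sketch, not the parser used by preprocess.py; it rejects any row that does not hold exactly three coordinates.

```python
def read_receivers(text):
    """Parse receiver positions into (x, y, z) tuples (hypothetical helper)."""
    points = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        # Drop comments, then treat commas like whitespace.
        fields = raw.split("#", 1)[0].replace(",", " ").split()
        if not fields:
            continue  # blank or comment-only line
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected x y z, got {raw!r}")
        points.append(tuple(float(v) for v in fields))
    return points
```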
Observed data (-observed_filename, inverse only)
The observed electric field used as the inversion target, in either of two
forms (preprocess.py detects the format by extension):
HDF5 (.h5 / .hdf5):
/Ex: complex [N_freq, N_recv] dataset (complex128, stored as an {r, i} compound). Row order matches the inverse source frequencies; column order matches the receiver list.
/frequencies: the frequency of each row (Hz).
error_level (optional root attribute): the relative noise level.
Raw text (invEx.dat, MATLAB-style) - parsed inline by preprocess.py,
so no separate conversion step is needed:
freq_label Re(Ex_1) Im(Ex_1) Re(Ex_2) Im(Ex_2) ... Re(Ex_N) Im(Ex_N)
one row per frequency. The leading label is dropped (frequencies come from the
inverse sources); the noise level is supplied with -error_level (or left to
the kernel default). The standalone utils/convert_invex_to_hdf5.py can still
produce a reusable HDF5 from such a file, but is no longer required.
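The inline parsing of invEx.dat amounts to dropping the leading label on each row and pairing the remaining values as (Re, Im). The sketch below assumes whitespace-separated values and returns a nested [N_freq][N_recv] list of complex numbers, matching the /Ex layout above; parse_invex is a hypothetical name, not a PETGEM function.

```python
def parse_invex(text):
    """Parse invEx.dat rows into an [N_freq][N_recv] list of complex Ex values.

    Hypothetical sketch; the leading freq_label on each row is discarded.
    """
    ex = []
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue
        values = [float(v) for v in fields[1:]]  # drop freq_label
        if len(values) % 2:
            raise ValueError("expected Re/Im pairs after the frequency label")
        # Pair even-indexed (real) and odd-indexed (imaginary) values.
        ex.append([complex(re, im) for re, im in zip(values[0::2], values[1::2])])
    if len({len(row) for row in ex}) > 1:
        raise ValueError("every frequency row must list the same receivers")
    return ex
```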
Input bundle (-input_filename, default input.h5)
utils/preprocess.py assembles a single unified HDF5 bundle that the
kernel reads in full. Both modes share the same skeleton - mesh, model, order,
receivers, and /sources - and inverse runs simply add the two groups
that have no forward counterpart (/observed and /inv_meta). Each mode
populates only the fields it needs.
| Group / dataset | Mode | Contents |
|---|---|---|
| DMPlex topology + model_data | both | Mesh (PETSc DMPlex layout) and per-cell conductivity, written by preprocess.py |
| … | both | Polynomial order (length-1 vector) |
| … | both | Receiver positions, flattened |
| /sources | both | Transmitters, one entry per row: freq, position, current, length, dipAngle, azimuthAngle. Forward repeats one frequency; inverse carries one row per (frequency, dipole). |
| /observed | inverse | Observed field; the … |
| /inv_meta | inverse | 0-based material ids held fixed (int32) |
petgem.readBundle(path) returns a dict with receivers ([N_recv, 3]),
nord, frequency (first transmitter), and sources ([N_src, 8]:
freq x y z current length dipAngle azimuthAngle). The DMPlex mesh and
model_data are kept in the file but not returned.
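Because receiver positions are stored flattened while readBundle returns an [N_recv, 3] array, a reader has to regroup the flat values into rows. The sketch below assumes row-major order (x0 y0 z0 x1 y1 z1 ...), which is an assumption rather than something stated on this page; unflatten is a hypothetical helper, and the same call with width 8 regroups source rows.

```python
def unflatten(flat, width):
    """Split a flat sequence into rows of `width` values (row-major assumed)."""
    if len(flat) % width:
        raise ValueError(f"length {len(flat)} is not a multiple of {width}")
    return [list(flat[i:i + width]) for i in range(0, len(flat), width)]


# receivers: x0 y0 z0 x1 y1 z1 ...  ->  [N_recv, 3]
receivers = unflatten([0.0, 0.0, -10.0, 50.0, 0.0, -10.0], 3)
```

With NumPy the equivalent regrouping is np.asarray(flat).reshape(-1, 3).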
Responses file
The forward kernel writes a single unified HDF5 file containing every
transmitter, named <output_filename>.h5 (e.g. responses_p1.h5). All
six field components are written with PETSc’s native HDF5 viewer; the
output Vecs are parallel on the kernel communicator, so writes are
collective MPI-IO when PETSc is linked against a parallel HDF5 build (no
rank-0 gather).
Layout:
/ root attrs (provenance + run-wide values)
/sources/src1/ per-source attrs
/sources/src1/fields/Ex complex PETSc Vec, length N_recv
/sources/src1/fields/Ey
/sources/src1/fields/Ez
/sources/src1/fields/Hx
/sources/src1/fields/Hy
/sources/src1/fields/Hz
/sources/src2/ (one such group per transmitter)
...
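The group naming is mechanical: source groups are numbered from 1, and each holds the same six field datasets. The hypothetical helper below enumerates the dataset paths a responses file with num_sources transmitters should contain, which can be useful as a completeness check; it only builds strings and does not open the file.

```python
COMPONENTS = ("Ex", "Ey", "Ez", "Hx", "Hy", "Hz")


def response_paths(num_sources):
    """List the field dataset paths expected in a unified responses file."""
    return [f"/sources/src{k}/fields/{c}"
            for k in range(1, num_sources + 1)  # groups are numbered from 1
            for c in COMPONENTS]
```

With h5py, for instance, each path can be tested with `path in f` on an open file, using the num_sources root attribute as the count.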
Root attributes:
petgem_version, input_filename, date, nord, mpi_tasks: provenance.
num_sources: number of transmitter groups stored under /sources/.
frequency: operating frequency (Hz); single-frequency forward runs carry one shared value here in addition to the per-source attribute.
Per-source group attributes (/sources/src{k}): frequency, x_pos,
y_pos, z_pos, current, length, dip_angle,
azimuth_angle.
Python readers:
petgem.readResponses(path, source=1) returns a per-source dict (Ex..Hz arrays plus source and provenance attribute dicts), preserving the shape used by pre-refactor callers.
petgem.readAllResponses(path) returns {'provenance': ..., 'num_sources': N, 'sources': {1: {...}, 2: {...}, ...}}, with each per-source entry shaped like the readResponses result.
The per-case postprocess.py scripts use these readers; see
tests/cases/csem_model/postprocess.py (single-source reference compare)
and cicero_models/model_1/postprocess.py (multi-source plotting).
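As an illustration of consuming the readAllResponses structure, the hypothetical function below computes the peak |Ex| over receivers for every source. It only relies on the 'sources' mapping and the per-source 'Ex' array described above; in practice its argument would be petgem.readAllResponses("responses_p1.h5"), while the example uses a hand-built dict of the same shape.

```python
def max_abs_ex(all_responses):
    """Return {source_index: max |Ex| over receivers} (hypothetical helper)."""
    return {k: max(abs(v) for v in src["Ex"])
            for k, src in all_responses["sources"].items()}


# Hand-built stand-in with the documented shape (values are made up).
mock = {"provenance": {}, "num_sources": 2,
        "sources": {1: {"Ex": [3 + 4j, 1j]}, 2: {"Ex": [0.5, -2.0]}}}
peaks = max_abs_ex(mock)
```

Iterating over all_responses["sources"] keeps multi-source plotting independent of how many transmitters a run wrote.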