============
Data formats
============
This page documents the data formats consumed and produced by **PETGEM**: the
text and HDF5 inputs read by ``utils/preprocess.py``, the unified input bundle
it writes, and the responses file written by the kernel. All HDF5 files use the
PETSc/`h5py `_ layout; the Python helpers
``petgem.readBundle`` and ``petgem.readResponses`` provide convenient readers.
Conductivity table (``sigmas.txt``)
-----------------------------------
A whitespace-delimited text table of per-material conductivity, passed via
``-sigma_file`` (relative to ``-case_dir``). The row index is the **0-based
material id**. For a Gmsh mesh this maps to the physical group as
``material_id = gmsh:physical - 1``; for a VTK mesh the distinct cell-data
region codes are mapped to rows in ascending order (row ``i`` = ``i``-th
smallest code). See :doc:`meshing` for the per-format mapping. ``#`` comments
and blank lines are allowed (commas are also accepted, so legacy
comma-separated tables still load).
.. code-block::
# sigma_x sigma_y sigma_z [fixed]
0.1 0.1 0.1 1 # material 0 - held fixed during inversion (e.g. air)
1.0 1.0 1.0 0 # material 1 - invertable
2.0 2.0 2.0 # material 2 - 'fixed' column omitted -> defaults to 0
- ``sigma_x sigma_y sigma_z``: per-axis conductivity (S/m). Use equal values
for an isotropic material.
- ``fixed`` (optional 4th column): ``1`` marks the material as held fixed
during inversion (its gradient is zeroed). Forward modeling ignores it.
Sources
-------
Both modes use a single transmitter format: **one row per transmitter, 8
fields** (frequency on every row). Forward modeling (``-source_filename``)
repeats one frequency across its transmitters; inverse modeling
(``-inv_source_filename``) carries one row per ``(frequency, dipole)`` pair:
.. code-block::
freq x_pos y_pos z_pos current length dip_angle azimuth_angle
...
The fields are: frequency (Hz), position ``x y z`` (m), ``current`` (A),
``length`` (m), ``dip_angle`` and ``azimuth_angle`` (degrees). The legacy
forward layout (a lone ``freq`` line followed by 7-field rows) is still
accepted and converted to this form on read.
Receivers (``-receiver_filename``)
----------------------------------
A text file of receiver positions, one Cartesian point per row. Columns may be
separated by whitespace **or** commas (``#`` comments and blank lines allowed),
so receiver lists exported from MATLAB or other tools load without conversion:
.. code-block::
x y z
...
Observed data (``-observed_filename``, inverse only)
----------------------------------------------------
The observed electric field used as the inversion target, in **either** of two
forms (``preprocess.py`` detects the format by extension):
**HDF5** (``.h5`` / ``.hdf5``):
- ``/Ex``: complex ``[N_freq, N_recv]`` dataset (``complex128``, stored as an
``{r, i}`` compound). Row order matches the inverse source frequencies;
column order matches the receiver list.
- ``/frequencies``: the frequency of each row (Hz).
- optional root attribute ``error_level`` (the relative noise level).
**Raw text** (``invEx.dat``, MATLAB-style) - parsed inline by ``preprocess.py``,
so no separate conversion step is needed:
.. code-block::
freq_label Re(Ex_1) Im(Ex_1) Re(Ex_2) Im(Ex_2) ... Re(Ex_N) Im(Ex_N)
one row per frequency. The leading label is dropped (frequencies come from the
inverse sources); the noise level is supplied with ``-error_level`` (or left to
the kernel default). The standalone ``utils/convert_invex_to_hdf5.py`` can still
produce a reusable HDF5 from such a file, but is no longer required.
Input bundle (``-input_filename``, default ``input.h5``)
--------------------------------------------------------
``utils/preprocess.py`` assembles a **single unified HDF5 bundle** that the
kernel reads in full. Both modes share the same skeleton - mesh, model, order,
receivers, and ``/sources`` - and inverse runs simply **add** the two groups
that have no forward counterpart (``/observed`` and ``/inv_meta``). Each mode
populates only the fields it needs.
.. list-table::
:header-rows: 1
:widths: 30 12 58
* - Group / dataset
- Mode
- Contents
* - DMPlex topology + ``model_data``
- both
- Mesh (PETSc DMPlex layout) and per-cell conductivity, written by the preprocess
* - ``/nord``
- both
- Polynomial order (length-1 vector)
* - ``/receivers``
- both
- Receiver positions, flattened ``[N_recv * 3]``
* - ``/sources/*``
- both
- Transmitters, one entry per row: freq, position, current, length, dipAngle, azimuthAngle (per entry). Forward repeats one frequency; inverse carries one row per (frequency, dipole).
* - ``/observed/Ex``
- inverse
- Observed field; the ``@error_level`` attribute holds the relative noise level
* - ``/inv_meta/fixed_materials``
- inverse
- 0-based material ids held fixed (int32)
``petgem.readBundle(path)`` returns a dict with ``receivers`` (``[N_recv, 3]``),
``nord``, ``frequency`` (first transmitter), and ``sources`` (``[N_src, 8]``:
``freq x y z current length dipAngle azimuthAngle``). The DMPlex mesh and
``model_data`` are kept in the file but not returned.
Responses file
--------------
The forward kernel writes a single unified HDF5 file containing every
transmitter, named ``.h5`` (e.g. ``responses_p1.h5``). All
six field components are produced through PETSc's native HDF5 viewer; the
output Vecs are parallel on the kernel communicator, so writes are
collective MPI-IO when PETSc is linked against a parallel HDF5 build (no
rank-0 gather).
Layout::
/ root attrs (provenance + run-wide values)
/sources/src1/ per-source attrs
/sources/src1/fields/Ex complex PETSc Vec, length N_recv
/sources/src1/fields/Ey
/sources/src1/fields/Ez
/sources/src1/fields/Hx
/sources/src1/fields/Hy
/sources/src1/fields/Hz
/sources/src2/ (one such group per transmitter)
...
Root attributes:
- ``petgem_version``, ``input_filename``, ``date``, ``nord``, ``mpi_tasks``
(provenance).
- ``num_sources`` — number of transmitter groups stored under ``/sources/``.
- ``frequency`` — operating frequency (Hz); single-frequency forward runs
carry one shared value here in addition to the per-source attribute.
Per-source group attributes (``/sources/src{k}``): ``frequency``, ``x_pos``,
``y_pos``, ``z_pos``, ``current``, ``length``, ``dip_angle``,
``azimuth_angle``.
Python readers:
- ``petgem.readResponses(path, source=1)`` returns a per-source dict
(``Ex..Hz`` arrays + ``source`` and ``provenance`` attribute dicts),
preserving the shape used by pre-refactor callers.
- ``petgem.readAllResponses(path)`` returns ``{'provenance': ...,
'num_sources': N, 'sources': {1: {...}, 2: {...}, ...}}`` with each
per-source entry shaped like ``readResponses``.
The per-case ``postprocess.py`` scripts use these readers; see
``tests/cases/csem_model/postprocess.py`` (single-source reference compare)
and ``cicero_models/model_1/postprocess.py`` (multi-source plotting).