============ Data formats ============ This page documents the data formats consumed and produced by **PETGEM**: the text and HDF5 inputs read by ``utils/preprocess.py``, the unified input bundle it writes, and the responses file written by the kernel. All HDF5 files use the PETSc/`h5py `_ layout; the Python helpers ``petgem.readBundle`` and ``petgem.readResponses`` provide convenient readers. Conductivity table (``sigmas.txt``) ----------------------------------- A whitespace-delimited text table of per-material conductivity, passed via ``-sigma_file`` (relative to ``-case_dir``). The row index is the **0-based material id**. For a Gmsh mesh this maps to the physical group as ``material_id = gmsh:physical - 1``; for a VTK mesh the distinct cell-data region codes are mapped to rows in ascending order (row ``i`` = ``i``-th smallest code). See :doc:`meshing` for the per-format mapping. ``#`` comments and blank lines are allowed (commas are also accepted, so legacy comma-separated tables still load). .. code-block:: # sigma_x sigma_y sigma_z [fixed] 0.1 0.1 0.1 1 # material 0 - held fixed during inversion (e.g. air) 1.0 1.0 1.0 0 # material 1 - invertable 2.0 2.0 2.0 # material 2 - 'fixed' column omitted -> defaults to 0 - ``sigma_x sigma_y sigma_z``: per-axis conductivity (S/m). Use equal values for an isotropic material. - ``fixed`` (optional 4th column): ``1`` marks the material as held fixed during inversion (its gradient is zeroed). Forward modeling ignores it. Sources ------- Both modes use a single transmitter format: **one row per transmitter, 8 fields** (frequency on every row). Forward modeling (``-source_filename``) repeats one frequency across its transmitters; inverse modeling (``-inv_source_filename``) carries one row per ``(frequency, dipole)`` pair: .. code-block:: freq x_pos y_pos z_pos current length dip_angle azimuth_angle ... The fields are: frequency (Hz), position ``x y z`` (m), ``current`` (A), ``length`` (m), ``dip_angle`` and ``azimuth_angle`` (degrees). The legacy forward layout (a lone ``freq`` line followed by 7-field rows) is still accepted and converted to this form on read. Receivers (``-receiver_filename``) ---------------------------------- A text file of receiver positions, one Cartesian point per row. Columns may be separated by whitespace **or** commas (``#`` comments and blank lines allowed), so receiver lists exported from MATLAB or other tools load without conversion: .. code-block:: x y z ... Observed data (``-observed_filename``, inverse only) ---------------------------------------------------- The observed electric field used as the inversion target, in **either** of two forms (``preprocess.py`` detects the format by extension): **HDF5** (``.h5`` / ``.hdf5``): - ``/Ex``: complex ``[N_freq, N_recv]`` dataset (``complex128``, stored as an ``{r, i}`` compound). Row order matches the inverse source frequencies; column order matches the receiver list. - ``/frequencies``: the frequency of each row (Hz). - optional root attribute ``error_level`` (the relative noise level). **Raw text** (``invEx.dat``, MATLAB-style) - parsed inline by ``preprocess.py``, so no separate conversion step is needed: .. code-block:: freq_label Re(Ex_1) Im(Ex_1) Re(Ex_2) Im(Ex_2) ... Re(Ex_N) Im(Ex_N) one row per frequency. The leading label is dropped (frequencies come from the inverse sources); the noise level is supplied with ``-error_level`` (or left to the kernel default). The standalone ``utils/convert_invex_to_hdf5.py`` can still produce a reusable HDF5 from such a file, but is no longer required. Input bundle (``-input_filename``, default ``input.h5``) -------------------------------------------------------- ``utils/preprocess.py`` assembles a **single unified HDF5 bundle** that the kernel reads in full. Both modes share the same skeleton - mesh, model, order, receivers, and ``/sources`` - and inverse runs simply **add** the two groups that have no forward counterpart (``/observed`` and ``/inv_meta``). Each mode populates only the fields it needs. .. list-table:: :header-rows: 1 :widths: 30 12 58 * - Group / dataset - Mode - Contents * - DMPlex topology + ``model_data`` - both - Mesh (PETSc DMPlex layout) and per-cell conductivity, written by the preprocess * - ``/nord`` - both - Polynomial order (length-1 vector) * - ``/receivers`` - both - Receiver positions, flattened ``[N_recv * 3]`` * - ``/sources/*`` - both - Transmitters, one entry per row: freq, position, current, length, dipAngle, azimuthAngle (per entry). Forward repeats one frequency; inverse carries one row per (frequency, dipole). * - ``/observed/Ex`` - inverse - Observed field; the ``@error_level`` attribute holds the relative noise level * - ``/inv_meta/fixed_materials`` - inverse - 0-based material ids held fixed (int32) ``petgem.readBundle(path)`` returns a dict with ``receivers`` (``[N_recv, 3]``), ``nord``, ``frequency`` (first transmitter), and ``sources`` (``[N_src, 8]``: ``freq x y z current length dipAngle azimuthAngle``). The DMPlex mesh and ``model_data`` are kept in the file but not returned. Responses file -------------- The forward kernel writes a single unified HDF5 file containing every transmitter, named ``.h5`` (e.g. ``responses_p1.h5``). All six field components are produced through PETSc's native HDF5 viewer; the output Vecs are parallel on the kernel communicator, so writes are collective MPI-IO when PETSc is linked against a parallel HDF5 build (no rank-0 gather). Layout:: / root attrs (provenance + run-wide values) /sources/src1/ per-source attrs /sources/src1/fields/Ex complex PETSc Vec, length N_recv /sources/src1/fields/Ey /sources/src1/fields/Ez /sources/src1/fields/Hx /sources/src1/fields/Hy /sources/src1/fields/Hz /sources/src2/ (one such group per transmitter) ... Root attributes: - ``petgem_version``, ``input_filename``, ``date``, ``nord``, ``mpi_tasks`` (provenance). - ``num_sources`` — number of transmitter groups stored under ``/sources/``. - ``frequency`` — operating frequency (Hz); single-frequency forward runs carry one shared value here in addition to the per-source attribute. Per-source group attributes (``/sources/src{k}``): ``frequency``, ``x_pos``, ``y_pos``, ``z_pos``, ``current``, ``length``, ``dip_angle``, ``azimuth_angle``. Python readers: - ``petgem.readResponses(path, source=1)`` returns a per-source dict (``Ex..Hz`` arrays + ``source`` and ``provenance`` attribute dicts), preserving the shape used by pre-refactor callers. - ``petgem.readAllResponses(path)`` returns ``{'provenance': ..., 'num_sources': N, 'sources': {1: {...}, 2: {...}, ...}}`` with each per-source entry shaped like ``readResponses``. The per-case ``postprocess.py`` scripts use these readers; see ``tests/cases/csem_model/postprocess.py`` (single-source reference compare) and ``cicero_models/model_1/postprocess.py`` (multi-source plotting).