ProcMT

Preface

ProcMT (process MT) is a collection of libraries and executables. They are bundled together in several GUIs (graphical user interfaces).

ProcMT works on a very specific directory structure from which the software reads XML data, ats data (advanced time series) and calibration files.

The philosophy is to configure everything first and then run the program. Once configured you can run many jobs - the computer does not wait on mouse clicks.

All actions and joblists are stored as XML files and will be executed on request from the GUI or shell.

The LINUX user can run procmt_main from script files, which is useful in case you have to process hundreds of sites. The script interface can also run on the ADU systems.

Activating all parallel options may lead to memory exhaustion on 32bit systems. It is recommended to use 64bit systems with more than 8 GB of memory. As a rule of thumb, expect about 50% of the time series data size to be used as memory.

Setup

Directory Structure

ProcMT expects a directory structure to work on. This structure is either created by a script or by the GUI by calling File -> Create Project.
Your default file manager will pop up and you create a new directory:

(Figure: Create Directory)

(You may skip the following details at first; your data goes into the ts directory and the processing instructions reside in the processings directory.)
(Figure: directory tree (part))
None of these directories may be deleted by the user. You can create additional directories if you want (like doc, procs_old and so on).

cal: contains the metronix standard calibration files. In case "cal auto" is used the software first looks for the calibration file, secondly for the XML embedded calibration and lastly - if neither is found - uses an internal calibration function.

config: contains configuration files (e.g. XML files). These are normally files for channel mapping and EMAP. Additionally you can put your (plain ASCII) frequency list files here.
"freq_list_1.dat" to "freq_list_7.dat" are predefined names which you can use and can be accessed from the GUI.

db: contains SQLite databases, if used. (Normally not used by users; by default you should switch the database off and use the EDI file output in the edi directory. With a MySQL connection different users can write results to a centralized database and share their results.) After creating a new survey the database must be created by hand from the menu Results → Database → Ok, which creates a new local database. By default the DB is disabled in procmt_cmds.

dump: contains dump files. Note that most dump files will be overwritten without notice. Copy them in case you need them. You should turn dumps off (except when you are facing problems) in the command line options module.

edi: contains the EDI output. Files will not be overwritten but numbered.

filters: contains filter data, if any.

jle: contains job files for this survey for usage with the job list editor (jle), if you want to use them.

jobs: contains job files for usage (XML files). A job is a combination of processing(s) and time series data.
You can call a job file from the command line: procmt_main jobs/myjobfile.xml

log: contains log data, if any.

processings: contains all jobs which can be submitted and used for this survey. All files are XML files. If you truly understand these files you can also edit them by hand.

shell: contains BASH scripts for LINUX users

tmp: contains temporary data, if any; may be cleared automatically.

ts: contains the time series data. See Add Data.

Add Data

Your data goes into the time series folder "ts". You simply create "Site_1", "Site_2" and so on ("Site_", not "site_"; or from the GUI File -> Create Site). Copy the COMPLETE measdir (example "meas_2014-08-15_07-23-38") into the Site_N directory. The directory and the time series will appear some seconds later in the GUI (an auto-scan is running in the background). If the recording was set up correctly in the joblist or ADU interface, you can double-click the site and the processing runs.
For EMAP it is possible to create site names like Site_140_142_144. These types of directories can only be processed if you have created a corresponding XML configuration inside the config directory and told the processing to use that configuration file.
Site and site numbers have to be separated by an underscore ( _ ).
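If you prefer adding data from a script instead of the GUI, a minimal Python sketch could look like the following (the survey path, site number and measdir path are made-up examples, not ProcMT defaults):

import shutil
from pathlib import Path

survey = Path("/surveys/my_survey")                # hypothetical survey directory
measdir = Path("/data/meas_2014-08-15_07-23-38")   # complete measdir to import

site = survey / "ts" / "Site_7"                    # note: "Site_", not "site_"
site.mkdir(parents=True, exist_ok=True)

# copy the COMPLETE measdir; the background auto-scan will pick it up
shutil.copytree(measdir, site / measdir.name)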

Processings

Pre-Processings

Pre-processings are modules used to read and condition the time series and create the auto- and cross-spectra which will later be used by the MT processing modules. All parameters are saved in XML format.

edi_info

This module takes values for creating the EDI file later.

procmt_cmds

This module contains options which can also be used as command line parameters. They will affect or override parameters from several modules. For example, here you can quickly select a RR (remote reference) site instead of editing it somewhere else; you can also decide to process the Z tensor in a rotated coordinate system.

Time Series Processing Modules

ats file reader

scale detrend hanning

Scales the electric field to mV/km and applies a detrend function as well as a Hanning window.
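As an illustration of these two conditioning steps (this is not the ProcMT code; a linear detrend is assumed), a short NumPy sketch:

import numpy as np

def detrend_hanning(segment):
    """Remove a linear trend and apply a Hanning window to one FFT segment."""
    n = len(segment)
    t = np.arange(n)
    slope, offset = np.polyfit(t, segment, 1)   # least-squares fitted line
    detrended = segment - (slope * t + offset)
    # the Hanning (Hann) taper reduces spectral leakage at the segment edges
    return detrended * np.hanning(n)

# electric channels are additionally scaled to mV/km, i.e. divided by the dipole length in km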

FFT

The FFT module transforms the time series data into the spectral domain. The upper and lower parts of the spectra can be cut off.
The idea is: if you have used cascaded bands with 4x filtering, the FFT of one band will overlap the other. This overlap can be controlled here.
Additionally the spectral density varies and it may be useful to cut the lower end by 3-5% or more.
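A sketch of this transform with band limits (illustrative only; the parameter names are assumptions, the fsample/4 rule is taken from the FAQ below):

import numpy as np

def fft_band(segment, fsample, f_min=0.0, f_max=None):
    """FFT of one windowed segment, restricted to the band [f_min, f_max]."""
    spectrum = np.fft.rfft(segment)
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / fsample)
    if f_max is None:
        f_max = fsample / 4.0                    # default upper limit (see FAQ)
    keep = (freqs > 0.0) & (freqs >= f_min) & (freqs <= f_max)
    return freqs[keep], spectrum[keep]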

calibrate spectra

The calibration module applies the transfer functions of the used sensors. The influence of the sensor is eliminated; in the magnetic spectra the units change from $ \frac{mV}{\sqrt{Hz}} $ to $ \frac{nT}{\sqrt{Hz}} $.
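A minimal sketch of this step (the interface and the unit of the tabulated calibration values are assumptions; the transfer function is interpolated onto the FFT frequencies and divided out):

import numpy as np

def calibrate(freqs, spectrum, cal_freqs, cal_values):
    """Divide a magnetic spectrum in mV/sqrt(Hz) by the complex sensor transfer
    function (assumed tabulated in mV/nT) to obtain nT/sqrt(Hz)."""
    tf = (np.interp(freqs, cal_freqs, np.real(cal_values))
          + 1j * np.interp(freqs, cal_freqs, np.imag(cal_values)))
    return spectrum / tf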

auto-cross-parzening
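Judging from the module name and the pre-processing overview above, this module forms the auto- and cross-spectra (products such as Ex·Hy*, Hx·Hx*) and smooths them with a Parzen window around each target frequency. A generic sketch of such Parzen smoothing (not the ProcMT implementation; the radius convention is an assumption):

import numpy as np

def parzen_smooth(freqs, spectrum_product, f_target, radius=0.1):
    """Parzen-weighted average of one auto-/cross-spectrum around f_target.

    radius: half-width of the window as a fraction of f_target (assumed convention)."""
    u = np.abs(freqs - f_target) / (radius * f_target)       # normalized distance
    w = np.zeros_like(u)
    inner = u <= 0.5
    outer = (u > 0.5) & (u <= 1.0)
    w[inner] = 1.0 - 6.0 * u[inner]**2 + 6.0 * u[inner]**3   # Parzen window, inner part
    w[outer] = 2.0 * (1.0 - u[outer])**3                     # Parzen window, outer part
    if w.sum() == 0.0:
        return 0j
    return np.sum(w * spectrum_product) / np.sum(w)

# example: parzen_smooth(freqs, ex_spec * np.conj(hy_spec), f_target=10.0)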

MT Processing Modules

After the auto- and cross-spectra are finished the MT modules are activated. You can select one or more modules at the same time. Depending on the memory and CPU of your computer you can choose the cat all option in the command line module. This leads to a vast amount of memory usage and parallel execution of the MT modules. The cat all option does however give the best results.

Standard Parameter

Some processing parameters appear in almost any MT module:

Stack all

This is the simplest module. All tensors are simply stacked and the mean is used as estimator. Mostly this method is used as a reference in order to compare "what would happen" if you simply take all data.

The method indicates good frequency ranges quite well. In the dead band or at very high frequencies stack all can give surprisingly good results. This happens when the data is strongly scattered but still balanced out - while a statistical estimator (mean, median, coherency) would cut off the "wrong" outliers.
Note that when you have activated the coherency rejection in the command line module the stack all is no longer a pure stack-all method (the same applies when the quadrant check is activated in the standard parameters).
This module does not contain parameters other than the standard parameters.

The mean is calculated as \begin{equation} \bar x = \frac{1}{N} \sum_{i=0}^{N-1} x_i \label{eq:mean} \end{equation} and the standard deviation as: \begin{equation} \sigma = \sqrt{\frac{1}{N-1}\sum_{i=0}^{N-1}\left({x}_{i}-\bar{x}\right)^{2}} = \sqrt{\frac{1}{N-1}\left(\sum_{i=0}^{N-1}{x}^{2}_{i} - \left( \sum_{i=0}^{N-1}{x}_{i} \right)^2 /N \right)} \label{eq:stddev} \end{equation} where the second form needs only one pass over the data and is therefore faster. Note that the 3σ interval defines an outlier in a Gauss distribution. However, this interval is much too big for MT. Better take 1.3, 1.7 or similar.
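As a plain sketch of these estimators and the k·σ outlier rule (illustrative, scalar values, with k = 1.7 as an example):

import numpy as np

def stack_all(values, k=1.7):
    """Mean, standard deviation and a k*sigma outlier mask for one stack."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    mean = values.sum() / n
    # one-pass form of the variance, as in the second expression above
    var = (np.sum(values**2) - values.sum()**2 / n) / (n - 1)
    sigma = np.sqrt(var)
    outliers = np.abs(values - mean) > k * sigma
    return mean, sigma, outliers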

Coherency Threshold

This classic module evaluates the coherency between Ex & Hy and Ey & Hx. If you have enough data you can use thresholds of 0.8 or more. If the selection is too strong it will lead to an empty EDI file.
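The coherency itself is computed from the auto- and cross-spectra; a generic sketch (not the ProcMT code):

import numpy as np

def coherency_sq(e, h):
    """Squared coherency between an electric and a magnetic channel.

    e, h: complex spectral values of one band, one entry per stack/event."""
    ceh = np.mean(e * np.conj(h))         # cross-spectrum <E H*>
    cee = np.mean(e * np.conj(e)).real    # auto-spectrum  <E E*>
    chh = np.mean(h * np.conj(h)).real    # auto-spectrum  <H H*>
    return np.abs(ceh) ** 2 / (cee * chh)

# e.g. keep a stack only if coherency_sq(ex, hy) >= 0.8 and coherency_sq(ey, hx) >= 0.8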


Median Processing

The median processing calculates the median of the zxy and zyx components of your tensor array: sort(z), take the N/2 element. However: if there are not enough stacks, the median is just as useless as the mean. Here, too, it is recommended to have more than 20 stacks available for processing. The median is a robust estimator against outliers.

Note: you will not get the median itself if upper & lower = 1 (it will likely crash). Reason: the processing evaluates zxy and zyx and they do not have the same median, so the EDI will be empty.

Take 1.2 to 1.3 for the resistivities and 1.3 to 1.5 for the phases (phases are more scattered).
Note: a too strong selection leads to empty EDI files.
If the lower bound is set to 1.2 and the upper bound set to 1.4 the resulting zxy and zyx values will be biased downwards (because you give preference to lower data). This can be desired if you believe that the data itself is biased downwards and you want to correct that.
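A sketch of such a selection around the median (how exactly the upper/lower values of the GUI map to bounds is an assumption here; illustration only):

import numpy as np

def median_select(z, lo_factor=1/1.3, up_factor=1.3):
    """Keep tensor elements whose magnitude lies within the given bounds around
    the median of |z| (z: complex Zxy or Zyx values, one per stack)."""
    z = np.asarray(z)
    med = np.median(np.abs(z))
    keep = (np.abs(z) >= med * lo_factor) & (np.abs(z) <= med * up_factor)
    return z[keep]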

ProcMT shell

procmt_createjob ts/Site_132_134_136_138/*/*xml > jobs/132.xml
procmt_createjob -useproc processings/mt_auto_rr2.xml processings/mt_auto.xml ts/Site_2/*/*xml > jobs/123.xml
sh shell/mkallproc.sh -useproc processings/mt_auto_rr2.xml processings/mt_auto.xml

FAQ

Lowest / highest result in EDI: The lowest or highest frequency depends on the sampling rate and the FFT window length. By default the upper limit is fsample / 4. In case of fsample = 256 Hz the highest frequency would be 64 Hz - unless you have changed the additional cut-off in the FFT module. The FFT window length gives the theoretical lower limit. In case of 1024 points the lower limit would be 1 s (provided the additional cut-off in the FFT module was not set).

Now assume you have used parzening. Parzening extends to the left and right and adds neighbouring frequencies for smoothing. With a Parzen value of 0.3 the lowest frequency then corresponds to a period of 0.4 s, and the highest frequency also drops.
So: in order to get lower frequencies (longer periods) you have to filter the time series again. For the highest frequencies there is no solution (except using a smaller parzening).
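As a small sketch of the theoretical limits before parzening and additional cut-offs (the fsample/4 rule is taken from this FAQ; the rest is generic):

def edi_frequency_limits(fsample, window_length):
    """Theoretical frequency limits of one band, before Parzen smoothing and
    before any additional cut-off in the FFT module."""
    f_max = fsample / 4.0                      # default upper limit
    f_min = fsample / float(window_length)     # longest period = window_length / fsample
    return f_min, f_max

# e.g. edi_frequency_limits(1024.0, 1024) -> (1.0, 256.0), i.e. a longest period of 1 s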

Missing frequencies in EDI: This simply means that the processing was not able to meet your conditions in terms of coherency threshold or median and/or additional criteria like quadrant check, phase/resistivity limit or standard deviation. Normally you re-process the data with different settings or you find out that the data quality is not acceptable at all. Another possibility (if you must have a constant frequency data set) is to activate the replacement in Coherency Threshold or Median Processing.

Appendix

measdoc : one or more file(s) inside a measdir. Example: 319_2014-08-15_07-23-37_2014-08-15_20-20-00_R000_256H.xml indicates a recording with system 319 with start time, stop time, run number 000 and 256 Hz sampling frequency. The measdoc contains additional information for one or more ats files like calibration data, processing instructions and system status during the recording.

measdir : contains one or more measdoc and ats files, like meas_2014-08-15_07-23-37. In general a measdir is a complete set of data. If you want to copy data it is always best and safest to copy the complete directory.

ats : advanced time series : binary data format with a header (1024 bytes) followed by 32bit integers (Intel format) - Version 80. Version 1080 is the sliced version which contains up to 1024 sliced recordings in the same file (e.g. you record 12 times a day at 1024 kHz and put this in one ats file). The sliced version is limited in processing and should not be used.

EDI : Electrical Data Interchange. Exchange format used in MT.

processing : consists of one or more modules bundled together. These instructions are saved inside one file which is used by ProcMT to initiate the MT processing. A job can contain many MT processings.

job : consists of a processing and a list of files to process.

FFT / DFT : The FFT (fast Fourier transform) is the efficient realization of the DFT (discrete Fourier transform) inside the computer. The window length together with the sampling rate decides the highest and lowest frequency in the spectrum. The longer the window (the more points taken from the time series), the lower the frequencies the spectrum contains.
Increasing the window from 1024 to 8192 at 512 Hz sampling rate is NOT the same as using a window of 1024 at 64 Hz. Reason: the time series may contain errors, and the FIR filter will produce a different output compared to the longer FFT. If the data is of good quality the results will be almost the same (ref. filter compare).
Additionally you may want to save computation time when re-processing the data with other settings.
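The lowest frequency of one FFT window is simply the sampling rate divided by the window length; the two setups above therefore reach the same lower limit: \begin{equation} f_{min} = \frac{f_{sample}}{N}, \qquad \frac{512\,\mathrm{Hz}}{8192} = \frac{64\,\mathrm{Hz}}{1024} = 0.0625\,\mathrm{Hz} \label{eq:fmin} \end{equation}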

bandwidth : Unit [Hz]. Depends on the (sampling frequency / FFT length) and the Parzen window. You achieve a smaller bandwidth if you increase the FFT length. The bandwidth decreases with a bigger Parzen window. In theory a smaller bandwidth increases the depth resolution and decreases the resistivity resolution. Since the MT transfer functions are smooth, you should not overestimate the bandwidth.

UTC, GPS, TAI :

The difference between UTC and the atomic clocks becomes greater and greater because the Earth's rotation slows down.