# ProcMT

## Preface

ProcMT (process MT) is a collection of libraries and executables. They are bundled together in several GUIs (graphical user interfaces).

ProcMT works on a very specific directory structure where the software reads XML data and ats data (advanced time series) as well as calibration files.

The philosophy is to configure everything first and then run the program. Once configured you can run many jobs - the computer does not wait on mouse clicks.

All actions and joblists are stored as XML files and will be executed on request from the GUI or shell.

The LINUX user will be able to run procmt_main from script files, in case you have to process hundreds of sites. The script interface can also run on the ADU systems.

Activating all parallel options may lead to memory exhaustion on 32bit systems. It is recommended to use 64bit systems with more than 8GB memory. As a rule of thumb, expect about 50% of the timeseries data size to be used as memory.

# Setup

## Directory Structure

ProcMT expects a directory structure to work on. This structure is either created by a script or by the GUI by calling File -> Create Project.
Your default file manager will pop up and you create a new directory:

(You may skip the following details at first; your data goes into the ts directory and the processing instructions reside in the processings directory).
None of these directories may be deleted by the user. You can create additional directories if you want (like doc, procs_old and so on).

cal: contains the metronix standard calibration files. In case "cal auto" is used the software first looks for the calibration file, secondly for the XML embedded calibration and lastly - if neither is found - will use an internal calibration function.

config: contains configuration files (e.g. XML files). These are normally files for channel mapping and EMAP. Additionally you can put your (plain ASCII) frequency list files here.
"freq_list_1.dat" to "freq_list_7.dat" are predefined names which you can use and can be accessed from the GUI.

db: contains SQLite databases, if used. (Normally not needed: by default you should switch the database off and use the EDI file output in the edi directory. With a MySQL connection different users can write results to a centralized database and share their results.) After creating a new survey the database must be created by hand from the menu  Results → Database → Ok , which creates a new local database. By default the DB is disabled in procmt_cmds.

dump: contains dump files. Note that most dump files will be overwritten without notice; copy them in case you need them. You should turn dumps off (except when you are facing problems) in the command line options module.

edi: contains the EDI output. Files will not be overwritten but numbered.

filters: contains filter data, if used.

jle: contains jobfiles for this survey for usage with job list editor (jle)(if you want to use them)

jobs: contains job files for usage (XML files). A job is a combination of processing(s) and timeseries data.
You can call a job file from the command line: procmt_main jobs/myjobfile.xml

log: contains log data, if used.

processings: contains all jobs which can be submitted and used for this survey. All files are XML files. If you truly understand these files you also can edit them by hand.

shell: contains BASH scripts for LINUX users

tmp: contains temporary data, if used; may be cleared automatically.

ts: contains the time series data. See add data.

The data is in the time series folder "ts". You simply create "Site_1", "Site_2" and so on ("Site_", not "site_"; or from the GUI File -> Create Site). Copy the COMPLETE measdir (example "  meas_2014-08-15_07-23-38  ") into the Site_N. The directory and the timeseries will appear some seconds later in the GUI (an auto-scan is running in the background). If the recording was set up correctly in the joblist or ADU interface, you can double-click the Site and the processing runs.
For EMAP it is possible to create site names like Site_140_142_144. These types of directories can only be processed in case you have created a corresponding XML configuration inside the config directory and told the processing to use that configuration file.
Site and site numbers have to be separated by underscore (  _  ).
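The naming rules above can be checked with a small sketch (a hypothetical helper, not part of ProcMT):

```python
import re

def is_valid_site_name(name: str) -> bool:
    """Check a ProcMT site directory name: 'Site_' (capital S) followed by
    one or more underscore-separated numbers, e.g. Site_1 or Site_140_142_144."""
    return re.fullmatch(r"Site_\d+(_\d+)*", name) is not None
```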

# Processings

## Pre-Processings

Pre-processings are modules used to read and condition the timeseries and create the auto- and cross-spectra which will later be used by the MT processing modules. All parameters are saved in XML format.

### edi_info

This module takes values used for creating the EDI file later:

• DATAID: like  NothernMining  or  Section2 
• ACQBY: e.g. the person who has recorded the data in the field
• FILEBY: e.g. the one who has processed the data
• ACQDATE: put  auto  and the software will take the first timeseries entry
• ENDDATE: put  auto  and the software will take the last timeseries entry
• FILEDATE: put  auto  and the creation date appears here
• REFLOC: recording location like  Ketzin near Berlin 
• STATE: recording location like  Brandenburg  or  California 
• COUNTRY: recording location like  Germany  or  U.S.A 

### procmt_cmds

This module contains options which could also be used as command line parameters. They will affect or override parameters from several modules. For example here you can quickly select a RR (remote reference) site instead of editing somewhere else, and you can also decide to process the Z tensor in a rotated coordinate system.

• queue length - handle with care : internal buffer size. Take  auto ! Tuning does not affect the speed much. Take 16384 for example if you have 16GB or more RAM.
• AI Limit for Swift (greater / equal 0, negative off) : if -1 tensors are not rotated. If 1.6 for example data will be rotated according to SWIFT if the anisotropy is greater than 1.6, tensors are processed and finally rotated back to 0 when writing EDI file. Giving a reasonable anisotropy avoids useless rotation (if the anisotropy is close to 0 a rotation angle becomes undefined).
• Tensor rotation N-> E : rotate the Z tensor before analysis; will be rotated back to 0 when writing the EDI file (+/- 360 deg); 0 = off
• Tensor rotation AI lower limit : for the above: rotate only when AI is greater; say for example 1.5 (range 0 - 10000) - only active if the above != 0.
• dump various : off, on, binspectra : on will create various dump files, binspectra will create a dump file for all auto- and cross spectra. This file can be used together with the spectral editor. binspectra is simply spoken a freeze of the processing after auto- and cross spectra have been calculated.
• dump raw transfer functions : off, on : just for checking the finally used transfer functions (not activated yet)
• de-activate database : false, true : if false ProcMT will write EDI data additionally to a SQL database (SQLite, MySQL)
• split all : false, true : should be off. If on ProcMT tries to create as many EDI files as possible.
• cat all : true, false : should be on. ProcMT will join all auto- and cross spectra from different bands (ats files) in memory. You get spectral data for example from the 512Hz and from the 128Hz recording. In this case the 16Hz data may be a combination of spectra of 512Hz and 128Hz recording.
Even though memory consuming this gives best results.
• ascii table edi : false, true : writes an ASCII table (only for users who want quick access like with Gnuplot or XMGrace; you should ALWAYS use the EDI file for whatever you do.)
• write edi spectra : true, false : should be always true. Otherwise the  > =SPECTRASECT  in the EDI file is missing.
• write edi Z : true, false : should be always true. Writes the  > =MTSECT  in the EDI file.
• join same processing comments : true, false : When using different processings like hf.xml, lf.xml, deadband.xml you may want a single EDI as result when processing several bands. In this case use the same processing comment in all processings (like mydefault in hf.xml, lf.xml, deadband.xml ) and ProcMT joins the spectra (if cat all is active) and joins the results to one EDI file.
Note: the different processings must be submitted during recording with the ADU. Otherwise you may have to edit the measdoc and replace the processing (e.g. mt_auto) with yours (hf): edit  <processings> mt_auto</processings>  to  <processings> hf</processings> .
For CSAMT this is the default setting because here you have for each recording a different processing (which includes the transmission time and transmission frequency).
• activate online processing : N/A yet : gets data in real time from a server (ADU)
• server ip : IP address of server for real time processing
• port : port of server
• vlf processing activated : N/A yet
• auto bandwidth : uses an adaptive bandwidth according to the sampling frequency. The FFT length for sampling rates less than 1  Hz is 128  points, for 128  Hz 512  points and so on. This allows you to create one processing for all data - e.g. we call that mt_auto. You can adjust the auto bandwidth by 1/8, 1/4, 1/2: in this case the FFT will be longer (and the bandwidth smaller); with 2, 4, 8 the FFT will be shorter (and the bandwidth greater). If the option is active the FFT length in the ats file reader will be ignored.
example table for the auto-bandwidth
• auto parzen : override the parzen radius from the auto-cross-parzening module. The parzen radius will be increased with increasing periods.
Theoretically you want small parzening together with small bandwidth (greater FFT) when you process data like 46  Hz and want to exclude influence from 50  Hz.
• reject coherency less than in advance : You can reject less correlated data in advance. The stack all method then becomes a coherency threshold.
Especially if you have huge data sets you can exclude data in advance so that this data does not affect the memory usage when the processing comes to the MT modules.
• remote Site No : set the number (just the number) of the remote site. This option is also available from the GUI. Set here if you use a batch processing or the remote site number is always the same.
• force remote Site RUN : by default ProcMT selects the greatest overlapping time segment of the remote site. You can force ProcMT to take a different time segment.
• force remote dir :
• only remote cross spectra :
• show edi after processing : [on/off] when processing is finished start the edi plotter. Turn off for batch processing.
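The auto bandwidth rule from the list above can be sketched as follows. Only the two anchor points stated in the text (128 points below 1 Hz, 512 points at 128 Hz) are certain; the extrapolation to higher sampling rates and the function name are assumptions.

```python
def auto_fft_length(sample_freq_hz: float, adjust: float = 1.0) -> int:
    """Sketch of the auto-bandwidth rule: the base FFT length grows with the
    sampling rate. 'adjust' mirrors the 1/8 ... 8 option: values below 1
    lengthen the FFT (smaller bandwidth), values above 1 shorten it."""
    if sample_freq_hz < 1.0:
        base = 128
    elif sample_freq_hz <= 128.0:
        base = 512
    else:
        # assumption: double the length for each factor-4 step upward
        base = 512
        f = 128.0
        while f < sample_freq_hz:
            f *= 4.0
            base *= 2
    return int(base / adjust)
```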

# Time Series Processing Modules

• skip samples at the beginning of the file: use only when the first data points of your file are useless.
• skip samples at the end of the file: use only when the last data points of your file are useless.
• use nth slice for processing: only for "CEA", don't use it. In case of sliced time series use the nth slice for processing;  min=0, max=1023 
• use n slices for processing: only for "CEA", don't use it. Process more than one slice;  min=0, max=1024 , starting at the nth slice: using 1024 means start at the nth slice and use all to the end (ref. above).
A setting of  0, 0  treats the file like a standard file and reads from the default header - the default would contain the total amount of samples, and the CEA sliced file would be read as a metronix standard file (in case the file contains more than one slice this will lead to data corruption); a setting of  0, 1024  does almost the same but reads the samples and settings from the first slice (0).
• use block of samples:
• move back after read: after each reading move n samples back; use this if you are using a filter which reads 1200 points and makes 1024 out of it; in this case you want to move back 176 points. Do not use as overlapping (see below).
• Window Length FFT: If the bandwidth is not set to auto (see procmt cmds) this reading length is also the FFT length. Values:  64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536, 131072, 262144, 524288  .
• skip_marked_parts: default  on . If you have made selections (e.g. with the tsplotter) then these selected parts are excluded from processing. If no corresponding selection files (*.atm) are there, all data will be selected for processing.
• overlapping blocks: Due to the Hanning window not all data is processed with the same weighting. An overlapping of 0.3 or 0.4 will correct that and enhance the statistics.
• pre-stack in time domain: stacks time series with the given window length above. Only works with CSAMT.
• dipole TX time symmetric: N/A
• skip TX switching: N/A
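The "move back after read" option above can be illustrated with a sketch (a hypothetical function, simplified to sample positions):

```python
def read_positions(total: int, read_len: int = 1200, out_len: int = 1024):
    """After each read of read_len samples (a filter turns them into out_len
    output samples), the file pointer moves back read_len - out_len samples
    (176 in the text's example), so the output windows are contiguous."""
    move_back = read_len - out_len
    pos, starts = 0, []
    while pos + read_len <= total:
        starts.append(pos)
        pos += read_len - move_back  # i.e. advance by out_len
    return starts
```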

### scale detrend hanning

Scales the electric field to mV/km and applies a detrend function as well as a Hanning window.

• scale E to mV/km: default  on 
• remove trend: default  on 
• apply hanning windows: default  on 

### FFT

The FFT module transforms the time series data into the spectral domain. The upper and lower parts of the spectra can be cut off.
The idea is: if you have used cascaded bands with 4x filtering, the FFT of one band will overlap the other. This can be controlled with the overlapping.
Additionally the spectral density varies and it may be useful to cut the lower end by 3-5% or more.

• cut upper: cuts the upper part of the frequencies; use  0.05  to cut the upper 5%
• cut lower: cuts the lower part of the spectra; use  0.1  to cut the lower 10%.
• dump raw spectra: for debugging purpose only (this parameter should not be set by the user)
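A minimal sketch of the two cut options, assuming both parameters are fractions of the number of frequency bins:

```python
def cut_spectrum(spectrum, cut_upper=0.05, cut_lower=0.10):
    """Drop the given fractions of frequency bins at the lower and upper
    end of the spectrum (list ordered from low to high frequency)."""
    n = len(spectrum)
    lo = int(n * cut_lower)
    hi = n - int(n * cut_upper)
    return spectrum[lo:hi]
```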

### calibrate spectra

The calibration module applies the transfer functions of the used sensors. The influence of the sensor will be eliminated; in the magnetic spectra the units will change from $mV \over {\sqrt{Hz}}$ to $nT \over {\sqrt{Hz}}$.

• calibration:  auto  : uses the classic calibration file  *.txt  from the cal directory; if not found, the XML from the measdoc, and finally the built-in (default). This way you can always override the XML if needed.
 Builtin  : uses a master calibration function which fits excellently for sampling rates up to 4  kHz and reasonably well above.
 Theoretical  : uses a theoretical function.
The filename is a) SensorType + Serialnumber + .txt like MFS06e1568.txt; b) in case the serial number is short, the file name is filled with '0' until the name length is 9: MFS06e005.txt and c) for the very old MFS (without 'e') to 8 characters MFS06005.txt.
The coil type is taken from the ats header.
• dump calibration: dumps the finally used calibration data ( this parameter should only be used for testing)
• dump cal spectra: for debugging purpose only (this parameter should not be set by the user)
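The calibration file naming rule can be sketched as a small helper (hypothetical name, derived directly from the examples above):

```python
def cal_file_name(sensor_type: str, serial: int) -> str:
    """Build the classic calibration file name: the serial number is
    zero-padded until the stem reaches 9 characters for 'e' sensors
    (like MFS06e) or 8 characters for the very old sensors (like MFS06)."""
    target = 9 if sensor_type.endswith("e") else 8
    return sensor_type + str(serial).zfill(target - len(sensor_type)) + ".txt"
```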

### auto-cross-parzening

• xml MT site config file (empty for default): You only need this in case of EMAP
• cut upper spectra: this seems repeated from the FFT module; use only if other than FFT module is used
• cut lower spectra: this seems repeated from the FFT module; use only if other than FFT module is used
• parzen radius 0.05 to 0.8; combines frequencies in the spectral domain. Higher values (0.25, 0.5) should create smoother data. Lower values can be used for more stable MT data as well as to get points near interfering lines (e.g. close to 50/60  Hz).
• activate MT/CSAMT: switch between MT and CSAMT, MT scalar and CSAMT scalar. Note that scalar data is mostly useless.
• Frequency List: is a drop down menu.  default  is the metronix frequency list which we have been using for 20 years. The list contains 8 frequencies per decade.  txmsmall  and  fluxgate  are different lists. The "freq_list_1.dat" to "freq_list_7.dat" are to be created by the user manually in the config directory.
The list simply contains numbers  4.000000000000e+01 3.000000000000e+01 2.200000000000e+01  in descending order.
Same applies for the  CEA_TX, CSAMT_TX, VLF  lists: You create them as ascii files.
 generate  is for CSAMT only. The frequencies will be calculated from the  base TX frequency. 
• Frequency List File (empty for default): if left empty, the drop down selection is active. If you provide a file name like  mylist.dat  this file will be taken instead. Do NOT provide a path; the file has to be placed in the config directory.
• base TX frequency: frequency of transmission. Put 8 if your TX transmits with 8  Hz and 0.25 if your TX transmits with a period of 4  s.
• nth odd harmonics: 1, 2, 3 : 1 uses the base frequency for processing only. This mostly gives the best result because higher odd harmonics may vanish.
2 means that the second odd harmonic is used (if the TX base frequency is 70 Hz, 210 Hz is used additionally).
3: third odd harmonic (in our example 350 Hz). Note that the amplitudes of the upper harmonics drop down and the data quality may drop as well.
• dipole 1 TX time[s]: time in seconds of the first dipole. Take even times like 8, 16, 32, 64 and so on in order to fit most possible FFT windows
• dipole 2 TX time[s]: time in seconds of the second dipole; must be same as above
• dipole 3 TX time[s]: N/A
• dipole 4 TX time[s]: N/A
• dipole 5 TX time[s]: N/A
• dipole 6 TX time[s]: N/A
• activate online processing: N/A
• dump CSAMT spectra infos: N/A
• dump parzen spectra: for debugging (this parameter should not be set by the user).

# MT Processing Modules

After finishing the auto- and cross spectra the MT modules are activated. You can select one or more modules at the same time. Depending on the memory and CPU of your computer you can choose the  cat all  option in the command line module. This will lead to a vast amount of memory usage and parallel execution of the MT modules. The  cat all  option gives however the best results.

## Standard Parameter

Some processing parameters appear in almost any MT module:

• start index: 0 ... N, first tensor to process; default  0 (this parameter should not be set by the user)
• stop index: 0 ... N, last tensor to process; default  0 ( first = last = 0 = take all) (this parameter should not be set by the user)
• start frequency: limit the frequency range, low [Hz] (this parameter should not be set by the user)
• stop frequency: limit the frequency range, high [Hz] (this parameter should not be set by the user)
• upper reject rho: reject $\rho$ if $\rho$xy and $\rho$yx exceed this limit
• lower reject rho: reject $\rho$ if $\rho$xy and $\rho$yx fall below this limit
• upper reject phi: reject $\phi$ if $\phi$xy and $\phi$yx exceed this limit
• lower reject phi: reject $\phi$ if $\phi$xy and $\phi$yx fall below this limit
Note: the simple cut-off method works on the xy and yx elements only. Data is not rotated, fixed rotated or Swift rotated. It can be used to cut off segments where distortions can be identified by $\rho$ or $\phi$. This can happen if cables were broken and have been re-connected later. A working example is given under cut upper and lower.
• use n stdev for rejection: n=0=no calculation: reject tensors outside n times the standard deviation. E.g. in a Gauss distribution 3 times would be definitively outside the confidence interval. If 2 times is selected, for example, and the zxy mean is 5  ± 0.75 and zyx 9  ± 1.2, all tensors with zxy greater than 3.5 and smaller than 6.5 and zyx greater than 6.6 and smaller than 11.4 will be selected. If zxy is 5.2 and zyx is 12.1 this tensor will NOT be selected (so the criterion must be fulfilled for both elements simultaneously).
• recalculate stddev after reject: if above is selected take  on  here to recalculate the standard deviation from the selected tensors (again); MUST be  on  if rejection by standard deviation is used.
• check quadrant: checks that the zxy are all positive (real and imaginary part) and zyx are all negative (real and imaginary part). In most cases this assumption is  true  .
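The two-sided n-stdev criterion above can be illustrated with a sketch using the example numbers from the text (hypothetical helper; ProcMT works on complex tensor elements, here simplified to real scalars):

```python
def passes_n_stdev(zxy, zyx, n, mean_xy, std_xy, mean_yx, std_yx):
    """True only if BOTH elements lie within n standard deviations of
    their respective mean - the criterion must hold simultaneously."""
    return (abs(zxy - mean_xy) <= n * std_xy and
            abs(zyx - mean_yx) <= n * std_yx)
```

With n=2, zxy mean 5 ± 0.75 and zyx mean 9 ± 1.2, the tensor (5.2, 12.1) is rejected because zyx falls outside 9 ± 2.4.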

## Stack all

This is the most simple module. All tensors are simply stacked and the mean is used as estimator. Mostly this method is used as reference in order to compare "what would happen" if I simply take all.

The method quite well indicates good frequency ranges. In the dead band or at very high frequencies stack all can give surprisingly good results. This happens when the data is scattered strongly but still balanced out - whereas a statistical estimator (mean, median, coherency) would cut off the "wrong" outliers.
Note that when you have activated the coherency rejection in the command line module the stack all is not a pure stack all method anymore (likewise when the quadrant check is activated in the standard parameters).
This module does not contain other parameters than the standard parameters.

The mean is calculated as $$\bar x = \frac{1} { N } \sum_{i=0}^{N-1} x_i \label{eq:mean}$$ and the standard deviation as: $$\sigma = \sqrt{\frac{1}{N-1}\sum_{i=0}^{N-1}\left({x}_{i}-\bar{x} \right)^2} = \sqrt{\frac{1}{N-1}\left(\sum_{i=0}^{N-1}{x}^{2}_{i} - \left( \sum_{i=0}^{N-1}{x}_{i} \right)^2 /N \right)} \label{eq:stddev}$$ where the second form is faster and has less calculation noise. Note that the 3σ interval defines an outlier in a Gauss distribution. However this interval is much too big for MT; better take 1.3, 1.7 or similar.
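Both forms of the standard deviation can be checked numerically with a plain Python sketch:

```python
import math

def stddev_two_ways(x):
    """Compute the sample standard deviation via the definition and via the
    faster one-pass sum formula; both must agree (up to rounding)."""
    n = len(x)
    mean = sum(x) / n
    direct = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))
    fast = math.sqrt((sum(xi * xi for xi in x) - sum(x) ** 2 / n) / (n - 1))
    return direct, fast
```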

## Coherency Threshold

This classic module evaluates the coherency between Ex & Hy and Ey & Hx. If you have enough data you can use thresholds of 0.8 or more. If the selection is too strong it will lead to an empty EDI file.

• upper threshold : reject tensors where the coherency for zxy and zyx is higher than this value. Should be 1. Can be used (if not 1) to suppress data where the correlation is high but may come from a different source other than MT (like correlated spikes in E and H).
• lower threshold : reject tensors where the coherency for zxy and zyx is less than this value. Max =  0.99 .
• upper threshold % : N/A yet : reject x % of the "best data"
• lower threshold % : N/A yet :  reject x% of the "poor data"
• replace zero selection with stack all : In case stack all was activated and NO tensors could be calculated for a certain frequency (because of poor data quality) this frequency will be replaced with "stack all". Note: this is no good solution because you are mixing data from different processings.
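A minimal sketch of the coherency threshold selection (hypothetical helper; each stack carries a (coherency_xy, coherency_yx) pair, and both must lie inside the window):

```python
def select_by_coherency(stacks, lower=0.8, upper=1.0):
    """Keep a stack only if BOTH the Ex-Hy and Ey-Hx coherencies
    fall inside [lower, upper]."""
    return [s for s in stacks
            if lower <= s[0] <= upper and lower <= s[1] <= upper]
```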

## Median Processing

The median processing calculates the median of the zxy and zyx components of your tensor array: sort(z), take the N/2 element. However: if there are not enough stacks, the median is as useless as the mean. Here too it is recommended to have more than 20 stacks available for processing. The median is a robust estimator against outliers.

Note: you will not get the median itself if upper & lower = 1 (it will likely crash). Reason: the processing evaluates zxy and zyx and they do not have the same median, so the EDI will be empty.

Take 1.2 to 1.3 for the resistivities and 1.3 to 1.5 for the phases (phases are more scattered).
Note: a too strong selection leads to empty EDI files.
If the lower bound is set to 1.2 and the upper bound set to 1.4 the resulting zxy and zyx values will be biased downwards (because you give preference to lower data). This can be desired if you believe that the data itself is biased downwards and you want to correct that.

• lower bound median rho : allow the lowest $\rho$xy and $\rho$yx to be n times lower than the median.
• upper bound median rho : allow the highest $\rho$xy and $\rho$yx to be n times higher than the median.
• lower bound median phi : allow the lowest $\phi$xy and $\phi$yx to be n times lower than the median.
• upper bound median phi : allow the highest $\phi$xy and $\phi$yx to be n times higher than the median.
• replace zero selection with stack all : In case stack all was activated and NO tensors could be calculated for a certain frequency (because of poor data quality) this frequency will be replaced with "stack all". Note: this is no good solution because you are mixing data from different processings.
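The median bounds can be illustrated for one component (hypothetical helper, simplified to apparent resistivities of one element):

```python
import statistics

def select_by_median(rho, lower=1.2, upper=1.3):
    """Keep values between median/lower and median*upper; ProcMT applies
    this to rho_xy and rho_yx, and with separate bounds to the phases."""
    med = statistics.median(rho)
    return [r for r in rho if med / lower <= r <= med * upper]
```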

# ProcMT shell

    procmt_createjob ts/Site_132_134_136_138/*/*xml > jobs/132.xml
    procmt_createjob -useproc processings/mt_auto_rr2.xml processings/mt_auto.xml ts/Site_2/*/*xml > jobs/123.xml
    sh shell/mkallproc.sh -useproc processings/mt_auto_rr2.xml processings/mt_auto.xml

# FAQ

Lowest / highest result in EDI: The lowest or highest frequency depends on sampling rate and FFT window length. By default the upper limit is fsample  /  4. In case of fsample  =  256  Hz the highest frequency would be 64  Hz - unless you have changed the additional cut-off in the FFT module. The FFT window length gives the theoretical lower limit. In case of 1024 points the lower limit would be 1 s (provided the cut-off in the FFT module was not set).

Now assume you have used parzening. Parzening extends the frequencies to the left and right and adds frequencies for smoothing. In case of a radius of 0.3 the longest usable period becomes 0.4 s and the highest frequency also drops down.
So: in order to get lower frequencies (longer periods) you have to filter the timeseries again. For the highest frequencies there is no solution (except using a smaller parzening).
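The rule of thumb for the EDI frequency limits can be sketched as follows (hypothetical helper; the 1 s lower-limit example in the text presumably corresponds to 1024 points at 1024 Hz sampling):

```python
def edi_frequency_limits(f_sample_hz, fft_length, cut_upper=0.0):
    """Highest usable frequency is fsample/4 (reduced further by any extra
    FFT cut-off); the lowest is set by the FFT window length."""
    f_high = f_sample_hz / 4.0 * (1.0 - cut_upper)
    f_low = f_sample_hz / fft_length
    return f_low, f_high
```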

Missing frequencies in EDI: Means simply that the processing was not able to meet your conditions in terms of coherency threshold or median and/or additional criteria like quadrant check, phase/resistivity limit or standard deviation. Normally you re-process the data with different settings or you find out that the data quality is not acceptable at all. Another possibility (if you must have a constant frequency data set) is to activate the replacement in Coherency Threshold or Median Processing.

# Appendix

measdoc : one or more file(s) inside a measdir. Example:  319_2014-08-15_07-23-37_2014-08-15_20-20-00_R000_256H.xml  indicates a recording with system 319 with start time, stop time, run number 000 and 256Hz sampling frequency. The measdoc contains additional information for one or more ats files like calibration data, processing instructions and system status during the recording.

measdir : contains one or more measdoc and ats files, like  meas_2014-08-15_07-23-37 . In general a measdir is a complete set of data. If you want to copy data it is always best and safest to copy a complete directory.

ats : advanced time series : binary data format with a header (1024 bytes) followed by 32bit integers (Intel format) - Version 80. Version 1080 is the sliced version which contains up to 1024 sliced recordings in the same file (e.g. you record 12 times a day at 1024  kHz and put this in one ats file). The sliced version is limited in processing and should not be used.

EDI : Electrical Data Interchange. Exchange format used in MT.

processing : consists of one or more modules bundled together. These instructions are saved inside one file which is used by ProcMT to initiate the MT processing. A job can contain many MT processings.

job :consists of a processing and a list of files to process.

FFT / DFT : The FFT (fast Fourier transform) is the computational realization of the DFT (discrete Fourier transform) inside the computer. The window length together with the sampling rate decides the highest and lowest frequency in the spectrum. The longer the window (more points from the timeseries), the lower the frequencies the spectrum contains.
Increasing the window from 1024 to 8192 @ 512  Hz sampling rate is NOT the same as using a window of 1024 points @ 64  Hz. Reason: the timeseries may contain errors, and the FIR filter will produce a different output compared to the longer FFT. If the data is of good quality the result will be almost the same (ref.  filter compare ).
Additionally you save computation time when re-processing the data with other settings.

bandwidth : Unit [Hz]. Depends on (sampling frequency / FFT length) and the Parzen window. You achieve a smaller bandwidth if you increase the FFT length. The bandwidth increases with a bigger Parzen window. In theory a smaller bandwidth increases the depth resolution and decreases the resistivity resolution. Since the MT transfer functions are smooth, you should not overestimate the bandwidth.

UTC, GPS, TAI :

• UTC: Coordinated Universal Time, time used by the ADU system. Time is adjusted to the earth rotation by "leap seconds"
• GPS: time of the atomic clocks of the GPS satellites, UTC + 17s (July 2015), started 6. January 1980, 00:00:00
• TAI: Temps Atomique International, atomic time, UTC + 17s + 19s (July 2015), started 1. January 1958
• GMT: Greenwich Mean Time; not used, and is NOT UTC; GMT is still used as a TIMEZONE which implies a location, UTC does not.
• Unix Timestamp: synchronized to UTC (leap seconds are not counted), seconds since 1.1.1970, 0:00 GMT, usable for times AFTER 1.1.1972 where UTC is used. So the time stamp does not "really" show "atomic clock seconds" since 1970.
The difference between UTC and the atomic clocks becomes greater and greater because the earth rotation slows down.
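The offsets above can be expressed as simple arithmetic (a sketch for July 2015, when 36 leap seconds had accumulated in total):

```python
def gps_and_tai_offsets(leap_seconds_total=36):
    """Offsets relative to UTC: TAI = UTC + total leap seconds,
    GPS = TAI - 19 s (GPS time started 19 s behind TAI in 1980)."""
    tai_minus_utc = leap_seconds_total
    gps_minus_utc = leap_seconds_total - 19
    return gps_minus_utc, tai_minus_utc
```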