Load Mass Spectrum and calculate metrics

import natorgms.draw as draw
from natorgms.spectrum import Spectrum

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Load and save

Load the mass spectrum from a CSV file. If the m/z and intensity column headings differ from the defaults (“mass”, “intensity”), map them to the default names with the mapper argument. Specify the separator with sep if it is not the default “,”. To avoid loading unnecessary data, we also set the take_only_mz flag to True.

spec = Spectrum.read_csv(
    filename="data/sample2.csv",
    mapper={"m/z": "mass", "I": "intensity"},
    take_only_mz=True,
    sep=",",
)
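To make the role of mapper, sep and take_only_mz more concrete, the same column handling can be sketched in plain pandas. This is only an assumption about the behaviour, not the library's implementation; the inline CSV text and the extra `S/N` column are made up for the example.

```python
import io

import pandas as pd

# Hypothetical CSV content with non-default headers and one extra column.
raw = io.StringIO("m/z,I,S/N\n205.08706,5072918,12.1\n207.06445,3388781,9.4\n")

# Rename the source headers to the default names, as the mapper is described to do.
mapper = {"m/z": "mass", "I": "intensity"}
table = pd.read_csv(raw, sep=",").rename(columns=mapper)

# Keep only the two columns needed later (assumed effect of take_only_mz=True).
table = table[["mass", "intensity"]]
print(table)
```

After this step the table has exactly the two default columns, whatever the original file called them.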

Now we can plot the spectrum.

draw.spectrum(spec)


The next step is the assignment of brutto formulas to the masses. By default, the search covers the following element ranges: ‘C’ from 4 to 50, ‘H’ from 4 to 100, ‘O’ from 0 to 25, ‘N’ from 0 to 3, ‘S’ from 0 to 2. The following rules are also applied by default: 0.25 < H/C < 2.2, O/C < 1, nitrogen parity, DBE-O <= 10.
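The default rules can be written out as a small filter. This is a minimal sketch, not the library's code: the function name `passes_default_rules` is hypothetical, DBE uses the standard definition C - H/2 + N/2 + 1, and the parity check is written in the form that holds for a neutral CHON molecule (H + N even), which may differ from how the library applies it.

```python
def passes_default_rules(c, h, o, n=0):
    """Check a neutral CHON formula against the default constraints from the text."""
    if c <= 0:
        return False
    hc = h / c
    oc = o / c
    dbe = c - h / 2 + n / 2 + 1  # standard double bond equivalent
    return (
        0.25 < hc < 2.2
        and oc < 1
        and (h + n) % 2 == 0  # parity check, assumed form for a neutral molecule
        and dbe - o <= 10
    )


print(passes_default_rules(12, 14, 3))  # C12H14O3
print(passes_default_rules(10, 30, 2))  # H/C = 3.0, outside the allowed range
```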

We can specify the elements with the brutto_dict parameter, as below. To use an isotope, append “_” and its mass number to the element symbol, for example “C_13”.

rel_error is the allowable relative error. By default it is 0.5.

spec = spec.assign(brutto_dict={'C':(4,51), 'C_13':(0,3), 'H':(4,101), 'O':(0, 26), 'N':(0,3)}, rel_error=0.5)
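As an illustration of how a relative error threshold works, here is a minimal sketch. It assumes the error is expressed in parts per million (ppm), which is common in high-resolution mass spectrometry but is not stated in this text; the function names and the candidate mass are hypothetical.

```python
def rel_error_ppm(measured_mass, calc_mass):
    """Relative deviation of a measured mass from a calculated one, in ppm (assumed unit)."""
    return (measured_mass - calc_mass) / calc_mass * 1e6


def within_tolerance(measured_mass, calc_mass, rel_error=0.5):
    """True if the candidate mass lies within the allowed relative error."""
    return abs(rel_error_ppm(measured_mass, calc_mass)) <= rel_error


# Illustrative numbers only: a peak at 205.08706 against a candidate at 205.08702.
print(round(rel_error_ppm(205.08706, 205.08702), 3))
print(within_tolerance(205.08706, 205.08702))
```

A tighter rel_error rejects more candidates, so fewer peaks are assigned but the assignments are less ambiguous.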

Now the masses have brutto formulas assigned. Before looking at them, we can drop the peaks that were left unassigned.

spec = spec.drop_unassigned()
spec.table
           mass  intensity  assign     C  C_13      H     O    N
0     205.08706    5072918    True  12.0   0.0   14.0   3.0  0.0
1     207.06445    3388781    True  13.0   1.0    9.0   1.0  1.0
2     209.04557    6721187    True  10.0   0.0   10.0   5.0  0.0
3     209.08202    4015701    True  11.0   0.0   14.0   4.0  0.0
4     210.02769    3463522    True  12.0   1.0    6.0   3.0  0.0
...         ...        ...     ...   ...   ...    ...   ...  ...
6273  991.68139    8570909    True  46.0   2.0   98.0  18.0  2.0
6274  993.51449   10621638    True  41.0   2.0   80.0  23.0  2.0
6275  993.71227    8240803    True  50.0   2.0  100.0  15.0  2.0
6276  995.67950    9630054    True  50.0   2.0   98.0  17.0  0.0
6277  999.37335    8356890    True  43.0   2.0   62.0  23.0  2.0

6278 rows × 8 columns

After assignment we can calculate various metrics: H/C, O/C, CRAM, NOSC, AI, DBE and others. They can be calculated one at a time with methods such as calc_ai and calc_dbe, or all at once with calc_all_metrics.

spec = spec.calc_all_metrics()

Now we can see all the metrics among the table columns.

spec.table.columns
Index(['mass', 'intensity', 'assign', 'C', 'C_13', 'H', 'O', 'N', 'calc_mass',
       'abs_error', 'rel_error', 'DBE', 'DBE-O', 'DBE_AI', 'CAI', 'AI',
       'DBE-OC', 'H/C', 'O/C', 'class', 'CRAM', 'NOSC', 'brutto', 'Ke', 'KMD'],
      dtype='object')
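Some of these columns are simple functions of the element counts. The sketch below uses commonly used definitions of H/C, O/C, DBE and NOSC for a neutral CHONS formula; the function name `basic_metrics` is hypothetical, and the library's own calculations (for example its handling of isotopes or charge) may differ.

```python
def basic_metrics(c, h, o, n=0, s=0):
    """Commonly used definitions of a few metrics for a neutral CHONS formula."""
    return {
        "H/C": h / c,
        "O/C": o / c,
        # Double bond equivalent.
        "DBE": c - h / 2 + n / 2 + 1,
        # Nominal oxidation state of carbon for an uncharged molecule.
        "NOSC": 4 - (4 * c + h - 3 * n - 2 * o - 2 * s) / c,
    }


# Element counts of the first row of the table above: C12 H14 O3.
print(basic_metrics(12, 14, 3))
```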

You can save the spectrum together with the calculated metrics to a CSV file at any time with the to_csv method.

spec.to_csv('temp.csv')

Draw

The data can be visualized in different ways.

A simple Van Krevelen diagram. By default, CHO formulas are blue, CHON orange, CHOS green and CHONS red. No sulfur was assigned in this spectrum, so only two colors appear here.

draw.vk(spec)
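The color coding can be thought of as a lookup on the elements present in each formula. A minimal sketch follows; the helper `element_class` and the dictionary are hypothetical and only mirror the defaults described above, not the library's internals.

```python
# Default colors as described in the text.
CLASS_COLORS = {"CHO": "blue", "CHON": "orange", "CHOS": "green", "CHONS": "red"}


def element_class(formula):
    """Return CHO, CHON, CHOS or CHONS for a dict of element counts."""
    name = "CHO"
    if formula.get("N", 0) > 0:
        name += "N"
    if formula.get("S", 0) > 0:
        name += "S"
    return name


for formula in ({"C": 12, "H": 14, "O": 3}, {"C": 13, "H": 9, "O": 1, "N": 1}):
    label = element_class(formula)
    print(label, CLASS_COLORS[label])
```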


We can also plot it with density axes.

draw.vk(spec, draw.scatter_density)


Or plot any pair of metrics in the spectrum, for example NOSC vs DBE-OC.

draw.scatter(spec, x='NOSC', y='DBE-OC')
draw.scatter_density(spec, x='NOSC', y='DBE-OC')


We can plot the density of a single metric.

draw.density(spec, 'AI')


Or plot a 2D kernel density scatter.

draw.density_2D(spec, x='NOSC', y='DBE-OC')


We can plot a Kendrick diagram with the command

draw.scatter(spec, x='Ke', y='KMD')
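For readers unfamiliar with the two axes, the sketch below computes a Kendrick mass and a Kendrick mass defect on the CH2 scale. It follows one common convention (defect = rounded Kendrick mass minus Kendrick mass); sign and rounding conventions vary between authors, and it is an assumption that the Ke and KMD columns are computed this way.

```python
CH2_EXACT = 14.01565  # exact mass of CH2
CH2_NOMINAL = 14.0    # nominal mass of CH2


def kendrick(mass):
    """Return (Kendrick mass, Kendrick mass defect) on the CH2 scale."""
    ke = mass * CH2_NOMINAL / CH2_EXACT
    kmd = round(ke) - ke  # one common convention, assumed here
    return ke, kmd


ke, kmd = kendrick(205.08706)
print(f"Ke = {ke:.5f}, KMD = {kmd:.5f}")
```

Formulas that differ only by CH2 units share the same defect, so homologous series line up horizontally on such a plot.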


Molecular class

We can get the average density of the molecular classes of the brutto formulas in the spectrum.

spec.get_mol_class()
              class   density
0       unsat_lowOC  0.553290
1      unsat_highOC  0.404771
2   condensed_lowOC  0.000409
3  condensed_highOC  0.000278
4    aromatic_lowOC  0.010322
5   aromatic_highOC  0.000261
6        aliphatics  0.008320
7            lipids  0.003229
8       N-satureted  0.009970
9        undefinded  0.009150

Metrics

We can get any of the metrics averaged with intensity as the weight.

spec.get_mol_metrics()
    metric       value
0       AI   -0.016670
1        C   24.216486
2      CAI   12.502903
3     CRAM    0.798058
4     C_13    0.231241
5      DBE   12.569666
6    DBE-O    0.742390
7   DBE-OC    0.027562
8   DBE_AI    0.624843
9        H   25.873668
10     H/C    1.057624
11       N    0.117547
12    NOSC   -0.064638
13       O   11.827276
14     O/C    0.489112
15    mass  509.495873

The metrics argument limits the output to the selected metrics.

spec.get_mol_metrics(metrics=['AI', 'DBE', 'NOSC', 'H/C', 'O/C'])
   metric      value
0      AI  -0.016670
1     DBE  12.569666
2     H/C   1.057624
3    NOSC  -0.064638
4     O/C   0.489112

The same metrics can be aggregated with the plain mean or another function (max, min, std, median).

spec.get_mol_metrics(metrics=['AI', 'DBE', 'NOSC', 'H/C', 'O/C'], func='mean')
   metric      value
0      AI  -0.099484
1     DBE  12.824626
2     H/C   1.088260
3    NOSC  -0.117060
4     O/C   0.462034
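The difference between the two outputs comes from the weighting. A minimal sketch with made-up numbers shows how an intensity-weighted average differs from a plain mean; the helper `weighted_mean` is hypothetical, and it is an assumption that the library normalizes by the total intensity in the same way.

```python
def weighted_mean(values, weights):
    """Average of values weighted by peak intensity."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


# Made-up metric values and intensities for two peaks.
values = [1.0, 2.0]
intensity = [3, 1]

print(weighted_mean(values, intensity))  # the intense peak dominates
print(sum(values) / len(values))         # plain mean, every peak counts equally
```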

We can also split the Van Krevelen diagram into squares and calculate the density in each square.

spec.get_squares_vk(draw=True)
       value  square
0   0.000059       1
1   0.001766       2
2   0.002253       3
3   0.003180       4
4   0.003691       5
5   0.001909       6
6   0.207173       7
7   0.257413       8
8   0.019422       9
9   0.002741      10
10  0.001424      11
11  0.144290      12
12  0.333394      13
13  0.012810      14
14  0.001791      15
15  0.000304      16
16  0.001035      17
17  0.003152      18
18  0.001605      19
19  0.000586      20
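The 20 values above sum to approximately 1, which suggests that each one is the share of total intensity falling into a square. The sketch below shows that idea on a hypothetical 5 by 4 grid; the bin edges, the numbering order and the function name `square_shares` are assumptions for illustration and are not taken from the library.

```python
from bisect import bisect_right

# Hypothetical grid: 5 O/C columns by 4 H/C rows = 20 squares.
OC_EDGES = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
HC_EDGES = [0.0, 0.6, 1.0, 1.5, 2.2]


def square_shares(points):
    """points: iterable of (O/C, H/C, intensity). Returns {square number: intensity share}."""
    n_cols = len(OC_EDGES) - 1
    n_rows = len(HC_EDGES) - 1
    totals = {k: 0.0 for k in range(1, n_cols * n_rows + 1)}
    for oc, hc, intensity in points:
        # Half-open bins: a value equal to an edge goes to the bin on its right.
        col = bisect_right(OC_EDGES, oc) - 1
        row = bisect_right(HC_EDGES, hc) - 1
        if 0 <= col < n_cols and 0 <= row < n_rows:
            totals[row * n_cols + col + 1] += intensity
    grand_total = sum(totals.values())
    return {k: v / grand_total for k, v in totals.items()}


# Three made-up peaks: (O/C, H/C, intensity).
shares = square_shares([(0.25, 1.17, 5.0), (0.5, 1.0, 3.0), (0.3, 1.4, 2.0)])
print({k: v for k, v in shares.items() if v > 0})
```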


It may also be useful to calculate the dependence of DBE on nO. By fitting the slope, we can determine the state of the sample.

spec.get_dbe_vs_o(draw=True, olim=(5, 18))
(0.5649594882493085, 6.118940572968191)
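The returned pair is presumably the slope and intercept of a straight-line fit. A minimal sketch of such a fit is given below: it averages DBE for each oxygen count inside olim and fits a line with NumPy. The function name `dbe_vs_o_fit`, the unweighted averaging and the exclusive upper bound are assumptions; the library may weight by intensity or treat the limits differently.

```python
import numpy as np


def dbe_vs_o_fit(o_counts, dbe_values, olim=(5, 18)):
    """Fit mean DBE against oxygen count with a straight line; returns (slope, intercept)."""
    o_counts = np.asarray(o_counts)
    dbe_values = np.asarray(dbe_values, dtype=float)
    xs, ys = [], []
    for n_o in range(olim[0], olim[1]):  # upper bound treated as exclusive (assumption)
        mask = o_counts == n_o
        if mask.any():
            xs.append(n_o)
            ys.append(dbe_values[mask].mean())
    slope, intercept = np.polyfit(xs, ys, 1)
    return float(slope), float(intercept)


# Synthetic data that follows DBE = 0.5 * nO + 6 exactly.
o = np.arange(3, 20)
print(dbe_vs_o_fit(o, 0.5 * o + 6))
```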


Using the obtained metrics, it is possible to classify samples by origin or property and to train different models.