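In mass spectrometry (MS), molecules are ionized and broken into fragments (ions), which are deflected by a strong magnetic field according to their mass-to-charge ratio (m/z) and finally detected. The data produced by the detector is called a mass spectrum; it gives the intensity (abundance) of each resulting fragment against its m/z. The following example is the mass spectrum of toluene (C7H8). As the figure shows, the most intense ion (m/z = 91) is assigned an abundance of 100% and is called the base peak:
In tabular form, the data looks like this: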
m/z, A
45.47, 8.91
46.62, 9.21
...
91.27, 100
93.541, 54.369
The first column holds the fragment m/z and the second column the relative abundance; each row is called a peak.
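As a minimal sketch of how such a peak table might be loaded and its base peak located with pandas: the sample rows below are copied from the table above (the elided `...` rows are left out), and the column names `m_z` and `A` are an assumption chosen to match the names used in my script.

```python
import io

import pandas as pd

# Sample rows copied from the table above; the elided "..." rows are omitted.
raw = """m/z, A
45.47, 8.91
46.62, 9.21
91.27, 100
93.541, 54.369
"""

# skipinitialspace drops the blank after each comma, so the header reads "A" rather than " A".
peaks = pd.read_csv(io.StringIO(raw), skipinitialspace=True)
peaks.columns = ["m_z", "A"]  # assumed column names, for illustration only

# The base peak is the row with the highest relative abundance.
base_peak = peaks.loc[peaks["A"].idxmax()]
print(base_peak["m_z"], base_peak["A"])
```

Using `idxmax()` rather than testing for `A == 100` avoids relying on an exact floating-point match, which may matter if the abundances were normalized elsewhere.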
I want to format such data properly into a publication-ready table like the following:
| Analyte | Molecular formula | Molecular weight (Da) | [M]+ *m/z* | MS fragments m/z |
|---------|-------------------|-----------------------|----------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Toluene | C7H8 | 92.14 | 91 | 94 (5), 93 (72), **91 (100)**, 91 (2), 89 (4), 78 (2), 76 (1), 67 (5), 65 (50), 64 (6), 64 (27), 62 (10), 61 (5), 54 (4), 53 (6), 52 (23), 51 (12), 50 (1), 47 (9), 45 (9). |
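The "MS fragments" cell in a table like this could be built roughly as follows. This is only a sketch under my own assumptions: `format_fragments` and its `min_abundance` cut-off are hypothetical names, the input is a non-empty list of `(m/z, abundance)` pairs, and values are rounded to integers with `%.0f`.

```python
def format_fragments(peaks, min_abundance=1.0):
    """Return 'm/z (A), ...' sorted by descending m/z, with the base peak in bold.

    `peaks` is a non-empty list of (m_z, abundance) tuples; `min_abundance`
    is a hypothetical cut-off below which peaks are dropped.
    """
    kept = [(mz, a) for mz, a in peaks if a >= min_abundance]
    # The base peak is the kept peak with the highest abundance.
    base = max(kept, key=lambda p: p[1])
    parts = []
    for mz, a in sorted(kept, key=lambda p: p[0], reverse=True):
        item = "%.0f (%.0f)" % (mz, a)
        # Wrap the base peak in markdown bold markers.
        parts.append("**%s**" % item if (mz, a) == base else item)
    return ", ".join(parts) + "."


sample = [(45.47, 8.91), (46.62, 9.21), (91.27, 100), (93.541, 54.369)]
print(format_fragments(sample))
# prints: 94 (54), **91 (100)**, 47 (9), 45 (9).
```

Note that rounding to integers can make two distinct peaks show the same nominal m/z in the output.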
I started writing a script (in Python) that should do the following:
Here is my attempt using the Python pandas library:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import glob
import os

import pandas as pd

os.chdir("/path_to_input_file")


def print_mzlist(fx):
    # Read peaks table and sort according to *m/z*
    df = pd.read_csv(fx)
    compname = os.path.splitext(fx)[0]
    # DataFrame.sort() was removed from pandas; sort_values() replaces it
    dfs = df.sort_values('m_z', ascending=False)
    # Keep only peaks with *A* > 1
    dfss = dfs.query('A > 1')
    # Select base peak with A = 100% and add 1 to m/z because the ionization mode is positive
    base_peak = dfs.query('A == 100') + 1
    # Start the list with the compound name *compname* and *base_peak*
    peaks_list = "%s MS1[%.0f]: " % (compname.title(), base_peak['m_z'].iloc[0])
    # Append peaks list
    for index, row in dfss.iterrows():
        peaks_list += "%.0f (%.0f), " % (row['m_z'], row['A'])
    # Replace the trailing ", " with a full stop
    peaks_list = peaks_list[:-2] + '.'
    return peaks_list


# Open the text file to store results
text_file = open("Results.txt", "w")
# Reading csv files and printing the results
for filename in glob.glob("*.csv"):
    print("Processing the file", filename)
    pk = print_mzlist(filename)
    text_file.write("{}\n".format(pk))
print("Writing results to:", text_file.name)
text_file.close()
print("Done")
Output:
Xylene MS1[92]: 107 (6), 106 (66), 105 (29), 104 (3), 103 (6), 102 (1), 92 (8), 91 (100), 89 (2), 79 (7), 78 (6), 77 (12), 74 (1), 65 (6), 63 (5), 62 (2), 53 (3), 52 (3), 51 (9), 50 (4), 41 (1), 39 (8), 38 (1), 27 (4).
Toluene MS1[92]: 94 (5), 93 (72), 91 (100), 91 (2), 89 (4), 78 (2), 76 (1), 67 (5), 65 (50), 64 (6), 64 (27), 62 (10), 61 (5), 54 (4), 53 (6), 52 (23), 51 (12), 50 (1), 47 (9), 45 (9).
Cumene MS1[106]: 121 (3), 120 (27), 106 (9), 105 (100), 104 (3), 103 (7), 102 (1), 91 (6), 79 (12), 78 (6), 77 (14), 65 (2), 63 (2), 53 (1), 52 (3), 51 (9), 50 (3), 41 (3), 39 (6), 28 (1), 27 (4).
Styrene MS1[105]: 105 (9), 104 (100), 103 (41), 102 (6), 89 (2), 79 (2), 78 (35), 77 (17), 76 (4), 75 (2), 74 (3), 65 (1), 63 (5), 62 (2), 52 (7), 51 (17), 50 (7), 39 (6), 38 (1), 27 (3).
The source CSV files are available in this GitHub repo.
What I need from the Stack Overflow community: