Python: Listing the mass-spectrum fragments of molecules in publication format

Date: 2016-10-12 01:26:39

Tags: sorting printing rounding string-formatting

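In mass spectrometry (MS), a molecule is ionized and broken into fragments (ions), which are deflected by a strong magnetic field according to their mass-to-charge ratio (m/z) and finally detected. The data coming from the detector is called a mass spectrum: it gives the intensity (abundance) of each resulting fragment as a function of its m/z. The example used here is the mass spectrum of toluene (C7H8). The most intense ion (m/z = 91) is assigned an abundance of 100% and is called the base peak.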

In tabular form, the spectrum looks like this:

  m/z, A
  45.47, 8.91
  46.62, 9.21
  ...
  91.27, 100
  93.541, 54.369

The first column is the fragment m/z and the second column is the relative abundance; each row is called a peak.
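As a small illustration of how the relative abundance column relates to the base peak, the sketch below scales raw detector intensities so that the strongest peak becomes 100. The function name `to_relative_abundance` and the raw intensity values are hypothetical and chosen only for the example; they are not taken from the data files.

```python
def to_relative_abundance(peaks):
    """Scale raw intensities so that the strongest peak (the base peak) is 100.

    `peaks` is a list of (m/z, raw intensity) pairs.
    """
    base_intensity = max(intensity for _, intensity in peaks)
    return [(mz, 100.0 * intensity / base_intensity) for mz, intensity in peaks]


# Invented raw intensities, for illustration only
raw_peaks = [(65.0, 2500.0), (91.0, 5000.0), (92.0, 3600.0)]
relative = to_relative_abundance(raw_peaks)

# The base peak is the row with the highest relative abundance
base_mz, base_abundance = max(relative, key=lambda peak: peak[1])
print(relative)
print("base peak m/z:", base_mz)
```

If the files already contain relative abundances, as the table above suggests, this step is not needed and only the lookup of the base peak applies.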

I want to format such data into a publication-ready table like this one:

| Analyte | Molecular formula | Molecular weight (Da) | [M]+ *m/z* | MS fragments m/z                                                                                                                                                        |
|---------|-------------------|-----------------------|----------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Toluene |        C7H8       |         92.14         |    91    | 94 (5), 93 (72), **91 (100)**, 91 (2), 89 (4), 78 (2), 76 (1), 67 (5), 65 (50), 64 (6), 64 (27), 62 (10), 61 (5), 54 (4), 53 (6), 52 (23), 51 (12), 50 (1), 47 (9), 45 (9). |

I started writing a script (in Python) that should do the following:

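  • Read the tabulated mass spectra of many molecules from text files (txt or csv)
  • For each file, sort the peaks by decreasing m/z
  • Round both m/z and the relative abundance (A) to the nearest integer
  • Discard peaks with weak relative abundance (A < 1)
  • Print the base peak and the peak list as in the example above
  • Produce the table in a publication format such as .pdf or .doc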
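The reading, sorting, rounding, and filtering steps above can be sketched with the standard library alone. This is only a minimal sketch under stated assumptions: the helper names `read_peaks` and `format_peaks` are hypothetical, the sample reuses the rows from the table above plus one invented weak peak (50.1, 0.4), and the filter is applied to the unrounded abundance.

```python
import csv
import io


def read_peaks(handle):
    """Read (m/z, A) rows from a CSV file object that starts with a header line."""
    reader = csv.reader(handle, skipinitialspace=True)
    next(reader)  # skip the header row
    return [(float(row[0]), float(row[1])) for row in reader if row]


def format_peaks(peaks, min_abundance=1.0):
    """Drop weak peaks, sort by decreasing m/z, round, and join into one string."""
    kept = [(mz, a) for mz, a in peaks if a >= min_abundance]
    kept.sort(key=lambda peak: peak[0], reverse=True)
    return ", ".join("%.0f (%.0f)" % (mz, a) for mz, a in kept) + "."


# Sample in the tabular layout shown earlier; the 50.1 row is invented
sample = io.StringIO(
    "m/z, A\n45.47, 8.91\n46.62, 9.21\n50.1, 0.4\n91.27, 100\n93.541, 54.369\n"
)
peaks = read_peaks(sample)
base_mz = max(peaks, key=lambda peak: peak[1])[0]
print("base peak: %.0f" % base_mz)
print(format_peaks(peaks))
```

For real files, `open(filename, newline="")` would take the place of the `io.StringIO` sample.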

Here is my attempt using the Python pandas library:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import glob
import os

import pandas as pd

os.chdir("/path_to_input_file")



def print_mzlist(fx):
    # Read the peaks table and sort by decreasing *m/z*
    df = pd.read_csv(fx)
    compname = os.path.splitext(fx)[0]
    # DataFrame.sort was removed from pandas; sort_values replaces it
    dfs = df.sort_values('m_z', ascending=False)


    # Strip peaks with *A* < 1 (peaks with A exactly 1 are kept)
    dfss = dfs.query('A >= 1')

    # Select the base peak (highest A, i.e. 100%) and add 1 to its m/z
    # because the ionization mode is positive
    base_mz = dfs.loc[dfs['A'].idxmax(), 'm_z'] + 1

    # Start the peaks list with the compound name *compname* and the base peak
    peaks_list = "%s MS1[%.0f]: " % (compname.title(), base_mz)
    # Append the peaks list
    # Append peaks list
    for index, row in dfss.iterrows():
        peaks_list += "%.0f (%.0f), " % (row['m_z'], row['A'])
    # Replace the trailing ", " with a full stop
    peaks_list = peaks_list[:-2] + '.'
    return peaks_list

# Open the text file to store results
with open("Results.txt", "w") as text_file:
    # Read the csv files and write the results
    for filename in glob.glob("*.csv"):
        print("Processing the file", filename)
        pk = print_mzlist(filename)
        text_file.write("{}\n".format(pk))
        print("Writing results to:", text_file.name)
print("Done")

Output:

Xylene MS1[92]: 107 (6), 106 (66), 105 (29), 104 (3), 103 (6), 102 (1), 92 (8), 91 (100), 89 (2), 79 (7), 78 (6), 77 (12), 74 (1), 65 (6), 63 (5), 62 (2), 53 (3), 52 (3), 51 (9), 50 (4), 41 (1), 39 (8), 38 (1), 27 (4).
Toluene MS1[92]: 94 (5), 93 (72), 91 (100), 91 (2), 89 (4), 78 (2), 76 (1), 67 (5), 65 (50), 64 (6), 64 (27), 62 (10), 61 (5), 54 (4), 53 (6), 52 (23), 51 (12), 50 (1), 47 (9), 45 (9).
Cumene MS1[106]: 121 (3), 120 (27), 106 (9), 105 (100), 104 (3), 103 (7), 102 (1), 91 (6), 79 (12), 78 (6), 77 (14), 65 (2), 63 (2), 53 (1), 52 (3), 51 (9), 50 (3), 41 (3), 39 (6), 28 (1), 27 (4).
Styrene MS1[105]: 105 (9), 104 (100), 103 (41), 102 (6), 89 (2), 79 (2), 78 (35), 77 (17), 76 (4), 75 (2), 74 (3), 65 (1), 63 (5), 62 (2), 52 (7), 51 (17), 50 (7), 39 (6), 38 (1), 27 (3).
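In the Toluene line, m/z 91 and m/z 64 each appear twice, presumably because distinct peaks round to the same integer. One possible remedy, which is an assumption on my part and may not suit every reporting convention, is to keep only the strongest peak for each rounded m/z. The helper `merge_rounded_peaks` and the sample values below are hypothetical.

```python
def merge_rounded_peaks(peaks):
    """Round m/z to integers and keep the highest abundance for each value."""
    strongest = {}
    for mz, abundance in peaks:
        key = int(round(mz))
        strongest[key] = max(abundance, strongest.get(key, 0.0))
    # Return (m/z, A) pairs sorted by decreasing m/z
    return sorted(strongest.items(), reverse=True)


# Invented values chosen so that two pairs collapse onto the same integer m/z
peaks = [(91.27, 100.0), (90.61, 2.1), (64.4, 5.9), (63.8, 27.3), (45.47, 8.91)]
merged = merge_rounded_peaks(peaks)
print(", ".join("%d (%.0f)" % (mz, a) for mz, a in merged) + ".")
```

Summing the abundances instead of taking the maximum would be another option; which one is appropriate depends on how the peaks were picked.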

The source csv files are available in this GitHub repo.

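I need the Stack Overflow community to:

  • Review the code (any suggestions are welcome)
  • Point me to a method (or library) for formatting the results as a table and printing it to a pdf or doc file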
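For the table itself, one low-dependency route is to emit a Markdown pipe table like the one shown earlier and convert it with an external tool afterwards. The sketch below only builds the table text; the helper `markdown_table` is hypothetical, and the fragments string is an abbreviated copy of the Toluene row above.

```python
def markdown_table(rows):
    """Build a Markdown pipe table from dicts that share the same keys."""
    headers = list(rows[0])
    lines = [
        "| " + " | ".join(headers) + " |",
        "|" + "|".join("---" for _ in headers) + "|",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(row[h]) for h in headers) + " |")
    return "\n".join(lines) + "\n"


rows = [{
    "Analyte": "Toluene",
    "Molecular formula": "C7H8",
    "Molecular weight (Da)": 92.14,
    # Abbreviated fragment list; the base peak is marked in bold
    "MS fragments m/z": "94 (5), 93 (72), **91 (100)**, 65 (50).",
}]
table = markdown_table(rows)
print(table)
```

Assuming pandoc is installed, saving this text as `Results.md` and running `pandoc Results.md -o Results.docx` should produce a Word document; PDF output additionally needs a PDF engine such as a LaTeX installation.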

0 answers:

No answers yet.