How to subtract two separate columns in a file

Time: 2018-05-10 10:22:32

Tags: python linux

I have a file that looks like this:

scaf12446   275     482     loc.04759  .       +       9.99087136654
scaf9003    58436   58745   loc.36424  .       +       9.98867551051e-07
scaf6164    41519   44781   loc.29229  .       -       9.97790659076e-07
scaf20      64796   100635  loc.14273  .       -       9.97726500173
scaf19280   12335   12568   loc.13668  .       +       9.95702976886
scaf8877    30882   32362   loc.36113  .       -       9.94423702955e-08

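I want to subtract column 2 from column 3 (that is, column 3 minus column 2) and print the result in place of the two columns. The resulting file should look like this: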

scaf12446   207     loc.04759  .       +       9.99087136654
scaf9003    309     loc.36424  .       +       9.98867551051e-07
scaf6164    3262    loc.29229  .       -       9.97790659076e-07
scaf20      35839   loc.14273  .       -       9.97726500173
scaf19280   233     loc.13668  .       +       9.95702976886
scaf8877    1480    loc.36113  .       -       9.94423702955e-

The table is very long. Is there a simple way to compute this difference for every row? A Linux one-liner would be ideal.

1 Answer:

Answer 0: (score: 0)

Using Python and pandas:

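#!/usr/bin/env python3

"""
sub_col.py

Call as
    $ python sub_col.py file

Note: the input file is overwritten in place.
"""

import sys
import pandas as pd

def main(file):
    # Whitespace-separated input without a header row; columns are labelled 0..6.
    # sep=r'\s+' is equivalent to delim_whitespace=True, which newer pandas deprecates.
    df = pd.read_csv(file, sep=r'\s+', header=None)
    # Column 2 becomes (column 2 - column 1), i.e. end minus start.
    df[2] -= df[1]
    # Drop the start column, which is no longer needed.
    del df[1]
    # Write the result back to the same file, tab-separated.
    df.to_csv(file, header=False, index=False, sep='\t')

if __name__ == '__main__':
    main(sys.argv[1])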

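Output data frame (the last column is parsed as floats and written back, which is why a few values show extra digits, for example 9.990871366539999):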

scaf12446   207     loc.04759   .   +   9.990871366539999
scaf9003    309     loc.36424   .   +   9.98867551051e-07
scaf6164    3262    loc.29229   .   -   9.97790659076e-07
scaf20      35839   loc.14273   .   -   9.97726500173
scaf19280   233     loc.13668   .   +   9.95702976886
scaf8877    1480    loc.36113   .   -   9.944237029550001e-08
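
As the output above shows, round-tripping through pandas can alter the float column slightly (9.99087136654 became 9.990871366539999). If that matters, one possible alternative is a standard-library sketch that treats every field except the two integer columns as plain text and prints to standard output instead of overwriting the file. It is not part of the original answer: the function name `subtract_line` and the script name used below are hypothetical, and the code assumes every non-blank line has at least three whitespace-separated fields with integers in columns 2 and 3.

```python
import sys


def subtract_line(line):
    """Replace columns 2 and 3 of one line with (column 3 - column 2).

    All other fields are passed through as text, so the float
    column is never parsed or re-formatted.
    """
    # Assumed layout: name, start, end, then any number of extra fields.
    name, start, end, *rest = line.split()
    return "\t".join([name, str(int(end) - int(start)), *rest])


def main(path):
    with open(path) as handle:
        for line in handle:
            if line.strip():  # skip blank lines
                print(subtract_line(line))


# Only run when a file name is actually given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Saved under a hypothetical name such as sub_col_plain.py, it could be called as `python sub_col_plain.py file > result.tsv`, which leaves the input file untouched.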