I have a file that looks like this:
scaf12446 275 482 loc.04759 . + 9.99087136654
scaf9003 58436 58745 loc.36424 . + 9.98867551051e-07
scaf6164 41519 44781 loc.29229 . - 9.97790659076e-07
scaf20 64796 100635 loc.14273 . - 9.97726500173
scaf19280 12335 12568 loc.13668 . + 9.95702976886
scaf8877 30882 32362 loc.36113 . - 9.94423702955e-08
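I want to subtract column 2 from column 3 and print that difference in place of the two columns. So the resulting file should look like this: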
scaf12446 207 loc.04759 . + 9.99087136654
scaf9003 309 loc.36424 . + 9.98867551051e-07
scaf6164 3262 loc.29229 . - 9.97790659076e-07
scaf20 35839 loc.14273 . - 9.97726500173
scaf19280 233 loc.13668 . + 9.95702976886
scaf8877 1480 loc.36113 . - 9.94423702955e-08
The table is very long. Is there a simple way to subtract column 2 from column 3? A Linux one-liner would be ideal.
Answer 0 (score: 0)
Using Python and pandas:
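#! /usr/bin/env python3
"""
sub_col.py

Call as
    $ python sub_col.py file
"""
import sys

import pandas as pd


def main(file):
    # Split on any run of whitespace; the file has no header row.
    # (sep=r'\s+' replaces delim_whitespace=True, deprecated in newer pandas.)
    df = pd.read_csv(file, sep=r'\s+', header=None)
    # Turn column 3 into (column 3 - column 2), then drop column 2.
    df[2] -= df[1]
    del df[1]
    # Note: this overwrites the input file in place, tab-separated.
    df.to_csv(file, header=False, index=False, sep='\t')


if __name__ == '__main__':
    main(sys.argv[1])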
Output dataframe:
scaf12446 207 loc.04759 . + 9.990871366539999
scaf9003 309 loc.36424 . + 9.98867551051e-07
scaf6164 3262 loc.29229 . - 9.97790659076e-07
scaf20 35839 loc.14273 . - 9.97726500173
scaf19280 233 loc.13668 . + 9.95702976886
scaf8877 1480 loc.36113 . - 9.944237029550001e-08
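Note that in the output above the last column is not byte-for-byte identical to the input (9.99087136654 came back as 9.990871366539999), because the values were parsed as floats and written out again. If that matters, one option is to treat every field as text and convert only columns 2 and 3 to integers. The following is a minimal sketch using only the standard library; the function name subtract_columns is hypothetical, and it assumes whitespace-separated input with integer values in columns 2 and 3.

```python
import io


def subtract_columns(lines):
    """Yield rows where columns 2 and 3 are replaced by (column 3 - column 2)."""
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        start, end = int(fields[1]), int(fields[2])
        # All other fields stay as the original strings, so the floats
        # in the last column are never reformatted.
        yield "\t".join([fields[0], str(end - start)] + fields[3:])


# Two rows taken from the question, used here as sample input.
sample = io.StringIO(
    "scaf12446 275 482 loc.04759 . + 9.99087136654\n"
    "scaf8877 30882 32362 loc.36113 . - 9.94423702955e-08\n"
)
result = list(subtract_columns(sample))
for row in result:
    print(row)
```

To run it on a real file, pass an open file object instead of the sample, for example subtract_columns(open(path)), and write the yielded rows to a new file rather than overwriting the input.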