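OK, I accept that the title of my question is vague; I couldn't find a clearer way to phrase it. I'm new to programming and my technical vocabulary is still developing.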
I have two files. File A looks like this:
CHROM POS ID AGM12 AGM14 AGM15 AGM18 ..
1 14930 rs150145850 0/0 1/1 0/0 0/0 ..
1 14933 rs138566748 0/0 0/0 0/0 0/0 ..
1 63671 rs116440577 0/1 0/0 0/0 0/0 ..
2 808922 rs6594027 0/0 0/0 0/0 0/1 ..
2 753474 rs2073814 1/0 0/0 0/1 0/0 ..
3 753405 rs61770173 0/0 1/1 0/0 1/0 ..
...
...
...
File B looks like this:
CHROM POS rsID Sample_ID
1 14930 rs150145850 AGM15
2 808922 rs6594027 AGM18
3 753405 rs61770173 AGM12
...
...
...
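I want to use the POS field (column 2) of file B to find the matching row in file A, and then replace the value in the column named by Sample_ID with NA.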
For example, the output should look like this:
CHROM POS ID AGM12 AGM14 AGM15 AGM18
1 14930 rs150145850 0/0 1/1 NA 0/0
1 14933 rs138566748 0/0 0/0 0/0 0/0
1 63671 rs116440577 0/1 0/0 0/0 0/0
2 808922 rs6594027 0/0 0/0 0/0 NA
2 753474 rs2073814 1/0 0/0 0/1 0/0
3 753405 rs61770173 NA 1/1 0/0 1/0
How can I do this in Python or with Unix tools?
Answer 0: (score: 1)
Here is a version that uses the csv module (I'm assuming your columns are tab-separated).
import csv
import collections

a = 'path/to/a'
b = 'path/to/b'
output = 'output/path'

# Map each POS in file B to the sample columns that should become NA.
pos = collections.defaultdict(list)
with open(b, newline='') as csvin:
    reader = csv.DictReader(csvin, delimiter='\t')
    for line in reader:
        pos[line['POS']].append(line['Sample_ID'])

# On Python 3 the csv module wants text mode with newline='' (not 'wb').
with open(a, newline='') as csvin, open(output, 'w', newline='') as csvout:
    reader = csv.DictReader(csvin, delimiter='\t')
    writer = csv.DictWriter(csvout, fieldnames=reader.fieldnames, delimiter='\t')
    writer.writeheader()
    for line in reader:
        # Samples to blank out at this position (empty list if none).
        fields = pos.get(line['POS'], [])
        for field in fields:
            line[field] = 'NA'
        writer.writerow(line)
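One thing to watch with the code above: it matches on POS alone, so the same position on two different chromosomes would be treated as one site. Below is a minimal sketch of the same csv approach keyed on (CHROM, POS) instead. The function name `mask_genotypes` and the in-memory sample data are my own, purely for illustration, and it still assumes tab-separated input.

```python
import csv
import io
from collections import defaultdict


def mask_genotypes(a_lines, b_lines, delimiter="\t"):
    """Return file A's content with the cells listed in file B set to 'NA'."""
    # (CHROM, POS) -> sample columns to mask at that site
    to_mask = defaultdict(list)
    for row in csv.DictReader(b_lines, delimiter=delimiter):
        to_mask[(row["CHROM"], row["POS"])].append(row["Sample_ID"])

    reader = csv.DictReader(a_lines, delimiter=delimiter)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            delimiter=delimiter, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for sample in to_mask.get((row["CHROM"], row["POS"]), []):
            if sample in row:  # skip samples that file A does not contain
                row[sample] = "NA"
        writer.writerow(row)
    return out.getvalue()


# Small in-memory stand-ins for the two files (rows taken from the question).
file_a = io.StringIO(
    "CHROM\tPOS\tID\tAGM12\tAGM14\tAGM15\tAGM18\n"
    "1\t14930\trs150145850\t0/0\t1/1\t0/0\t0/0\n"
    "1\t14933\trs138566748\t0/0\t0/0\t0/0\t0/0\n"
    "2\t808922\trs6594027\t0/0\t0/0\t0/0\t0/1\n"
    "3\t753405\trs61770173\t0/0\t1/1\t0/0\t1/0\n"
)
file_b = io.StringIO(
    "CHROM\tPOS\trsID\tSample_ID\n"
    "1\t14930\trs150145850\tAGM15\n"
    "2\t808922\trs6594027\tAGM18\n"
    "3\t753405\trs61770173\tAGM12\n"
)
result = mask_genotypes(file_a, file_b)
print(result, end="")
```

For real files you would pass open file handles (opened with `newline=''`) instead of the `io.StringIO` objects and write the returned string to your output path.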
Answer 1: (score: 0)
Give this a try.
def method(file1, file2, fileout):
    d1, d2, headers = {}, {}, {}
    with open(file1) as f1:
        for line in f1:
            fields = line.rstrip('\n').split('\t')  # I am assuming tab-separated input
            # Key on POS; keep CHROM followed by the remaining columns.
            d1[fields[1]] = [fields[0]] + fields[2:]
    with open(file2) as f2:
        for line in f2:
            fields = line.rstrip('\n').split('\t')
            d2[fields[1]] = fields[3]  # POS -> Sample_ID (column 4 of file B)
    # Map each column name to its index within a stored row.
    for i, header in enumerate(d1['POS']):
        headers[header] = i
    with open(fileout, 'w') as fo:
        fo.write("%s\tPOS\t%s\n" % (d1['POS'][0], "\t".join(d1['POS'][1:])))
        del d1['POS']
        for key, values in d1.items():
            if key in d2:
                values[headers[d2[key]]] = "NA"
            fo.write("%s\t%s\t%s\n" % (values[0], key, "\t".join(values[1:])))
Answer 2: (score: 0)
If you don't mind installing a package, you can use pandas for this:
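import pandas
# DataFrame.from_csv was removed in pandas 1.0; read_csv is the replacement.
A = pandas.read_csv("A.txt", sep="\t", index_col=[0, 1])
B = pandas.read_csv("B.txt", sep="\t", index_col=[0, 1])
joined = A.join(B)  # the resulting dataset: A plus B's rsID and Sample_ID columns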
Of course, you have to import pandas for this to work.
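Note that the join only lines the two tables up by CHROM and POS; it does not overwrite any genotype by itself. A rough sketch of the remaining step is below, assuming tab-separated input and using small in-memory strings in place of A.txt and B.txt (the variable names are mine, for illustration only). Everything is read as text so that values such as 0/0 are left untouched.

```python
import io

import pandas as pd

# In-memory stand-ins for A.txt and B.txt (rows taken from the question).
a_text = (
    "CHROM\tPOS\tID\tAGM12\tAGM14\tAGM15\tAGM18\n"
    "1\t14930\trs150145850\t0/0\t1/1\t0/0\t0/0\n"
    "1\t14933\trs138566748\t0/0\t0/0\t0/0\t0/0\n"
    "2\t808922\trs6594027\t0/0\t0/0\t0/0\t0/1\n"
    "3\t753405\trs61770173\t0/0\t1/1\t0/0\t1/0\n"
)
b_text = (
    "CHROM\tPOS\trsID\tSample_ID\n"
    "1\t14930\trs150145850\tAGM15\n"
    "2\t808922\trs6594027\tAGM18\n"
    "3\t753405\trs61770173\tAGM12\n"
)

# dtype=str keeps the key columns and genotypes as plain text.
A = pd.read_csv(io.StringIO(a_text), sep="\t", dtype=str)
B = pd.read_csv(io.StringIO(b_text), sep="\t", dtype=str)

for chrom, pos, sample in zip(B["CHROM"], B["POS"], B["Sample_ID"]):
    if sample in A.columns:  # skip samples that file A does not contain
        at_site = (A["CHROM"] == chrom) & (A["POS"] == pos)
        A.loc[at_site, sample] = "NA"

result = A.to_csv(sep="\t", index=False)
print(result, end="")
```

Looping over the rows of B is fine when B is small; for a very large B a vectorised approach would be worth considering instead.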