I have two CSV files.
File 1:
id,site,longitude,latitude
**9936**,north,18.2,62.8
5856,north,17.4914,63.0167
**1298**,north,18.177,62.877
File 2:
chr,loc,4678,**1298**,2295,**9936**,7354
chr1,849,0,0,0,0,0,
chr1,3481,1,1,0,1,1
chr1,3491,0,2,0,2,0,
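I want to match the ids in column 1 of file 1 against the header row of file 2 (highlighted with **) and, where an id matches, print that header together with the corresponding values from every row.
Output: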
chr,loc,**1298**,**9936**
chr1,849,0,0
chr1,3481,1,1
chr1,3491,0,2
I tried this in Python:
import csv
f1 = file('inFile.csv', 'rb')
f2 = file('inFile2.csv', 'rb')
f3 = file('outFile.csv', 'wb')
c1 = csv.reader(f1)
c2 = csv.reader(f2)
c3 = csv.writer(f3)
matched_rows = [ row for row in c2 if row[2:6] in c1]
for row in matched_rows:
c3writerow[matched_rows]
But unfortunately it didn't work.
Answer 0 (score: 0)
You need to load the id column from file 1 first and store it in a structure that makes lookups efficient. A set will do here:
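import csv

# Python 2: the csv module expects files opened in binary mode
with open('inFile.csv', 'rb') as ids_file:
    reader = csv.reader(ids_file)
    next(reader, None)  # skip the header row
    ids = {r[0] for r in reader}  # set of ids from the first column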
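For readers on Python 3, the same step might look like the sketch below. It is only an illustration: the helper name `load_ids` is hypothetical, `io.StringIO` stands in for `inFile.csv` so the snippet runs without files, and the whitespace stripping and blank-line guard are assumptions about how tolerant the loader should be. With a real file on Python 3 you would open it in text mode with `newline=''` rather than `'rb'`.

```python
import csv
import io

# Stand-in for inFile.csv so the sketch runs without touching the disk.
ids_text = """id,site,longitude,latitude
9936,north,18.2,62.8
5856,north,17.4914,63.0167
1298,north,18.177,62.877
"""


def load_ids(handle):
    """Return the first-column values of a CSV as a set, skipping the header."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row
    # csv.reader yields [] for blank lines, so `if row` avoids an IndexError.
    return {row[0].strip() for row in reader if row}


ids = load_ids(io.StringIO(ids_text))
print(sorted(ids))  # ['1298', '5856', '9936']
```

A set is used because `header in ids` is then a hash lookup on average, instead of a scan through a list for every header.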
Now you can test for matching columns:
import csv
from operator import itemgetter

# Python 2: open both CSV files in binary mode
with open('inFile2.csv', 'rb') as f2, open('outFile.csv', 'wb') as outf:
    reader = csv.reader(f2)
    writer = csv.writer(outf)
    headers = next(reader, [])
    # produce indices for the headers that are present in the ids set
    matching_indices = [i for i, header in enumerate(headers[2:], 2) if header in ids]
    # always keep columns 0 and 1 (chr, loc), plus every matching id column
    selector = itemgetter(0, 1, *matching_indices)
    # write selected columns to output file
    writer.writerow(selector(headers))
    writer.writerows(selector(row) for row in reader)
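The two steps above can be combined into one function. The following is a minimal Python 3 sketch, not the answer's original code: the function name `filter_columns` and its parameters are hypothetical, files are opened in text mode with `newline=''` (the Python 3 counterpart of `'rb'`/`'wb'` for the csv module), and it assumes the second file always has at least two leading columns.

```python
import csv
from operator import itemgetter


def filter_columns(ids_path, data_path, out_path):
    """Copy columns 0 and 1 of data_path, plus every column whose header
    appears in the first column of ids_path, into out_path."""
    with open(ids_path, newline="") as ids_file:
        reader = csv.reader(ids_file)
        next(reader, None)  # skip the header row
        ids = {row[0] for row in reader if row}

    with open(data_path, newline="") as src, \
            open(out_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        headers = next(reader, None)
        if headers is None:
            return  # empty input file: nothing to write
        # Indices (from column 2 onwards) whose header is a known id.
        keep = [i for i, h in enumerate(headers) if i >= 2 and h in ids]
        selector = itemgetter(0, 1, *keep)
        writer.writerow(selector(headers))
        # Skip blank lines, which csv.reader yields as empty lists.
        writer.writerows(selector(row) for row in reader if row)
```

Calling `filter_columns('inFile.csv', 'inFile2.csv', 'outFile.csv')` would then write the selected columns in the order they appear in the second file, not in the order of the ids in the first.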
Demo with the sample data:
First, produce the set from the first column:
>>> import csv
>>> ids_file = '''\
... id,site,longitude,latitude
... 9936,north,18.2,62.8
... 5856,north,17.4914,63.0167
... 1298,north,18.177,62.877
... '''.splitlines()
>>> reader = csv.reader(ids_file)
>>> next(reader, None)
['id', 'site', 'longitude', 'latitude']
>>> ids = {r[0] for r in reader}
>>> ids
set(['5856', '9936', '1298'])
Then use that data with operator.itemgetter() to produce a selector:
>>> from operator import itemgetter
>>> f2 = '''\
... chr,loc,4678,1298,2295,9936,7354
... chr1,849,0,0,0,0,0,
... chr1,3481,1,1,0,1,1
... chr1,3491,0,2,0,2,0,
... '''.splitlines()
>>> reader = csv.reader(f2)
>>> headers = next(reader, [])
>>> matching_indices = [i for i, header in enumerate(headers[2:], 2) if header in ids]
>>> matching_indices
[3, 5]
>>> selector = itemgetter(0, 1, *matching_indices)
Now you can use that object to select just the columns you want, ready to be written to the output CSV file:
>>> selector(headers)
('chr', 'loc', '1298', '9936')
>>> selector(next(reader))
('chr1', '849', '0', '0')
>>> selector(next(reader))
('chr1', '3481', '1', '1')
>>> selector(next(reader))
('chr1', '3491', '2', '2')
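One detail of `operator.itemgetter()` is worth keeping in mind with this approach. The short sketch below (variable names are illustrative only) shows that it returns a tuple when given several indices but a bare item when given exactly one. Because the selector here always includes indices 0 and 1, it still produces a tuple that `writerow()` can consume even when no id matches any header.

```python
from operator import itemgetter

row = ["chr1", "3481", "1", "1", "0", "1", "1"]

# Several indices: a tuple, in the order the indices were given.
many = itemgetter(0, 1, 3, 5)(row)

# No matching id columns: 0 and 1 are still passed, so a 2-tuple comes back.
no_matches = []
fixed = itemgetter(0, 1, *no_matches)(row)

# A single index returns the item itself, not a 1-tuple.
single = itemgetter(3)(row)

print(many)    # ('chr1', '3481', '1', '1')
print(fixed)   # ('chr1', '3481')
print(single)  # 1
```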