Basically, I want to write out the lines of a document whose ids match a list of ids referenced in my code.
nodeIDs.txt:
(contains 417 entries)
10000
10023
1017
1019
1021
1026
1027
1029
...
Adherens junction.txt:
(contains 73 lines)
4301: AFDN; afadin, adherens junction formation factor
1496: CTNNA2; catenin alpha 2
283106: CSNK2A3; casein kinase 2 alpha 3
2241: FER; FER tyrosine kinase
60: ACTB; actin beta
1956: EGFR; epidermal growth factor receptor
56288: PARD3; par-3 family cell polarity regulator
10458: BAIAP2; BAI1 associated protein 2
51176: LEF1; lymphoid enhancer binding factor 1
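I'm trying to have the program go through the pathway file line by line and check each line against the id list: if the id at the start of a line matches any id in the list, that line should be written to a new document. I've been looking into datasets, but I'm not sure whether they apply here.
My code so far: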
ids = []
with open('nodeIDs.txt', 'r') as n:
    for line in n:
        ids.append(line)
n.close()
# Import data from the pathway file and turn into a list
g = []
with open('Adherens junction.txt', 'r') as a:
    for line in a:
        g.append(line)
a.close()
aj = open('Adherens.txt', 'a')
for line in a:
    if ids[i] in line:
        aj.write(line)
aj.close()
Can you help me solve this?
Answer 0 (score: 2)
Here is some code that does what I think you are after.
Code:
# read ids file into a set
with open('file1', 'r') as f:
    # create a set comprehension
    ids = {line.strip() for line in f}
# read the pathway file and turn into a list
with open('file2', 'r') as f:
    # create a list comprehension
    pathways = [line for line in f]
# output matching lines
with open('file3', 'a') as f:
    # loop through each of the pathways
    for pathway in pathways:
        # get the number in front of the ':'
        start_of_line = pathway.split(':', 1)[0]
        # if this is in 'ids' output the line
        if start_of_line.strip() in ids:
            f.write(pathway)
Results:
2241: FER; FER tyrosine kinase
56288: PARD3; par-3 family cell polarity regulator
file1:
10000
56288
2241
file2:
4301: AFDN; afadin, adherens junction formation factor
1496: CTNNA2; catenin alpha 2
283106: CSNK2A3; casein kinase 2 alpha 3
2241: FER; FER tyrosine kinase
60: ACTB; actin beta
1956: EGFR; epidermal growth factor receptor
56288: PARD3; par-3 family cell polarity regulator
10458: BAIAP2; BAI1 associated protein 2
51176: LEF1; lymphoid enhancer binding factor 1
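To try the same matching without creating any files, the logic can be wrapped in a small function and run on in-memory lines. This is only a sketch: the name `filter_by_id` is hypothetical (it is not part of the code above), and it assumes every pathway line starts with the id followed by a ':'.

```python
def filter_by_id(id_lines, pathway_lines):
    """Return the pathway lines whose leading id appears in id_lines.

    Hypothetical helper: assumes each pathway line looks like
    '<id>: <rest of line>'.
    """
    # strip the trailing newline from every id and skip blank lines
    ids = {line.strip() for line in id_lines if line.strip()}
    matches = []
    for pathway in pathway_lines:
        # get the number in front of the first ':'
        start_of_line = pathway.split(':', 1)[0].strip()
        if start_of_line in ids:
            matches.append(pathway)
    return matches


# sample data taken from file1 and file2 above
id_lines = ["10000\n", "56288\n", "2241\n"]
pathway_lines = [
    "4301: AFDN; afadin, adherens junction formation factor\n",
    "2241: FER; FER tyrosine kinase\n",
    "60: ACTB; actin beta\n",
    "56288: PARD3; par-3 family cell polarity regulator\n",
]

matched = filter_by_id(id_lines, pathway_lines)
print("".join(matched), end="")
```

Comparing the whole id, rather than testing `id in line`, avoids false hits such as id 60 matching the line that starts with 60: as well as any line that merely contains the digits 60 somewhere else.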
What is a set comprehension?
This:
# create a set comprehension
ids = {line.strip() for line in f}
is the same as:
# create a set
ids = set()
for line in f:
    ids.add(line.strip())
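As a quick check, the two forms can be compared on in-memory data. The sketch below uses `io.StringIO` as a stand-in for the opened file (an assumption made only so that the example runs without any files on disk), and it also shows why the `strip()` call matters: lines read from a file keep their trailing newline.

```python
import io

# stand-in for the contents of the ids file
text = "10000\n56288\n2241\n"

# set comprehension
ids_comprehension = {line.strip() for line in io.StringIO(text)}

# equivalent explicit loop
ids_loop = set()
for line in io.StringIO(text):
    ids_loop.add(line.strip())

# both forms build the same set
same = ids_comprehension == ids_loop

# without strip(), each stored id still ends in '\n',
# so a plain id such as '2241' is not found
unstripped = [line for line in io.StringIO(text)]
found_without_strip = "2241" in unstripped
found_with_strip = "2241" in ids_comprehension

print(same, found_without_strip, found_with_strip)
```

A set is also a reasonable fit here because membership tests on a set do not slow down as the id list grows, which helps when every pathway line has to be checked against all the ids.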