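I want to join two lines in a file when they start with the same element. I could collect the first element of each line into a list and then use the elements of that list to search every line, but that doesn't seem like the most efficient approach.
I have the following file: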
1,AF534061.1,T,A
1,K02718.1,T,A
16,AF534061.1,G,-
16,K02718.1,G,-
17,AF534061.1,T,-
17,K02718.1,T,-
18,AF534061.1,A,-
18,K02718.1,A,-
19,AF534061.1,T,-
19,K02718.1,T,-
20,AF534061.1,A,-
20,K02718.1,A,-
21,AF534061.1,A,-
21,K02718.1,A,-
24,AF534061.1,C,T
I want to join lines that share the same first item, so I would like to get the following output:
1,AF534061.1,T,A,1,K02718.1,T,A
16,AF534061.1,G,-,16,K02718.1,G,-
17,AF534061.1,T,-,17,K02718.1,T,-
18,AF534061.1,A,-,18,K02718.1,A,-
19,AF534061.1,T,-,19,K02718.1,T,-
20,AF534061.1,A,-,20,K02718.1,A,-
21,AF534061.1,A,-,21,K02718.1,A,-
24,AF534061.1,C,T
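In this example it looks like I could simply join every other line, but I want (need) the code to be more general!
I don't think this is very hard, but I just can't seem to figure it out! Thanks for your help.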
Answer 0 (score: 5)
The Python standard library is full of tools for all sorts of jobs. For this one, use itertools.groupby.
import itertools
lines = '''1,AF534061.1,T,A
1,K02718.1,T,A
16,AF534061.1,G,-
16,K02718.1,G,-
17,AF534061.1,T,-
17,K02718.1,T,-
18,AF534061.1,A,-
18,K02718.1,A,-
19,AF534061.1,T,-
19,K02718.1,T,-
20,AF534061.1,A,-
20,K02718.1,A,-
21,AF534061.1,A,-
21,K02718.1,A,-
24,AF534061.1,C,T'''.split('\n')
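# groupby only merges consecutive lines with the same key, so lines sharing
# a first field must be adjacent (sort the input first if they are not).
for key, group in itertools.groupby(lines, lambda line: line.partition(',')[0]):
    print(','.join(group))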
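To apply the same groupby idea to lines read from a file, the trailing newlines need to be stripped before joining. The following is a minimal sketch under that assumption; the function name join_by_first_field is hypothetical, and io.StringIO merely stands in for an open file.

```python
import io
import itertools


def join_by_first_field(lines):
    """Join consecutive lines that share the same first comma-separated field."""
    # Strip line endings and skip blank lines so file objects work as input.
    stripped = (line.rstrip("\r\n") for line in lines)
    non_empty = (line for line in stripped if line)
    first_field = lambda line: line.partition(",")[0]
    return [",".join(group) for _, group in itertools.groupby(non_empty, first_field)]


# io.StringIO stands in for an open file here (hypothetical sample data).
sample = io.StringIO("1,AF534061.1,T,A\n1,K02718.1,T,A\n24,AF534061.1,C,T\n")
for joined in join_by_first_field(sample):
    print(joined)
```

Because groupby compares only neighbouring items, this keeps memory use low for large files but relies on matching lines being adjacent.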
Answer 1 (score: 0)
You can use a regular expression with a back-reference.
import re
print(re.sub(r'^(([^,\n]+),.*)\n(\2,.*\n)', r'\1,\3', data, flags=re.M))
Here is the expression, explained piece by piece (data must end with a newline for the last pair to match):
^           # Anchor at the start of a line (needs re.M)
(           # Start of first line, referred to as \1
  (         # Start of first field of the line, referred to as \2
    [^,\n]+ # Everything before the first comma
  )
  ,.*       # The comma and the remainder of the first line
)
\n          # This newline isn't in any capture group, so it is
            # dropped from the result; the replacement puts a comma in its place
(           # Start of second line, referred to as \3
  \2,       # The first field of the first line must appear again,
            # followed by a comma so that "1" cannot match "16"
  .*        # Remainder of second line
  \n        # We include this newline to make the next search start at
            # the start of the following line. It's reinserted because
            # it's in the second line's capture group.
)
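As a quick check of the back-reference idea, here is a small runnable sketch. It makes some assumptions of its own: the pattern is anchored to the start of a line, requires a comma after the repeated key, and puts a comma between the joined lines. The function name join_pairs and the sample data are hypothetical.

```python
import re

# Hypothetical sample data; the trailing newline matters to the pattern.
data = (
    "1,AF534061.1,T,A\n"
    "1,K02718.1,T,A\n"
    "16,AF534061.1,G,-\n"
    "16,K02718.1,G,-\n"
    "24,AF534061.1,C,T\n"
)

# \2 is the first field of the first line; it must reappear, followed by a
# comma, at the start of the next line for the pair to be joined.
PAIR = re.compile(r'^(([^,\n]+),.*)\n(\2,.*\n)', re.MULTILINE)


def join_pairs(text):
    """Join each line with the following line when both share a first field."""
    return PAIR.sub(r'\1,\3', text)


print(join_pairs(data), end="")
```

A single pass joins lines in pairs only: with three consecutive lines sharing a key, the first two are joined and the third is left on its own line.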
Answer 2 (score: -2)
I haven't tested this code, but something like this should work:
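# "input.txt" is a placeholder; substitute the path to your file.
common = {}
with open("input.txt") as infile:
    for line in infile:
        line = line.rstrip("\n")  # drop the newline before joining
        prefix = line.split(",")[0]
        if prefix in common:
            common[prefix].append(line)
        else:
            common[prefix] = [line]
# Dicts keep insertion order in Python 3.7+, so output follows first appearance.
for key, values in common.items():
    print(",".join(values))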
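If lines with the same first field are not guaranteed to be adjacent, the dictionary approach can be sketched with collections.defaultdict, which removes the need for the if/else branch. The function name join_by_key is hypothetical, and the output order assumes Python 3.7+, where dicts preserve insertion order.

```python
from collections import defaultdict


def join_by_key(lines):
    """Join all lines sharing a first comma-separated field, adjacent or not."""
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\r\n")
        if line:  # skip blank lines
            groups[line.split(",", 1)[0]].append(line)
    # Insertion order is preserved, so keys appear in order of first occurrence.
    return [",".join(group) for group in groups.values()]


print(join_by_key(["1,a", "16,b", "1,c"]))
```

Unlike groupby, this holds every line in memory, which is the price for handling unsorted input.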