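How can we parse data from a TSV file based on column index? Once we have read the data from the file, we have to compare the column 0 value of row 1 with the column 0 value of row 2. If they match, we take the column 1 value and append all matching entries to column 1 of the first row.
For example, the SystemType.tsv file: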
Actrius 1990s drama films
Actrius Catalan language films
Actrius Spanish films
Actrius Barcelona in fiction
Actrius Films directed by Ventura Pons
Actrius 1996 films
An_American_in_Paris Compositions by George Gershwin
An_American_in_Paris Symphonic poems
An_American_in_Paris Grammy Hall of Fame Award recipients
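"Actrius" is in column 0 of row 1, so we need to compare it against every row in column 0 and join the column 1 values of the matching entries with commas, as shown below.
Output:
Actrius 1990s drama films,Catalan language films,Spanish films,Barcelona in fiction,Films directed by Ventura Pons,1996 films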
An_American_in_Paris Compositions by George Gershwin,Symphonic poems,Grammy Hall of Fame Award recipients
I tried this, but it doesn't work for me.
def finalextract():
    lines_seen = set()
    outfile = open("Output.txt","w+")
    infile = open("SystemType.tsv","r+")
    for line in infile:
        if line[0] == lines_seen[0]:
            string = line[1]+','+lines_seen[1]
            outfile.write(string)
            lines_seen.add(string)
    infile.close()
    outfile.close()
Answer 0 (score: 0)
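Here's what I came up with (Python 3, but I think the only difference should be my use of the print function. If you want to use it on Python 2, for example to write the output file, you can add from __future__ import print_function):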
import collections
# I used variable "input" to hold the string from your example .tsv contents;
# you'd really want to read it in from a file.
D = collections.OrderedDict()
for line in input.splitlines():
    key, value = line.split('\t')
    if key not in D:
        D[key] = []
    D[key].append(value.strip())
for key, values in D.items():
    print(key, ','.join(values), sep='\t')
My output is:
Actrius 1990s drama films,Catalan language films,Spanish films,Barcelona in fiction,Films directed by Ventura Pons,1996 films
An_American_in_Paris Compositions by George Gershwin,Symphonic poems,Grammy Hall of Fame Award recipients
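To read from the file instead of a string, the same grouping can be sketched roughly as follows. This is only a minimal sketch: the function name merge_tsv is hypothetical, and it assumes the file is UTF-8 text with exactly one tab between the key and the value. Lines without a tab are skipped.

```python
from collections import OrderedDict


def merge_tsv(in_path, out_path):
    """Group column 1 values by their column 0 key, keeping first-seen key order.

    merge_tsv is a hypothetical helper name, not part of any library.
    """
    groups = OrderedDict()
    with open(in_path, encoding="utf-8") as infile:
        for line in infile:
            # Split on the first tab only; sep is empty when the line has no tab.
            key, sep, value = line.rstrip("\r\n").partition("\t")
            if not sep:
                continue  # skip blank lines and lines with no tab
            groups.setdefault(key, []).append(value.strip())

    with open(out_path, "w", encoding="utf-8") as outfile:
        for key, values in groups.items():
            # One output line per key: key, a tab, then the comma-joined values.
            outfile.write(key + "\t" + ",".join(values) + "\n")
    return groups
```

With the file names from the question, the call would be merge_tsv("SystemType.tsv", "Output.txt"). Using with blocks closes both files even if a line fails to parse, which the explicit close() calls in the original attempt would not do.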