Read multiple rows from a TSV file and join values by column with commas

Asked: 2013-06-11 05:20:56

Tags: python parsing python-2.7 text-parsing tsv

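How can we parse data in a TSV file based on column index? After reading the file, we need to compare the column 0 value of row 1 with the column 0 value of row 2. If they match, we take the column 1 value of row 1 and append to it the column 1 values of all matching entries.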

For example, the file SystemType.tsv:

Actrius  1990s drama films 
Actrius  Catalan language films 
Actrius  Spanish films 
Actrius  Barcelona in fiction 
Actrius  Films directed by Ventura Pons 
Actrius  1996 films 
An_American_in_Paris     Compositions by George Gershwin 
An_American_in_Paris     Symphonic poems 
An_American_in_Paris     Grammy Hall of Fame Award recipients 

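Column 0 of row 1 contains "Actrius". We need to compare it against column 0 of every row and join the column 1 values of the matching entries into a comma-separated list, as shown below.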

Output:

Actrius   1990s drama films,Catalan language films,Spanish films,Barcelona in fiction,Films directed by Ventura Pons,1996 films
An_American_in_Paris    Compositions by George Gershwin,Symphonic poems,Grammy Hall of Fame Award recipients

I tried this, but it does not work for me.

def finalextract():
    lines_seen = set()
    outfile = open("Output.txt","w+")
    infile = open("SystemType.tsv","r+")
    for line in infile:
        if line[0] == lines_seen[0]:
            string = line[1]+','+lines_seen[1]
            outfile.write(string)
            lines_seen.add(string)
    infile.close()
    outfile.close()


1 answer:

Answer 0 (score: 0)

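Here is what I came up with. It is Python 3, but I think the only difference from Python 2.7 should be the print function; if you want to use it to write your output file, you can add from __future__ import print_function: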

import collections

# I used variable "input" to hold the string from your example .tsv contents;
# you'd really want to read it in from a file.

D = collections.OrderedDict()
for line in input.splitlines():
    key, value = line.split('\t')
    if key not in D:
        D[key] = []
    D[key].append(value.strip())

for key, values in D.items():
    print(key, ','.join(values), sep='\t')

My output is:

Actrius 1990s drama films,Catalan language films,Spanish films,Barcelona in fiction,Films directed by Ventura Pons,1996 films
An_American_in_Paris    Compositions by George Gershwin,Symphonic poems,Grammy Hall of Fame Award recipients
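The snippet above works on a string held in memory. As a minimal sketch of the same grouping applied to files, the function below reads SystemType.tsv line by line and writes one joined line per key. The function name group_tsv and the rule of skipping lines that contain no tab are assumptions made for illustration, not part of the original answer. It also avoids the two problems in the attempt from the question: line[0] is the first character of a line rather than its first column, and a set such as lines_seen cannot be indexed with [0].

```python
from collections import OrderedDict


def group_tsv(in_path, out_path):
    """Join column 1 values that share a column 0 key, in first-seen key order.

    group_tsv is a hypothetical helper name chosen for this sketch.
    """
    groups = OrderedDict()
    with open(in_path) as infile:
        for line in infile:
            line = line.rstrip("\n")
            if "\t" not in line:
                continue  # assumption: blank or tab-less lines are ignored
            # Split only on the first tab so the value may itself contain tabs.
            key, value = line.split("\t", 1)
            groups.setdefault(key.strip(), []).append(value.strip())

    with open(out_path, "w") as outfile:
        for key, values in groups.items():
            outfile.write(key + "\t" + ",".join(values) + "\n")
    return groups
```

Calling group_tsv("SystemType.tsv", "Output.txt") should write one tab-separated line per key to Output.txt. The code uses only str methods and collections.OrderedDict, so it should run on Python 2.7 as well as Python 3.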