Question

I am processing a large txt file (1,000,000 elements), for example:

tammy_wynette    band
tammy_wynette    artist
tammy_wynette    musical_artist
tammy_wynette    group
tammy_wynette    person
tammy_wynette    agent
tammy_wynette    organisation
mansion_historic_district    architectural_structure
mansion_historic_district    place
mansion_historic_district    building
joe_sutter    person
joe_sutter    agent

I only want the first entry for each item:

tammy_wynette    band
mansion_historic_district    architectural_structure
joe_sutter    person

I am using a dictionary, but my code is very slow:

dicCSK = {} 
for line in fin:
    line=line.strip('\n')
    try:
        c1, c2 = line.split("\t")
    except ValueError: print line
    if c1 not in dicCSK.keys():
        dicCSK[c1]=c2
        fout.writelines(c1+"\t"+c2+'\n')

Is there a faster way to do this?

Answer 1

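Just use if c1 not in dicCSK: instead of if c1 not in dicCSK.keys():. In Python 2.x, keys() returns the keys as a list, so every membership test has to scan that list sequentially, and a fresh list is built on each iteration. Testing against the dictionary itself is a hash lookup, which takes roughly constant time on average.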
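To illustrate the difference, here is a minimal sketch that times both membership tests. The names d and keys_list and the size of 100,000 are made up for the illustration, and list(d) only emulates what Python 2's keys() would return; actual timings depend on your machine.

```python
import timeit

# Hypothetical data: 100,000 integer keys (not from the original question).
d = dict.fromkeys(range(100000))
keys_list = list(d)  # emulates the list that Python 2's dict.keys() returns

def in_list():
    # Linear scan: worst case, since 99999 is the last element.
    return 99999 in keys_list

def in_dict():
    # Hash lookup: does not depend on where the key sits.
    return 99999 in d

print("list membership:", timeit.timeit(in_list, number=100))
print("dict membership:", timeit.timeit(in_dict, number=100))
```

On a typical machine the list version should be slower by several orders of magnitude, and the gap grows with the number of keys.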

If you do not need the values afterwards, you can also use a set instead:

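dicCSK = set()  # only the keys are needed, so a set is enough
for line in fin:
    line = line.strip('\n')
    try:
        c1, c2 = line.split("\t")
    except ValueError:
        # Malformed line: report it and skip it, so that c1 and c2
        # are never undefined or left over from the previous line.
        print(line)
        continue
    if c1 not in dicCSK:
        dicCSK.add(c1)
        fout.write(c1 + "\t" + c2 + '\n')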
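The same idea can be packaged as a self-contained generator. This is only a sketch: the function name first_per_key, the sep parameter, and the sample data are assumptions made for illustration, and it silently skips malformed lines rather than printing them.

```python
def first_per_key(lines, sep="\t"):
    """Yield (key, value) for the first line seen with each key.

    Hypothetical helper: assumes each valid line has exactly two
    fields separated by `sep`; other lines are skipped.
    """
    seen = set()
    for line in lines:
        parts = line.rstrip("\n").split(sep)
        if len(parts) != 2:
            continue  # malformed line, skip it
        key, value = parts
        if key not in seen:
            seen.add(key)
            yield key, value


# Made-up sample in the same shape as the data in the question.
sample = [
    "tammy_wynette\tband\n",
    "tammy_wynette\tartist\n",
    "joe_sutter\tperson\n",
    "joe_sutter\tagent\n",
]
for key, value in first_per_key(sample):
    print(key + "\t" + value)
```

Because it accepts any iterable of lines, it can be fed an open file object directly, so the file is never loaded into memory as a whole.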
