Print only lines that are unique in the second column

Time: 2014-02-03 00:36:00

Tags: python python-2.7 csv set

I have a file (1.csv) whose lines of text look like this:

"redostoneage",RedoStoneAge,False,7378,I love America. I love our Constitution. I hope we return to our LIBERTARIAN values!
"CelebVolger",redostoneage,False,7378,I love America. I love our Constitution. I hope we return to our LIBERTARIAN values!
"PatsyRoussel",PatsyRoussel,False,1690,Blue Libbie democrat progressive and proud of it !!

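I want to print (or write to a new .csv) only the lines whose second column (e.g. redostoneage) is not repeated on the line immediately following. This is like a case-insensitive uniq command in Unix. No sorting is needed. So ideally the output would be: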

"redostoneage",RedoStoneAge,False,7378,I love America. I love our Constitution. I hope we return to our LIBERTARIAN values!
"PatsyRoussel",PatsyRoussel,False,1690,Blue Libbie democrat progressive and proud of it !!

I have seen people do this with a set. I think I am close, but I cannot get the set to work properly:

lines_seen = set() # holds lines already seen
for line in open('1.csv', "r"):
    columns = line.split(',')
    if len(columns) >= 2:
        username = columns[1]
        lowercaseusername = username.lower()
        if lowercaseusername not in lines_seen: # not a duplicate
            print line.strip()

1 answer:

Answer 0 (score: 1)

The only thing you are missing is adding to lines_seen:

if lowercaseusername not in lines_seen:
    lines_seen.add(lowercaseusername) # <-- facepalm here
    print line.strip()

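You just forgot that one line. I'm sure you see why it is necessary: without it you are always comparing against an empty set, because the usernames you have already printed are never added to it.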
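For reference, here is a minimal sketch of the whole loop with the fix applied, wrapped in a generator so it can be tried without the file. The function name unique_by_second_column and the shortened sample rows are made up for illustration, and the code assumes Python 3 (print as a function) rather than the Python 2.7 print statement used in the question.

```python
def unique_by_second_column(lines):
    """Yield each line whose second column has not been seen yet (case-insensitive)."""
    seen = set()  # lowercased second-column values already emitted
    for line in lines:
        columns = line.split(',')
        if len(columns) < 2:
            continue  # no second column; skipped, as in the question's loop
        key = columns[1].lower()
        if key not in seen:
            seen.add(key)  # the line that was missing in the question
            yield line.strip()


# Hypothetical sample rows, shortened from the question's data.
sample = [
    '"redostoneage",RedoStoneAge,False,7378,I love America.\n',
    '"CelebVolger",redostoneage,False,7378,I love America.\n',
    '"PatsyRoussel",PatsyRoussel,False,1690,Blue Libbie democrat\n',
]

for row in unique_by_second_column(sample):
    print(row)
```

To run it on the real file, pass an open file object instead of the list, e.g. `with open('1.csv') as f:` followed by the same loop over `unique_by_second_column(f)`. Note that a set drops every later repeat of a username, not only repeats on adjacent lines; if strict uniq-style behaviour is wanted, comparing each key against only the previous line's key would be closer.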