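How do I find duplicate values in the next line?

Time: 2013-05-30 22:10:08

Tags: python

I open a text file and loop over all of its lines, turning each line into a custom dictionary.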

def load(fileName):
    file = open(fileName+'.txt')
    for line in file:
        row = line.split()
        id = int(row[0])
        number = int(row[2])
        values = [int(row[3]),int(row[4]),int(row[5]),int(row[6])]
        dict = {number:[id, values]}
        print(dict)

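I want to check whether the next line has the same number and id values, and then group and sort based on those values.

I am sure a good solution would be to put all the dictionaries into one list and then manipulate it somehow, but I can't get that to work: each dict just ends up in a separate list.

How can I check every line in file for duplicates, using something like .nextLine() or an index=0 that is incremented on each iteration over line?

Example input: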

1772 320 548 340 303 20 37 1
1772 320 551 337 306 22 37 1
1772 320 551 337 306 22 37 1
1772 320 551 337 306 22 37 1
1772 320 552 336 307 22 37 1
1772 320 553 335 308 22 37 1
1772 320 554 335 309 20 37 1
1783 345 438 31 436 40 36 1
1783 345 439 33 434 40 36 1
1783 345 440 35 432 40 36 1
1783 345 441 38 430 40 36 1
1783 345 442 39 431 40 36 1
1783 345 443 41 429 40 36 1
1783 345 444 44 428 40 36 1

Example output:

{548: [1772, [340, 303, 20, 37]]}
{551: [1772, [337, 306, 22, 37]]}
{551: [1772, [337, 306, 22, 37]]}
{551: [1772, [337, 306, 22, 37]]}
{552: [1772, [336, 307, 22, 37]]}
{553: [1772, [335, 308, 22, 37]]}
{554: [1772, [335, 309, 20, 37]]}
{438: [1783, [31, 436, 40, 36]]}
{439: [1783, [33, 434, 40, 36]]}
{440: [1783, [35, 432, 40, 36]]}
{441: [1783, [38, 430, 40, 36]]}
{442: [1783, [39, 431, 40, 36]]}
{443: [1783, [41, 429, 40, 36]]}
{444: [1783, [44, 428, 40, 36]]}

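3 answers:

Answer 0: (score: 2)

Just keep track of the numbers and ids you have already seen in an accompanying dictionary. Since both have to match, you can group them into a tuple: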

def load(fileName):
    dupes_dic = {}  # maps (number, id) -> values of the first row seen with that pair
    with open(fileName+'.txt') as file:
        for line in file:
            row = line.split()
            id = int(row[0])
            number = int(row[2])
            values = [int(row[3]),int(row[4]),int(row[5]),int(row[6])]
            if (number,id) in dupes_dic:
                # duplicate pair: do some grouping or sorting or whatever
                pass
            else:
                dupes_dic[(number,id)] = values
    return dupes_dic
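
One way the "seen" dictionary idea could be completed is sketched below. It is only an illustration under a few assumptions: the rows come from an in-memory list rather than a file, and the names `SAMPLE_LINES`, `group_by_number_and_id` and `duplicates_only` are hypothetical helpers that do not appear in the question.

```python
from collections import defaultdict

# A few rows taken from the question's example input.
SAMPLE_LINES = [
    "1772 320 548 340 303 20 37 1",
    "1772 320 551 337 306 22 37 1",
    "1772 320 551 337 306 22 37 1",
    "1783 345 438 31 436 40 36 1",
]

def group_by_number_and_id(lines):
    """Map each (number, id) tuple to the list of value rows seen for it."""
    groups = defaultdict(list)
    for line in lines:
        row = line.split()
        if len(row) < 7:
            continue  # skip blank or short lines
        key = (int(row[2]), int(row[0]))
        groups[key].append([int(x) for x in row[3:7]])
    return dict(groups)

def duplicates_only(groups):
    """Keep only the keys that occurred on more than one line."""
    return {key: rows for key, rows in groups.items() if len(rows) > 1}

print(duplicates_only(group_by_number_and_id(SAMPLE_LINES)))
# {(551, 1772): [[337, 306, 22, 37], [337, 306, 22, 37]]}
```

Storing a list per key keeps every occurrence, so the grouping or sorting step can run after the whole input has been read.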

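If you explain a bit more about what you want, I can add more to the answer.

Edit: the OP actually needs the items that share the same number, sorted by id. In that case, this should work: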

def load(fileName):
    dupes_dic = {}  # maps number -> {id: values}
    with open(fileName+'.txt') as file:
        for line in file:
            row = line.split()
            id = int(row[0])
            number = int(row[2])
            values = [int(row[3]),int(row[4]),int(row[5]),int(row[6])]
            if number in dupes_dic:
                dupes_dic[number][id] = values
            else:
                dupes_dic[number] = {id: values}
    # store the sorted ids under the extra key 'index' of each inner dict
    for number in dupes_dic:
        dupes_dic[number]['index'] = sorted(dupes_dic[number].keys())
    return dupes_dic

Then you just use each number's index to pull out that number's ids and values in order, like this:

def getOrderedIds(number_dic):
    # number_dic is one inner dict, e.g. dupes_dic[number]
    for id in number_dic['index']:
        print(id)
        print(number_dic[id])
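
As a rough variation on the same idea, the sketch below sorts the ids on demand instead of storing an 'index' entry next to the integer ids. The helper names `group_by_number` and `ordered_ids` are hypothetical, and the two rows are made-up data chosen so that the same number (551) appears under two different ids, which the question's sample input does not contain.

```python
def group_by_number(lines):
    """Return {number: {id: values}} built from whitespace-separated rows."""
    by_number = {}
    for line in lines:
        row = line.split()
        if len(row) < 7:
            continue  # skip blank or short lines
        values = [int(x) for x in row[3:7]]
        by_number.setdefault(int(row[2]), {})[int(row[0])] = values
    return by_number

def ordered_ids(number_dic):
    """Return (id, values) pairs for one number, sorted by id."""
    return sorted(number_dic.items())

# Hypothetical rows: number 551 recorded under ids 1783 and 1772.
lines = [
    "1783 345 551 31 436 40 36 1",
    "1772 320 551 337 306 22 37 1",
]
by_number = group_by_number(lines)
for id_, values in ordered_ids(by_number[551]):
    print(id_, values)
# 1772 [337, 306, 22, 37]
# 1783 [31, 436, 40, 36]
```

Keeping the inner dictionaries free of a string key means every key is an id, so they can be sorted or iterated without special-casing 'index'.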

Answer 1: (score: 1)

from collections import OrderedDict as od

dic = od()  # remembers the order in which each number first appears
with open("abc") as f:
    for line in f:
        row = [int(x) for x in line.split()]
        idx, num = row[2], row[0]
        val = [num, row[3:-1]]  # [id, [the four values]]
        dic.setdefault(idx, []).append(val)

for k, v in dic.items():
    for val in v:
        print("%s %s" % (k, val))

Output:

548 [1772, [340, 303, 20, 37]]
551 [1772, [337, 306, 22, 37]]
551 [1772, [337, 306, 22, 37]]
551 [1772, [337, 306, 22, 37]]
552 [1772, [336, 307, 22, 37]]
553 [1772, [335, 308, 22, 37]]
554 [1772, [335, 309, 20, 37]]
438 [1783, [31, 436, 40, 36]]
439 [1783, [33, 434, 40, 36]]
440 [1783, [35, 432, 40, 36]]
441 [1783, [38, 430, 40, 36]]
442 [1783, [39, 431, 40, 36]]
443 [1783, [41, 429, 40, 36]]
444 [1783, [44, 428, 40, 36]]
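
The grouping above collects rows with the same key wherever they occur in the file. If only adjacent lines should be compared, as the question's "next line" wording suggests, a different technique, the standard library's `itertools.groupby`, may be a closer fit, because it groups consecutive items that share a key. The following is a minimal sketch under that assumption; `parse` and `consecutive_runs` are hypothetical names, and the rows are the first five lines of the example input.

```python
from itertools import groupby

def parse(line):
    """Split one row into its (number, id) key and its four values."""
    row = [int(x) for x in line.split()]
    return (row[2], row[0]), row[3:7]

def consecutive_runs(lines):
    """Yield (number, id, rows) for each run of adjacent lines sharing a key."""
    parsed = (parse(line) for line in lines if line.strip())
    for (number, id_), run in groupby(parsed, key=lambda item: item[0]):
        yield number, id_, [values for _key, values in run]

lines = [
    "1772 320 548 340 303 20 37 1",
    "1772 320 551 337 306 22 37 1",
    "1772 320 551 337 306 22 37 1",
    "1772 320 551 337 306 22 37 1",
    "1772 320 552 336 307 22 37 1",
]
runs = list(consecutive_runs(lines))
for number, id_, rows in runs:
    print(number, id_, len(rows))
# 548 1772 1
# 551 1772 3
# 552 1772 1
```

Note that `groupby` starts a new group whenever the key changes, so equal keys separated by other lines would be reported as separate runs.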

Answer 2: (score: 1)

d = dict()
with open ("input") as f:
    for line in f:
        line = line.rstrip(" \n")
        row = line.split()
        if len(row) < 7: continue
        idx = int(row[0])
        number = int(row[2])
        values = [int(row[3]),int(row[4]),int(row[5]),int(row[6])]
        key = str(number) + ":" + str(idx)

        # add values corresponding to same number, idx pairs to ...
        # a list referenced by d[number:idx]

        if key not in d: d[key] = []
        d[key].append(values)

for key in d:
    n,i = key.split(":")
    # print out rows with number n and idx i
    for row in d[key]:
        print("%s %s %s" % (n, i, ",".join(str(x) for x in row)))

Output:

551 1772 337,306,22,37
551 1772 337,306,22,37
551 1772 337,306,22,37
553 1772 335,308,22,37
552 1772 336,307,22,37
548 1772 340,303,20,37
554 1772 335,309,20,37