Crossing document fields to determine the newest

Time: 2016-08-25 15:48:32

Tags: python list dictionary

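I am working on a piece of software that reads txt files, each containing several documents, and determines which documents are the newest. I need to cross docnumber x date x closing date.

For example:

The file 'name1.txt' contains the following documents: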

docnumber: m282378278292, date: 2009/09, closing date: 2010, 1, 5, 13, 59, 43
docnumber: 3983238923823, date: 2009/09, closing date: 2011, 1, 5, 13, 59, 43 (most new)
docnumber: 3030290909022, date: 2009/09, closing date: 2013, 1, 5, 13, 59, 43 (most new)
docnumber: h287387322825, date: 2009/09, closing date: 2011, 1, 5, 13, 59, 43 (most new)

The file 'name2.txt' contains the following documents:

docnumber: m282378278292, date: 2009/09, closing date: 2011, 1, 5, 13, 59, 43 (most new)
docnumber: 3983238923823, date: 2009/09, closing date: 2011, 1, 5, 13, 59, 43 (most new)
docnumber: 3030290909022, date: 2009/09, closing date: 2012, 1, 5, 13, 59, 43
docnumber: 3202930290239, date: 2009/09, closing date: 2015, 1, 5, 13, 59, 43 (most new)

The file 'name3.txt' contains the following documents:

docnumber: 2298982918992, date: 2009/10, closing date: 2011, 1, 5, 13, 59, 43 (most new)
docnumber: 0434900990932, date: 2009/10, closing date: 2011, 1, 5, 13, 59, 43 (most new)
docnumber: 2290301112933, date: 2009/10, closing date: 2011, 1, 5, 13, 59, 43 (most new)
docnumber: 3944898uN2898, date: 2009/10, closing date: 2011, 1, 5, 13, 59, 43 (most new)

I need to cross these three fields to find out which documents are the newest. In the example above, the newest are:

name1.txt:

docnumber: 3983238923823, date: 2009/09, closing date: 2011, 1, 5, 13, 59, 43
docnumber: 3030290909022, date: 2009/09, closing date: 2013, 1, 5, 13, 59, 43
docnumber: h287387322825, date: 2009/09, closing date: 2011, 1, 5, 13, 59, 43

name2.txt:

docnumber: m282378278292, date: 2009/09, closing date: 2011, 1, 5, 13, 59, 43
docnumber: 3983238923823, date: 2009/09, closing date: 2011, 1, 5, 13, 59, 43
docnumber: 3202930290239, date: 2009/09, closing date: 2015, 1, 5, 13, 59, 43

name3.txt:

docnumber: 2298982918992, date: 2009/10, closing date: 2011, 1, 5, 13, 59, 43
docnumber: 0434900990932, date: 2009/10, closing date: 2011, 1, 5, 13, 59, 43
docnumber: 2290301112933, date: 2009/10, closing date: 2011, 1, 5, 13, 59, 43
docnumber: 3944898uN2898, date: 2009/10, closing date: 2011, 1, 5, 13, 59, 43

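I only have to compare the closing date when the date and docnumber are equal. For illustration I only changed the year in the closing date, but any part of it can vary, not just the year.

Anyway... I was able to create dicts in a list to represent each file.

The interface of each representation: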

{
    [file name]: {
        [date]: {
            [docnumber]: [closingdate]
        }
    }
}

The representation of the files above:

[
    'name1.txt': {
        '2009/09': {
            'm282378278292': '2010, 1, 5, 13, 59, 43',
            '3983238923823': '2011, 1, 5, 13, 59, 43',
            '3030290909022': '2013, 1, 5, 13, 59, 43',
            'h287387322825': '2011, 1, 5, 13, 59, 43'
        }
    },
    'name2.txt': {
        '2009/09': {
            'm282378278292': '2011, 1, 5, 13, 59, 43',
            '3983238923823': '2011, 1, 5, 13, 59, 43',
            '3030290909022': '2012, 1, 5, 13, 59, 43',
            '3202930290239': '2015, 1, 5, 13, 59, 43'
        }
    },
    'name3.txt': {
        '2009/10': {
            '2298982918992': '2011, 1, 5, 13, 59, 43',
            '0434900990932': '2011, 1, 5, 13, 59, 43',
            '2290301112933': '2011, 1, 5, 13, 59, 43',
            '3944898uN2898': '2011, 1, 5, 13, 59, 43'
        }    
    }
]
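A minimal sketch of how one file's lines might be turned into the nested structure above. The regular expression, the function name parse_lines, and the assumption that every line follows the exact "docnumber: ..., date: ..., closing date: ..." layout from the examples are illustrative guesses, not part of the original post.

```python
import re

# Hypothetical line format, inferred from the example lines in the question.
LINE_RE = re.compile(
    r"docnumber:\s*(?P<doc>\S+),\s*date:\s*(?P<date>\S+),\s*closing date:\s*(?P<closing>.+)"
)

def parse_lines(file_name, lines):
    """Build {file_name: {date: {docnumber: closing_date}}} from text lines."""
    by_date = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip blank or unrecognized lines
        by_date.setdefault(match["date"], {})[match["doc"]] = match["closing"].strip()
    return {file_name: by_date}

sample = [
    "docnumber: m282378278292, date: 2009/09, closing date: 2010, 1, 5, 13, 59, 43",
    "docnumber: 3983238923823, date: 2009/09, closing date: 2011, 1, 5, 13, 59, 43",
]
parsed = parse_lines("name1.txt", sample)
print(parsed)
```

Real files may carry trailing annotations or a different separator, so the pattern would need adjusting to the actual input.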

Desired output:

[
    'name1.txt': {
        '2009/09': {
            'm282378278292': false,
            '3983238923823': true,
            '3030290909022': true,
            'h287387322825': true
        }
    },
    'name2.txt': {
        '2009/09': {
            'm282378278292': true,
            '3983238923823': true,
            '3030290909022': false,
            '3202930290239': true
        }
    },
    'name3.txt': {
        '2009/10': {
            '2298982918992': true,
            '0434900990932': true,
            '2290301112933': true,
            '3944898uN2898': true,
        }    
    }
]

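My question is... how can I do this? This is my first project in Python, so please forgive me for asking without at least showing what I have tried. To be honest, I do not know where to start.

I appreciate any help!

Thanks in advance.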

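1 answer:

Answer 0 (score: 0)

Consider first building a maxValues dictionary and then comparing it against the original main dictionary. The many for loops are a consequence of your nested dictionary structure.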

# Pass 1: find the largest closing date for each (date, docnumber) pair.
# Note: closing dates are compared as strings, which is only reliable
# when every field has a consistent width across all documents.
maxValues = {}
for d in dictList:
    for fileName, dates in d.items():
        for date, docs in dates.items():
            for docNumber, closingDate in docs.items():
                key = (date, str(docNumber))
                maxValues[key] = max(maxValues.get(key, ''), closingDate)

# Pass 2: replace each closing date with True if it equals the maximum.
for d in dictList:
    for fileName, dates in d.items():
        for date, docs in dates.items():
            dates[date] = {k: v == maxValues[(date, str(k))] for k, v in docs.items()}

print(dictList)
# [{'name1.txt': {'2009/09': {'3030290909022': True, 'm282378278292': False, 
#                             '3983238923823': True, 'h287387322825': True}}, 
#   'name2.txt': {'2009/09': {'3030290909022': False, 'm282378278292': True, 
#                             '3983238923823': True, '3202930290239': True}}, 
#   'name3.txt': {'2009/10': {'0434900990932': True, '3944898uN2898': True, 
#                             '2290301112933': True, '2298982918992': True}}}]

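Explanation

The first set of loops walks every nested dictionary element and builds a one-level maxValues dictionary with tuple keys and closing-date values: key = (date, docNumber); value = closing date. While building the dictionary, the loops also apply max(), so that when a date/docNumber pair repeats, the larger closing date replaces or stays as the value for that key. In the end, maxValues holds the largest closing date for each distinct date/docNumber pair.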
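One caveat worth illustrating: max() on the raw strings compares them character by character, so '2013, 9, ...' would rank above '2013, 10, ...'. A minimal sketch of the first pass that parses each closing date into a tuple of integers first; the helper names parse_closing and build_max_values and the two example closing dates are hypothetical, chosen only to show the difference.

```python
def parse_closing(text):
    """Turn '2011, 1, 5, 13, 59, 43' into (2011, 1, 5, 13, 59, 43)."""
    return tuple(int(part) for part in text.split(","))

def build_max_values(dict_list):
    """Map each (date, docnumber) pair to its latest closing date tuple."""
    max_values = {}
    for d in dict_list:
        for dates in d.values():
            for date, docs in dates.items():
                for doc_number, closing in docs.items():
                    key = (date, doc_number)
                    value = parse_closing(closing)
                    # Keep the larger tuple; tuples compare field by field.
                    if key not in max_values or value > max_values[key]:
                        max_values[key] = value
    return max_values

# Hypothetical data: same document, closing in September vs. October.
example = [{
    "name1.txt": {"2009/09": {"3030290909022": "2013, 9, 5, 13, 59, 43"}},
    "name2.txt": {"2009/09": {"3030290909022": "2013, 10, 5, 13, 59, 43"}},
}]
latest = build_max_values(example)
print(latest)  # {('2009/09', '3030290909022'): (2013, 10, 5, 13, 59, 43)}
```

With the sample data from the question, where only the year differs, the string comparison and the tuple comparison give the same result.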

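The second set of loops walks the same nested structure and recreates the innermost dictionary with boolean values in place of the earlier closing dates. To recreate this nested dictionary, a dictionary comprehension compares each current value with the maxValues entry for the current tuple key. The comparison is an equality expression that yields True or False.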
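The loops in the answer overwrite dictList in place. If the original closing dates are still needed afterwards, the second pass could instead build a new structure. A minimal sketch, assuming the maximum closing dates are already available; the function name flag_newest is hypothetical, and the data is a subset of the question's example.

```python
def flag_newest(dict_list, max_values):
    """Return a new list where each closing date becomes True or False."""
    return [
        {
            name: {
                date: {doc: closing == max_values[(date, doc)]
                       for doc, closing in docs.items()}
                for date, docs in dates.items()
            }
            for name, dates in d.items()
        }
        for d in dict_list
    ]

dict_list = [{
    "name1.txt": {"2009/09": {"m282378278292": "2010, 1, 5, 13, 59, 43",
                              "3030290909022": "2013, 1, 5, 13, 59, 43"}},
    "name2.txt": {"2009/09": {"m282378278292": "2011, 1, 5, 13, 59, 43",
                              "3030290909022": "2012, 1, 5, 13, 59, 43"}},
}]
# Maximum closing dates as the first pass would produce them for this subset.
max_values = {
    ("2009/09", "m282378278292"): "2011, 1, 5, 13, 59, 43",
    ("2009/09", "3030290909022"): "2013, 1, 5, 13, 59, 43",
}
flags = flag_newest(dict_list, max_values)
print(flags)
```

Because the comprehension creates fresh dictionaries at every level, dict_list keeps its closing-date strings after the call.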