Python: Build a dictionary by comparing files

Asked: 2017-05-15 07:33:06

Tags: python list file dictionary set

I'm having some trouble with the following task.

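There are two files. The first file (the children file) contains the links between children and their parents' identification numbers, and the second file (the names file) contains the links between people's identification numbers and their names.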

Each line of the children file holds a parent's identification code followed by his/her child's identification code:

47853062345 60907062342
46906183451 38504014543
34105139833 36512129874

The names file contains identification codes and names:

47853062345 Kadri Kalkun
36512129874 Peeter Peedumets
38504014543 Maria Peedumets
46906183451 Madli Peedumets
34105139833 Karl Peedumets
60907062342 Liisa Maria Jaaniste

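It is safe to assume that the names file contains no duplicate names or identification codes. In addition, every identification code in the children file has a corresponding name in the names file.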

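The function connect takes 2 arguments: the children file name and the names file name. It returns a dictionary where each key is a parent's name and the value is the set of his/her children's names.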

children.txt:

47853062345 60907062342
46906183451 38504014543
34105139833 36512129874
36512129874 38504014543
46906183451 48708252344
36512129874 48708252344

names.txt:

47853062345 Kadri Kalkun
36512129874 Peeter Peedumets
38504014543 Maria Peedumets
46906183451 Madli Peedumets
34105139833 Karl Peedumets
48708252344 Robert Peedumets
60907062342 Liisa Maria Jaaniste

Output:

connect('children.txt', 'names.txt')

{'Peeter Peedumets': {'Maria Peedumets', 'Robert Peedumets'},
'Madli Peedumets': {'Maria Peedumets', 'Robert Peedumets'}, 
'Karl Peedumets': {'Peeter Peedumets'}, 
'Kadri Kalkun': {'Liisa Maria Jaaniste'}}

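I have read the two files into a list and a dictionary and replaced the ID codes with names, but I can't wrap my head around how to get the final result. My code so far: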

def connect(children_file,names_file):
    #children = {}
   # with open(children_file, encoding="UTF-8") as f:
        #for line in f:
           #(key, val) = line.split()
           #children[key.strip("\ufeffn' ").strip("\n ")] = val
    with open(children_file, encoding="UTF-8") as ins:
        children = [[n.strip("\ufeffn' ").strip("\n ") for n in line.split()] for line in ins]

    names = {}
    with open(names_file, encoding="UTF-8") as f:
        for line in f:
            splitLine = line.split()
            names[splitLine[0].strip("\ufeffn' ").strip("\n ")] = " ".join(splitLine[1:])
    names.items()
    for lst in children:
      for ind, item in enumerate(lst):
          if item in names:
              lst[ind] = names[item]

    d = {}
    for i in range(len(children[0][:])):
        if children[0][i] not in d:
            d[children[0][i]] = set()
        d[children[0][i]].add(children[1][i])


    return d

print(connect("children.txt","names.txt"))      

1 answer:

Answer 0 (score: 1)

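Your code is inefficient overall. Don't build a list of children; build the mapping directly. You can use the dictionary's setdefault method, or you could use a collections.defaultdict, but for simplicity I'll use the former. So, simply: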

>>> import io
>>> # children_str and names_str hold the text of children.txt and names.txt
>>> with io.StringIO(children_str) as cf, io.StringIO(names_str) as nf:
...     parentmap = {}
...     namemap = {}
...     for line in cf:
...         pid, cid = line.strip().split()
...         parentmap.setdefault(pid, set()).add(cid)
...     for line in nf:
...         nid, name = line.strip().split(maxsplit=1) 
...         namemap[nid] = name
...
>>> from pprint import pprint
>>> pprint(parentmap)
{'34105139833': {'36512129874'},
 '36512129874': {'38504014543', '48708252344'},
 '46906183451': {'38504014543', '48708252344'},
 '47853062345': {'60907062342'}}
>>> pprint(namemap)
{'34105139833': 'Karl Peedumets',
 '36512129874': 'Peeter Peedumets',
 '38504014543': 'Maria Peedumets',
 '46906183451': 'Madli Peedumets',
 '47853062345': 'Kadri Kalkun',
 '48708252344': 'Robert Peedumets',
 '60907062342': 'Liisa Maria Jaaniste'}

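Note that I'm using io.StringIO to pretend I'm working with files, while actually using strings I copied directly from the question. io.StringIO lets you treat a string as a file; in your case, just open your files as usual. Also note that when splitting the lines from names.txt, I used the maxsplit parameter so that the name itself doesn't get split.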
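To make the two points above concrete, here is a minimal sketch. The variable names_str is only an illustrative stand-in holding two lines from the sample names.txt; it shows that iterating over an io.StringIO yields lines just as a real file would, and how maxsplit=1 keeps a multi-word name in one piece.

```python
import io

# Illustrative stand-in for names.txt (two lines from the sample data).
names_str = "47853062345 Kadri Kalkun\n60907062342 Liisa Maria Jaaniste\n"

namemap = {}
with io.StringIO(names_str) as nf:
    for line in nf:  # iterates line by line, like an open file
        # maxsplit=1 splits only at the first run of whitespace,
        # so everything after the ID stays together as the name.
        nid, name = line.strip().split(maxsplit=1)
        namemap[nid] = name

print(namemap["60907062342"])  # Liisa Maria Jaaniste

# Without maxsplit, the name is broken into separate words.
print("60907062342 Liisa Maria Jaaniste".split())
# ['60907062342', 'Liisa', 'Maria', 'Jaaniste']
```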

To get the final result, just use:

>>> final = {namemap[k]:{namemap[n] for n in v} for k,v in parentmap.items()}
>>> pprint(final)
{'Kadri Kalkun': {'Liisa Maria Jaaniste'},
 'Karl Peedumets': {'Peeter Peedumets'},
 'Madli Peedumets': {'Robert Peedumets', 'Maria Peedumets'},
 'Peeter Peedumets': {'Robert Peedumets', 'Maria Peedumets'}}
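Putting the pieces together, connect could be sketched roughly as follows when reading real files. This version uses collections.defaultdict, the alternative mentioned above, in place of setdefault. Two things here are assumptions rather than part of the answer: opening the files with the "utf-8-sig" codec (so that a leading byte-order mark, the "\ufeff" stripped by hand in the question, is dropped automatically) and skipping blank lines.

```python
from collections import defaultdict


def connect(children_file, names_file):
    """Map each parent's name to the set of his/her children's names."""
    # Assumption: "utf-8-sig" drops a leading BOM if the file has one
    # and reads plain UTF-8 files unchanged.
    parentmap = defaultdict(set)  # parent ID -> set of child IDs
    with open(children_file, encoding="utf-8-sig") as cf:
        for line in cf:
            if not line.strip():
                continue  # assumption: tolerate blank lines
            pid, cid = line.split()
            parentmap[pid].add(cid)

    namemap = {}  # ID -> full name
    with open(names_file, encoding="utf-8-sig") as nf:
        for line in nf:
            if not line.strip():
                continue
            nid, name = line.strip().split(maxsplit=1)
            namemap[nid] = name

    # Translate IDs to names on both sides; the result is a plain dict.
    return {
        namemap[pid]: {namemap[cid] for cid in cids}
        for pid, cids in parentmap.items()
    }
```

Calling connect('children.txt', 'names.txt') on the sample files from the question should then give the same dictionary as final above. Sets are unordered, so the names inside each set may print in a different order.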