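I've run into some problems with the following task.
There are two files. The first (the children file) contains the links between children and their parents' identification codes; the second (the names file) contains the links between people's identification codes and their names.
Each line of the children file contains a parent's identification code followed by his/her child's identification code: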
47853062345 60907062342
46906183451 38504014543
34105139833 36512129874
The names file contains identification codes and names:
47853062345 Kadri Kalkun
36512129874 Peeter Peedumets
38504014543 Maria Peedumets
46906183451 Madli Peedumets
34105139833 Karl Peedumets
60907062342 Liisa Maria Jaaniste
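It is safe to assume that the names file contains no duplicate names or identification codes. In addition, every identification code in the children file has a corresponding name in the names file.
The function connect takes two arguments: the name of the children file and the name of the names file. It returns a dictionary in which each key is a parent's name and each value is the set of his/her children's names.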
children.txt:
47853062345 60907062342
46906183451 38504014543
34105139833 36512129874
36512129874 38504014543
46906183451 48708252344
36512129874 48708252344
names.txt:
47853062345 Kadri Kalkun
36512129874 Peeter Peedumets
38504014543 Maria Peedumets
46906183451 Madli Peedumets
34105139833 Karl Peedumets
48708252344 Robert Peedumets
60907062342 Liisa Maria Jaaniste
Output:
connect('children.txt', 'names.txt')
{'Peeter Peedumets': {'Maria Peedumets', 'Robert Peedumets'},
'Madli Peedumets': {'Maria Peedumets', 'Robert Peedumets'},
'Karl Peedumets': {'Peeter Peedumets'},
'Kadri Kalkun': {'Liisa Maria Jaaniste'}}
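I have read the two files into a list and a dictionary and replaced the ID codes with names, but I can't wrap my head around how to get the final result. My code so far: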
def connect(children_file,names_file):
    #children = {}
    # with open(children_file, encoding="UTF-8") as f:
        #for line in f:
            #(key, val) = line.split()
            #children[key.strip("\ufeffn' ").strip("\n ")] = val
    with open(children_file, encoding="UTF-8") as ins:
        children = [[n.strip("\ufeffn' ").strip("\n ") for n in line.split()] for line in ins]
    names = {}
    with open(names_file, encoding="UTF-8") as f:
        for line in f:
            splitLine = line.split()
            names[splitLine[0].strip("\ufeffn' ").strip("\n ")] = " ".join(splitLine[1:])
    names.items()
    for lst in children:
        for ind, item in enumerate(lst):
            if item in names:
                lst[ind] = names[item]
    d = {}
    for i in range(len(children[0][:])):
        if children[0][i] not in d:
            d[children[0][i]] = set()
        d[children[0][i]].add(children[1][i])
    return d
print(connect("children.txt","names.txt"))
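Answer 0 (score: 1)
Your code is inefficient overall. Don't build a list of children; build the map directly. You can take advantage of the dictionary setdefault method, or you could use a collections.defaultdict, but for simplicity I'll use the former. So, simply: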
>>> import io
>>> with io.StringIO(children_str) as cf, io.StringIO(names_str) as nf:
...     parentmap = {}
...     namemap = {}
...     for line in cf:
...         pid, cid = line.strip().split()
...         parentmap.setdefault(pid, set()).add(cid)
...     for line in nf:
...         nid, name = line.strip().split(maxsplit=1)
...         namemap[nid] = name
...
>>> from pprint import pprint
>>> pprint(parentmap)
{'34105139833': {'36512129874'},
'36512129874': {'38504014543', '48708252344'},
'46906183451': {'38504014543', '48708252344'},
'47853062345': {'60907062342'}}
>>> pprint(namemap)
{'34105139833': 'Karl Peedumets',
'36512129874': 'Peeter Peedumets',
'38504014543': 'Maria Peedumets',
'46906183451': 'Madli Peedumets',
'47853062345': 'Kadri Kalkun',
'48708252344': 'Robert Peedumets',
'60907062342': 'Liisa Maria Jaaniste'}
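Note that I'm using io.StringIO to pretend I'm working with files, when I'm actually using strings (children_str and names_str) that I copied directly from the question. io.StringIO lets you treat a string as a file, but you would just open your files as usual. Also note that when I split the lines from names.txt, I used the maxsplit argument, so the name itself is not split.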
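As a small illustration of the maxsplit point, here is a minimal sketch. The helper name parse_name_line is hypothetical, and the sample lines are taken from names.txt in the question:

```python
import io


def parse_name_line(line):
    """Split an 'ID full name' line into (id, name), keeping the name whole."""
    # maxsplit=1 splits only at the first run of whitespace, so a name
    # with several words stays in one piece.
    person_id, name = line.strip().split(maxsplit=1)
    return person_id, name


# io.StringIO stands in for an opened file here.
sample = io.StringIO("60907062342 Liisa Maria Jaaniste\n47853062345 Kadri Kalkun\n")
parsed = [parse_name_line(line) for line in sample]
print(parsed)
```

Without maxsplit, the first line would split into four pieces and the two-name unpacking would raise a ValueError.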
To get the final result, simply use:
>>> final = {namemap[k]:{namemap[n] for n in v} for k,v in parentmap.items()}
>>> pprint(final)
{'Kadri Kalkun': {'Liisa Maria Jaaniste'},
'Karl Peedumets': {'Peeter Peedumets'},
'Madli Peedumets': {'Robert Peedumets', 'Maria Peedumets'},
'Peeter Peedumets': {'Robert Peedumets', 'Maria Peedumets'}}
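Putting the pieces together with real files, connect might look roughly like the sketch below. It uses collections.defaultdict, the alternative mentioned earlier, instead of setdefault. Two assumptions that are not in the original answer: the files may start with a byte order mark (the question's code strips "\ufeff"), so they are opened with the "utf-8-sig" codec, and blank lines are skipped.

```python
from collections import defaultdict


def connect(children_file, names_file):
    """Return {parent name: set of child names} built from the two files."""
    # Assumption: files are UTF-8, possibly with a BOM; "utf-8-sig" drops it.
    names = {}
    with open(names_file, encoding="utf-8-sig") as nf:
        for line in nf:
            if not line.strip():
                continue  # assumption: ignore blank lines
            person_id, name = line.strip().split(maxsplit=1)
            names[person_id] = name

    # defaultdict(set) creates an empty set the first time a parent is seen.
    result = defaultdict(set)
    with open(children_file, encoding="utf-8-sig") as cf:
        for line in cf:
            if not line.strip():
                continue
            parent_id, child_id = line.split()
            result[names[parent_id]].add(names[child_id])
    return dict(result)
```

Reading the names first means the ID codes can be translated while the children file is read, so no separate ID-keyed map is needed. Calling connect("children.txt", "names.txt") on the files from the question should then give the dictionary shown above, though the order in which set elements print may differ.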