I have some code that formats the following lines from a file into a dictionary:
CommonChar pins Category General
CommonChar pins Contact Mark
CommonChar pins Description This is a
CommonChar nails Category specific
CommonChar nails Contact John
CommonChar nails Description This is a description
The resulting dict looks like this:
main_dict = {
    'pins': {
        'Category': ['General'],
        'Contact': ['Mark'],
        'Description': ['This', 'is', 'a']
    },
    'nails': {
        'Category': ['specific'],
        'Contact': ['John'],
        'Description': ['This', 'is', 'a', 'description']
    }
}
This is the code I currently use to build the dictionary above:
import os, re, sys

filePath = os.path.join(dirName, eachFile)
fh = open(filePath, "r")
contents = fh.read()
items = re.findall("CommonChar.*$", contents, re.MULTILINE)
for x in items:
    parts = x.split()
    if parts[1] in mainDict:
        if parts[2] in mainDict[parts[1]]:
            sys.exit("exit")
        else:
            mainDict[parts[1]].update({parts[2]: parts[3:]})
    else:
        mainDict[parts[1]] = {}
        mainDict[parts[1]].update({parts[2]: parts[3:]})
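The re.findall call in the code above relies on re.MULTILINE to return one match per line. A minimal sketch of that behavior, assuming a small in-memory string in place of the real file (the unrelated line is made up for illustration):

```python
import re

# Hypothetical file contents; one line does not start with CommonChar.
contents = (
    "CommonChar pins Category General\n"
    "CommonChar pins Contact Mark\n"
    "some unrelated line\n"
    "CommonChar nails Category specific\n"
)

# With re.MULTILINE, "$" matches at the end of every line, and "." never
# matches a newline, so each match is one complete "CommonChar ..." line.
items = re.findall(r"CommonChar.*$", contents, re.MULTILINE)
print(items)

# Without the flag, "$" only matches at the very end of the string (or just
# before a final newline), so only the last CommonChar line is found.
without_multiline = re.findall(r"CommonChar.*$", contents)
print(without_multiline)
```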
If the input to my code changes as follows, I need to append the values of repeated keys:
CommonChar pins Category General
CommonChar pins Contact Mark
CommonChar pins Description This is a
CommonChar pins Description secondLine
CommonChar nails Category specific
CommonChar nails Contact John
CommonChar nails Description This is a description
From the lines above I need the following output, with a </br> added in front of the appended values:
main_dict = {
    'pins': {
        'Category': ['General'],
        'Contact': ['Mark'],
        'Description': ['This', 'is', 'a', '</br>', 'secondLine']
    },
    'nails': {
        'Category': ['specific'],
        'Contact': ['John'],
        'Description': ['This', 'is', 'a', 'description']
    }
}
To do this, I replaced this line:
sys.exit("exit")
with this:
mainDict[parts[1]][parts[2]].append(parts[3:])
But I get output like this:
main_dict = {
    'pins': {
        'Category': ['General'],
        'Contact': ['Mark'],
        'Description': ['This', 'is', 'a', ['secondLine']]
    },
    'nails': {
        'Category': ['specific'],
        'Contact': ['John'],
        'Description': ['This', 'is', 'a', 'description']
    }
}
So how do I avoid the extra [] around secondLine, and add </br> in front of it?
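The extra brackets come from list.append, which adds its argument as a single element. A minimal sketch comparing it with list.extend, using the values from the question (the variable names are illustrative):

```python
# Payload stored for the first "Description" line.
description = ["This", "is", "a"]
extra = ["secondLine"]  # what parts[3:] holds for the second line

# append adds its argument as ONE element, so the whole list gets nested.
appended = description.copy()
appended.append(extra)
print(appended)   # ['This', 'is', 'a', ['secondLine']]

# extend adds each element of its argument individually; prepending the
# separator gives the flat result the question asks for.
extended = description.copy()
extended.extend(["</br>"] + extra)
print(extended)   # ['This', 'is', 'a', '</br>', 'secondLine']
```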
Answer 0 (score: 0)
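How about this? I made a few changes to your code:
- Open and read the file in a with clause, to make sure the file is closed when you are done.
- Use Python-style variable names (main_dict = {} rather than mainDict={}).
- Use setdefault to make sure main_dict has a dict for each group (seeded with its fileLocation) before assigning into it. This is usually less repetitive and easier to maintain than separate branches that create the entry if it doesn't exist and update it if it does.
- Use .extend instead of .append to add more items to the payload list.
/tmp/data1.txt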
CommonChar pins Category General
CommonChar pins Contact Mark
CommonChar pins Description This is a
CommonChar pins Description secondLine
/tmp/data2.txt
CommonChar nails Category specific
CommonChar nails Contact John
CommonChar nails Description This is a description
script.py (updated to scan the multiple files listed in the files variable)
import re, pprint

files = ['/tmp/data1.txt', '/tmp/data2.txt']
main_dict = {}

for filename in files:
    with open(filename, "r") as fh:
        contents = fh.read()
    items = re.findall("CommonChar.*$", contents, re.MULTILINE)
    for x in items:
        cc, group, topic, data = x.split(None, 3)
        data = data.split()
        group_dict = main_dict.setdefault(group, {'fileLocation': [filename]})
        if topic in group_dict:
            group_dict[topic].extend(['</br>'] + data)
        else:
            group_dict[topic] = data

pprint.pprint(main_dict)
Output
{'nails': {'Category': ['specific'],
           'Contact': ['John'],
           'Description': ['This', 'is', 'a', 'description'],
           'fileLocation': ['/tmp/data2.txt']},
 'pins': {'Category': ['General'],
          'Contact': ['Mark'],
          'Description': ['This', 'is', 'a', '</br>', 'secondLine'],
          'fileLocation': ['/tmp/data1.txt']}}
By the way, if your code runs into an error, it should raise an exception instead of calling sys.exit(), for example raise ValueError("Bad value in file").
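A minimal sketch of that advice, assuming you still want the original "repeated key is an error" behavior for some inputs. The function name parse_strict and the error message are hypothetical, not part of the solution above:

```python
import re

def parse_strict(contents):
    """Build {group: {topic: words}}, rejecting a repeated (group, topic) pair."""
    main_dict = {}
    for line in re.findall(r"CommonChar.*$", contents, re.MULTILINE):
        parts = line.split()
        group_dict = main_dict.setdefault(parts[1], {})
        if parts[2] in group_dict:
            # Raising lets the caller decide what to do (log, skip the file,
            # abort), whereas sys.exit() raises SystemExit, which normally
            # terminates the whole program.
            raise ValueError(f"duplicate {parts[2]!r} for group {parts[1]!r}: {line!r}")
        group_dict[parts[2]] = parts[3:]
    return main_dict

try:
    parse_strict("CommonChar pins Description This is a\n"
                 "CommonChar pins Description secondLine\n")
except ValueError as exc:
    print("Bad value in file:", exc)
```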
You asked in a comment whether the loop in script.py is the same as this version of your code:
if parts[1] in mainDict:
    if parts[2] in mainDict[parts[1]]:
        mainDict[parts[1]][parts[2]].extend(['</br>'] + parts[3:])
    else:
        mainDict[parts[1]].update({parts[2]: parts[3:]})
else:
    mainDict[parts[1]] = {}
    mainDict[parts[1]].update({parts[2]: parts[3:]})
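It is, and here is a breakdown of how it works. First, the line cc, group, topic, data = x.split(None, 3) makes group equivalent to parts[1] and topic equivalent to parts[2]. Then the next line, data = data.split(), makes data equivalent to parts[3:]. Making these substitutions gives: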
if group in mainDict:
    if topic in mainDict[group]:
        mainDict[group][topic].extend(['</br>'] + data)
    else:
        mainDict[group].update({topic: data})
else:
    mainDict[group] = {}
    mainDict[group].update({topic: data})
Next, note that the code above is equivalent to:
# next line sets groupDict = mainDict[group], creating it if needed
groupDict = mainDict.setdefault(group, {})
if topic in groupDict:
    groupDict[topic].extend(['</br>'] + data)
else:
    groupDict.update({topic: data})
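To make the setdefault step concrete, here is a minimal sketch (the filenames are illustrative) showing that the default is stored only when the key is missing, and that later calls return the same stored dict:

```python
main_dict = {}

# First call: "pins" is missing, so the default dict is inserted and returned.
g1 = main_dict.setdefault("pins", {"fileLocation": ["/tmp/data1.txt"]})
g1["Category"] = ["General"]

# Second call: "pins" already exists, so the stored dict is returned unchanged
# and the new default (with a different filename) is simply discarded.
g2 = main_dict.setdefault("pins", {"fileLocation": ["/tmp/other.txt"]})

print(g1 is g2)   # True
print(main_dict)  # {'pins': {'fileLocation': ['/tmp/data1.txt'], 'Category': ['General']}}
```

One design note: the default argument is still evaluated on every call, so a throwaway dict is built each time; for small dicts like these the cost is negligible.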
Finally, note that the filename can be stored when groupDict is first created, and that groupDict.update({topic: data}) is equivalent to groupDict[topic] = data. With these changes we get:
groupDict = mainDict.setdefault(group, {'fileLocation': [filename]})
if topic in groupDict:
    groupDict[topic].extend(['</br>'] + data)
else:
    groupDict[topic] = data
Apart from the variable names, this is identical to the last five lines of my solution:
group_dict = main_dict.setdefault(group, {'fileLocation': [filename]})
if topic in group_dict:
    group_dict[topic].extend(['</br>'] + data)
else:
    group_dict[topic] = data
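The substitutions in the breakdown can be checked directly. A minimal sketch, using one sample line from the question, that compares split(None, 3) with the original parts list and update() with item assignment:

```python
line = "CommonChar pins Description This is a"

# The original code's view of the line.
parts = line.split()

# The answer's view: split at most 3 times, then split the remainder.
cc, group, topic, data = line.split(None, 3)
data = data.split()
print(group == parts[1], topic == parts[2], data == parts[3:])  # True True True

# dict.update with a one-item dict and plain item assignment give the same result.
d1, d2 = {}, {}
d1.update({topic: data})
d2[topic] = data
print(d1 == d2)  # True

# Edge case: a hypothetical line with no data words.
short = "CommonChar pins Category"
print(short.split()[3:])  # []
try:
    _, _, _, _ = short.split(None, 3)  # only 3 fields, so unpacking fails
    unpack_failed = False
except ValueError:
    unpack_failed = True
print(unpack_failed)  # True
```

The last part shows the one input where the two versions differ: for a line with no words after the topic, parts[3:] is simply an empty list, while the four-name unpacking raises ValueError.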