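I am trying to make sure that a list of expected substrings appears in a list of strings. I need to know whether one is missing so that I can fill it in. I need to find the index of each substring within the list of strings so that I can pull out the string value next to it. (Using Python 3.)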
# List of strings parsed from a document
strings = [['name', 'Joe Sixpack', 'email', 'beerme@thebrew.com'],
['name', 'Winnie Cooler', 'email', 'Winnie Cooler', 'phone',
'555-555-5550']]
# Expected/desired headings
subs = ['name', 'email', 'phone']
Then check whether all of the "subs" were captured. If not, find which one is missing and fill it in with nan.
Expected results:
{'name': 'Joe Sixpack', 'email': 'beerme@thebrew.com', 'phone': nan}
{'name': 'Winnie Cooler', 'email': 'Winnie Cooler', 'phone': '555-555-5550'}
Answer 0 (score: 1)
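This question seems to be about how to turn the logical steps needed to solve a problem into code. Before even starting on the Python, it helps to think through the pseudocode so that the required logical steps are clear.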
for each row of data:
    * initialize a new output data structure for this row
    for each required key:
        if the key is in the row:
            * find the indices associated with the key/value pair
            * store key/value pair in the output data
        otherwise (i.e. if the key is not in the row):
            * store key/None pair in the output data
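You can translate this pseudocode almost directly into working Python code. This is a very explicit approach, using loops and variable assignments for each step of the logic, which makes it a good learning exercise. Later on, you may want to optimize for performance and/or style.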
# List of strings parsed from a document
strings = [['name', 'Joe Sixpack', 'email', 'beerme@thebrew.com'],
['name', 'Winnie Cooler', 'email', 'Winnie Cooler', 'phone',
'555-555-5550']]
# Expected/desired headings
subs = ['name', 'email', 'phone']
# Create dictionaries for each row
results = []
for row in strings:
    d = {}
    for key in subs:
        if key in row:
            # The value sits immediately after its heading
            key_idx = row.index(key)
            val_idx = key_idx + 1
            val = row[val_idx]
        else:
            val = None
        d[key] = val
    results.append(d)
print(results)
Result:
[{'name': 'Joe Sixpack', 'email': 'beerme@thebrew.com', 'phone': None},
{'name': 'Winnie Cooler', 'email': 'Winnie Cooler', 'phone': '555-555-5550'}]
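If the missing-value marker should be nan, as in the question, rather than None, the same loop can be wrapped in a function that takes the fill value as a parameter. This is only a minimal sketch: the name rows_to_dicts and the fill parameter are illustrative and not part of the answer above. It also guards against a heading that is the last item in a row with no value after it, which is an assumption about how the parsed data might go wrong.

```python
import math

def rows_to_dicts(rows, keys, fill=None):
    """Hypothetical helper: build one dict per row, using `fill` for missing keys."""
    results = []
    for row in rows:
        d = {}
        for key in keys:
            # Use the item after the heading, but only if such an item exists
            if key in row and row.index(key) + 1 < len(row):
                d[key] = row[row.index(key) + 1]
            else:
                d[key] = fill
        results.append(d)
    return results

strings = [['name', 'Joe Sixpack', 'email', 'beerme@thebrew.com'],
           ['name', 'Winnie Cooler', 'email', 'Winnie Cooler', 'phone',
            '555-555-5550']]
subs = ['name', 'email', 'phone']

filled = rows_to_dicts(strings, subs, fill=math.nan)
print(filled)
```

Passing fill=None reproduces the result shown above.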
Answer 1 (score: 0)
# List of strings parsed from a document
strings = [['name', 'Joe Sixpack', 'email', 'beerme@thebrew.com'],
['name', 'Winnie Cooler', 'email', 'Winnie Cooler', 'phone',
'555-555-5550']]
# Expected/desired headings
subs = ['name', 'email', 'phone']
For this, I would use a comprehension and go for dictionary output.
import numpy as np

for row in strings:
    # Get key:value of each sub in row
    foundSubs = dict((s, row[i+1])
                     for (i, s) in enumerate([n.lower() for n in row])
                     for sub in subs if sub in s)
    # check for all subs in result: name, email, phone
    # if one missing, fill in nan
    for eachSub in subs:
        if [i for i in foundSubs if eachSub in i] == []:
            foundSubs[eachSub] = np.nan
    print(foundSubs)
Result:
{'name': 'Joe Sixpack', 'email': 'beerme@thebrew.com', 'phone': nan}
{'name': 'Winnie Cooler', 'email': 'Winnie Cooler', 'phone': '555-555-5550'}
This can be turned into a list-of-tuples format by not wrapping the comprehension in "dict":
[('name', 'Joe Sixpack'), ('email', 'beerme@thebrew.com'), ('phone', nan)]
[('name', 'Winnie Cooler'), ('email', 'Winnie Cooler'), ('phone', '555-555-5550')]
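The tuple variant is not spelled out in the answer, so here is one way it might look. This is a sketch under a few assumptions of mine: headings are matched exactly rather than by substring, math.nan stands in for np.nan to avoid the NumPy dependency, and the names found, present and tuple_rows are illustrative.

```python
import math

strings = [['name', 'Joe Sixpack', 'email', 'beerme@thebrew.com'],
           ['name', 'Winnie Cooler', 'email', 'Winnie Cooler', 'phone',
            '555-555-5550']]
subs = ['name', 'email', 'phone']

tuple_rows = []
for row in strings:
    # Pair each heading with the item that follows it; row[:-1] keeps i + 1 in range
    found = [(s, row[i + 1]) for i, s in enumerate(row[:-1]) if s in subs]
    present = {key for key, _ in found}
    # Append a (heading, nan) pair for every heading that was not found
    found += [(sub, math.nan) for sub in subs if sub not in present]
    tuple_rows.append(found)
    print(found)
```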
Answer 2 (score: 0)
We convert the lists to sets and find the missing values. If any are found, we append each missing heading, followed by None, to the list.
# List of strings parsed from a document
data = [['name', 'Joe Sixpack','email', 'Winnie Cooler'],
['name', 'Winnie Cooler', 'email', 'Winnie Cooler', 'phone',
'555-555-5550']]
# Expected/desired headings
subs = set(['name', 'email', 'phone'])
for node in data:
    # Headings that do not appear anywhere in this row
    missingValue = subs.difference(set(node))
    if missingValue:
        for value in missingValue:
            node.append(value)
            node.append(None)
    print(node)
Output:
['name', 'Joe Sixpack', 'email', 'Winnie Cooler', 'phone', None]
['name', 'Winnie Cooler', 'email', 'Winnie Cooler', 'phone', '555-555-5550']
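One thing to be aware of: set(node) contains the values as well as the headings, so a value that happens to equal a heading would hide a missing heading. Iterating over a set also gives no guaranteed order when more than one heading is missing. A possible variant, assuming headings always sit at the even positions of each row, compares only node[::2] and builds new lists instead of modifying data in place. The names complete, missing and padded are illustrative.

```python
data = [['name', 'Joe Sixpack', 'email', 'Winnie Cooler'],
        ['name', 'Winnie Cooler', 'email', 'Winnie Cooler', 'phone',
         '555-555-5550']]
subs = ['name', 'email', 'phone']  # a list keeps the headings in a stable order

complete = []
for node in data:
    headings = set(node[::2])      # assumption: only even positions hold headings
    missing = [s for s in subs if s not in headings]
    padded = list(node)            # copy so the input row is left untouched
    for heading in missing:
        padded += [heading, None]
    complete.append(padded)
    print(padded)
```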
Answer 3 (score: 0)
A one-liner:
>>> strings = [['name', 'Joe Sixpack', 'email', 'beerme@thebrew.com'],
... ['name', 'Winnie Cooler', 'email', 'Winnie Cooler', 'phone',
... '555-555-5550']]
>>> subs = ['name', 'email', 'phone']
>>> [{**{k: None for k in subs}, **dict(zip(s[::2], s[1::2]))} for s in strings]
[{'name': 'Joe Sixpack', 'email': 'beerme@thebrew.com', 'phone': None}, {'name': 'Winnie Cooler', 'email': 'Winnie Cooler', 'phone': '555-555-5550'}]
Note: for a phone number, None is a better fit than nan.
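The core of the list comprehension is dict(zip(s[::2], s[1::2])): s[::2] creates a list of the even-indexed elements of s, while s[1::2] creates a list of the odd-indexed elements. The two are zipped into an iterable of (even, odd), (even, odd), ... pairs, i.e. ('name', 'Joe Sixpack'), ('email', 'beerme@thebrew.com') for the first row. These are then wrapped in a dictionary with dict.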
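Now for the defaults. {k: None for k in subs} is the dictionary {'name': None, 'email': None, 'phone': None}. The two dictionaries are merged (see How to merge two dictionaries in a single expression?). For duplicate keys the value is taken from the second dictionary, the one built from the row, so the defaults only survive for missing headings, and voilà.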
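To make the pieces easier to follow, the one-liner can be unpacked into separate steps for a single row. This is just an illustrative breakdown; the intermediate names keys, values, parsed, defaults and merged are mine. In a {**a, **b} merge, a key present in both dictionaries takes its value from b, which is why the defaults go first.

```python
s = ['name', 'Joe Sixpack', 'email', 'beerme@thebrew.com']
subs = ['name', 'email', 'phone']

keys = s[::2]      # even positions: ['name', 'email']
values = s[1::2]   # odd positions:  ['Joe Sixpack', 'beerme@thebrew.com']
parsed = dict(zip(keys, values))
defaults = {k: None for k in subs}

# Later unpacking wins on duplicate keys, so parsed values override the defaults
merged = {**defaults, **parsed}
print(merged)
```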