I have a large number (> 100,000) of lowercase strings in a list. A subset of them might look like this:
str_list = ["hello i am from denmark", "that was in the united states", "nothing here"]
I also have a dictionary like this (in reality it has roughly 1,000 entries):
dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}
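For every string in the list that contains any of the dict keys, I want to replace the entire string with the corresponding dict value. The expected result is therefore: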
str_list = ["dk", "us", "nothing here"]
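Given the number of strings I have and the size of the dictionary, what is the most efficient way to do this?
Additional info: a string will never contain more than one dict key.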
Answer 0 (score: 3)
This seems like a good approach:
input_strings = ["hello i am from denmark",
                 "that was in the united states",
                 "nothing here"]
dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}
output_strings = []
for string in input_strings:
    for key, value in dict_x.items():
        if key in string:
            output_strings.append(value)
            break  # at most one key per string, so stop at the first hit
    else:
        # the inner loop finished without a break: no key matched
        output_strings.append(string)
print(output_strings)
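With about 1,000 keys, the inner loop above runs up to 1,000 substring checks per string. One alternative that may be worth measuring is a single compiled regular expression that alternates over all keys, so each string is searched once. This is only a sketch of a swapped-in technique, not part of the answer above; `replace_all` is a hypothetical helper name, and whether it is actually faster depends on the data.

```python
import re

dict_x = {"denmark": "dk", "germany": "ger", "norway": "no", "united states": "us"}

# Build one alternation pattern from all keys. re.escape guards against keys
# that contain regex metacharacters. Longer keys are listed first so that a
# key which is a prefix of another key does not shadow it.
pattern = re.compile(
    "|".join(re.escape(k) for k in sorted(dict_x, key=len, reverse=True))
)

def replace_all(strings):
    """Return a new list where any string containing a key becomes that key's value."""
    result = []
    for s in strings:
        match = pattern.search(s)
        # match.group(0) is the key that was found in the string
        result.append(dict_x[match.group(0)] if match else s)
    return result

input_strings = ["hello i am from denmark",
                 "that was in the united states",
                 "nothing here"]
print(replace_all(input_strings))  # ['dk', 'us', 'nothing here']
```

The pattern is compiled once and reused for every string, which matters when the list has more than 100,000 entries.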
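Answer 1 (score: 1)
Something like this will work. Note that it converts the string to the value of the first matching key it encounters. If a string can contain more than one key, you may want to adjust the logic to suit your use case.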
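dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}
strings = ["hello i am from denmark", "that was in the united states", "nothing here"]
converted = []
for string in strings:
    updated_string = string  # keep the original unless a key matches
    for key, value in dict_x.items():
        if key in string:
            updated_string = value
            break  # stop at the first matching key
    converted.append(updated_string)
print(converted)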
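If a string could contain more than one key, one possible adjustment is to collect every matching key and pick one by an explicit rule. The sketch below assumes a "longest key wins" rule, which is purely an illustration and not something the question asks for; `convert` is a hypothetical helper name.

```python
dict_x = {"denmark": "dk", "germany": "ger", "norway": "no", "united states": "us"}

def convert(string, mapping):
    """Return the value of the longest key found in string, or string itself."""
    matches = [key for key in mapping if key in string]
    if not matches:
        return string
    # Assumed tie-breaking rule: prefer the longest matching key.
    return mapping[max(matches, key=len)]

strings = ["hello i am from denmark",
           "norway and the united states",
           "nothing here"]
converted = [convert(s, dict_x) for s in strings]
print(converted)  # ['dk', 'us', 'nothing here']
```

Unlike the loop with `break`, this version always scans every key, so it trades some speed for a deterministic choice.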
Answer 2 (score: 1)
Try:
str_list = ["hello i am from denmark", "that was in the united states", "nothing here"]
dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}
for k, v in dict_x.items():
    for i in range(len(str_list)):
        if k in str_list[i]:
            str_list[i] = v
print(str_list)
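This iterates over the key-value pairs in the dictionary and checks whether each key occurs in a string. If it does, the string is replaced with the value. Because the list is modified in place, a later key is also tested against values that have already been substituted.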
Answer 3 (score: 1)
Given:
lst = ["hello i am from denmark", "that was in the united states", "nothing here"]
dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}
you can do this:
res = [dict_x.get(next((k for k in dict_x if k in my_str), None), my_str) for my_str in lst]
which returns:
print(res) # -> ['dk', 'us', 'nothing here']
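The cool thing about this (apart from it being a Python ninja's favorite weapon, a.k.a. the list comprehension) is the get with a default of my_str, and the next with a default of None in place of StopIteration, which triggers the get default above.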
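To make the interplay of the two defaults concrete, the one-liner can be unpacked into its two steps. This is just an illustrative breakdown; `lookup` is a hypothetical helper name that does not appear in the answer.

```python
dict_x = {"denmark": "dk", "germany": "ger", "norway": "no", "united states": "us"}

def lookup(my_str):
    """Return (matched_key, result) so both steps can be inspected."""
    # Step 1: next() pulls the first key found in my_str. If the generator
    # is empty, the second argument (None) is returned instead of raising
    # StopIteration.
    key = next((k for k in dict_x if k in my_str), None)
    # Step 2: None is not a key in dict_x, so get() falls back to its own
    # default, which is the original string.
    return key, dict_x.get(key, my_str)

print(lookup("hello i am from denmark"))  # ('denmark', 'dk')
print(lookup("nothing here"))             # (None, 'nothing here')
```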
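Answer 4 (score: 1)
You can subclass dict and use a list comprehension.
As for performance, I suggest trying a few different approaches and measuring which one works best.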
class dict_contains(dict):
    def __getitem__(self, value):
        # find the first key that occurs as a substring of value
        key = next((k for k in self.keys() if k in value), None)
        # returns None when no key matched
        return self.get(key)

str1 = "hello i am from denmark"
str2 = "that was in the united states"
str3 = "nothing here"
lst = [str1, str2, str3]
dict_x = dict_contains({"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"})
# fall back to the original string when the lookup returns None
res = [dict_x[i] or i for i in lst]
# ['dk', 'us', "nothing here"]
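The suggestion to try several approaches and compare them could be carried out with the standard timeit module. The sketch below is only one way to set that up: the function names `loop_version` and `comprehension_version`, the list size, and the repeat count are all arbitrary assumptions, and real timings should be taken on the real data.

```python
import timeit

dict_x = {"denmark": "dk", "germany": "ger", "norway": "no", "united states": "us"}
# Small synthetic workload; the real list has more than 100,000 strings.
str_list = ["hello i am from denmark",
            "that was in the united states",
            "nothing here"] * 1000

def loop_version(strings):
    """Nested loops with for/else, stopping at the first matching key."""
    out = []
    for s in strings:
        for key, value in dict_x.items():
            if key in s:
                out.append(value)
                break
        else:
            out.append(s)
    return out

def comprehension_version(strings):
    """List comprehension using next() and dict.get() defaults."""
    return [dict_x.get(next((k for k in dict_x if k in s), None), s)
            for s in strings]

# Time each candidate on the same input; lower is better.
for func in (loop_version, comprehension_version):
    seconds = timeit.timeit(lambda: func(str_list), number=10)
    print(f"{func.__name__}: {seconds:.4f} s")
```

Both functions should return identical lists, so it is worth checking that before comparing their timings.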