Efficiently identifying whether part of a string is in a list / dict keys?

Time: 2018-03-21 14:26:30

Tags: python string list match

I have a large number (> 100,000) of lowercase strings in a list, a subset of which might look like this:

str_list = ["hello i am from denmark", "that was in the united states", "nothing here"]

I also have a dict like this (in reality it has around 1000 entries):

dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}

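For every string in the list that contains any of the dict's keys, I want to replace the entire string with the corresponding dict value. The expected result is therefore: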

str_list = ["dk", "us", "nothing here"]

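Given the number of strings I have and the size of the dict, what is the most efficient way to do this?

Extra info: a string never contains more than one dict key.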

5 Answers:

Answer 0: (score: 3)

This seems like a good approach:

input_strings = ["hello i am from denmark",
                 "that was in the united states",
                 "nothing here"]
dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}

output_strings = []

for string in input_strings:
    for key, value in dict_x.items():
        if key in string:
            output_strings.append(value)
            break
    else:
        output_strings.append(string)
print(output_strings)

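Answer 1: (score: 1)

Something like this would work. Note that it replaces the string with the value of the first matching key it encounters. If a string can match more than one key, you may want to adjust the logic to suit your use case.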

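# Sample inputs taken from the question, so the snippet runs on its own
strings = ["hello i am from denmark", "that was in the united states", "nothing here"]
dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}

converted = []
for string in strings:
    updated_string = string  # default: keep the original string
    for key, value in dict_x.items():
        if key in string:
            updated_string = value
            break  # stop at the first matching key
    converted.append(updated_string)
print(converted)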
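
If a string could contain more than one key, one possible adjustment is to prefer the longest matching key over the first one encountered. The following is only a minimal sketch: the helper name replace_with_longest_match and the sample data with overlapping keys are hypothetical and do not come from the original answer.

```python
def replace_with_longest_match(strings, mapping):
    """Replace each string with the value of the longest key it contains."""
    result = []
    for string in strings:
        # Collect every key that occurs as a substring of this string
        matches = [key for key in mapping if key in string]
        if matches:
            # Prefer the longest key, e.g. "united states" over "states"
            result.append(mapping[max(matches, key=len)])
        else:
            # No key found: keep the original string
            result.append(string)
    return result


# Hypothetical data with overlapping keys
sample_dict = {"states": "st", "united states": "us", "denmark": "dk"}
sample_strings = ["that was in the united states",
                  "hello i am from denmark",
                  "nothing here"]
converted_longest = replace_with_longest_match(sample_strings, sample_dict)
print(converted_longest)
```

This version checks every key for every string, so it gives up the early break and will likely be slower; it is only worth it when overlapping keys are actually possible.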

Answer 2: (score: 1)

Try:

str_list = ["hello i am from denmark", "that was in the united states", "nothing here"]

dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}

for k, v in dict_x.items():
    for i in range(len(str_list)):
        if k in str_list[i]:
            str_list[i] = v

print(str_list)

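This iterates over the key-value pairs in the dictionary and checks whether each key occurs in a string. If it does, the string is replaced with the value.

Answer 3: (score: 1)

Assuming: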

lst = ["hello i am from denmark", "that was in the united states", "nothing here"]
dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}

you can do:

res = [dict_x.get(next((k for k in dict_x if k in my_str), None), my_str) for my_str in lst]

which returns:

print(res)  # -> ['dk', 'us', 'nothing here']

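The neat thing about this (apart from it being a list comprehension, a Python ninja's favorite weapon) is the get with a default of my_str, combined with next with a default of None: when no key matches, next returns None instead of raising StopIteration, and that None triggers the get default.

Answer 4: (score: 1)

You can subclass dict and use a list comprehension.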
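
To make the two defaults easier to see, here is a small sketch that separates the steps of the one-liner. The helper name first_matching_key is hypothetical and was introduced only for illustration.

```python
dict_x = {"denmark": "dk", "germany": "ger", "norway": "no", "united states": "us"}


def first_matching_key(text, mapping):
    # next() returns its second argument (None here) when the generator
    # yields nothing, instead of raising StopIteration
    return next((k for k in mapping if k in text), None)


hit = first_matching_key("hello i am from denmark", dict_x)
miss = first_matching_key("nothing here", dict_x)
print(hit)
print(miss)

# dict.get() falls back to its second argument when the key is absent.
# None is not a key in dict_x, so the original string comes back unchanged.
print(dict_x.get(hit, "hello i am from denmark"))
print(dict_x.get(miss, "nothing here"))
```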

As for performance, I suggest trying a few different approaches and measuring which one works best.

class dict_contains(dict):
    def __getitem__(self, value):
        key = next((k for k in self.keys() if k in value), None)
        return self.get(key)

str1 = "hello i am from denmark"
str2 = "that was in the united states"
str3 = "nothing here"

lst = [str1, str2, str3]

dict_x = dict_contains({"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"})

res = [dict_x[i] or i for i in lst]

# ['dk', 'us', "nothing here"]
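
As one way to follow the advice about measuring, the sketch below times the plain nested loop against another candidate that none of the answers use: combining all keys into a single compiled regular expression. The function names, the list size, and the repeat count are illustrative assumptions, and the timings will vary with the machine and the data.

```python
import re
import timeit

dict_x = {"denmark": "dk", "germany": "ger", "norway": "no", "united states": "us"}
str_list = ["hello i am from denmark", "that was in the united states", "nothing here"]

# Build one alternation pattern from all keys; longer keys come first so that
# a longer key wins over a shorter key that is its prefix
pattern = re.compile(
    "|".join(re.escape(k) for k in sorted(dict_x, key=len, reverse=True))
)


def replace_with_regex(strings):
    result = []
    for s in strings:
        match = pattern.search(s)
        # match.group(0) is the key that was found in the string
        result.append(dict_x[match.group(0)] if match else s)
    return result


def replace_with_loop(strings):
    result = []
    for s in strings:
        for key, value in dict_x.items():
            if key in s:
                result.append(value)
                break
        else:
            result.append(s)
    return result


regex_result = replace_with_regex(str_list)
loop_result = replace_with_loop(str_list)
print(regex_result)

# Measure on a larger list; use realistic data for meaningful numbers
big_list = str_list * 1000
print("regex:", timeit.timeit(lambda: replace_with_regex(big_list), number=10))
print("loop: ", timeit.timeit(lambda: replace_with_loop(big_list), number=10))
```

The two functions agree here because, as the question states, a string never contains more than one key. If several keys could occur, the regex picks the leftmost match while the loop picks the first key in dict order, so the results could differ.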