Question

I have a large number (> 100,000) of lowercase strings in a list, a subset of which might look like this:

str_list = ["hello i am from denmark", "that was in the united states", "nothing here"]

I also have a dict like this (in reality it has around 1,000 entries):

dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}

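For every string in the list that contains any of the dict's keys, I want to replace the entire string with the corresponding dict value. The expected result is therefore: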

str_list = ["dk", "us", "nothing here"]

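Given the number of strings I have and the length of the dict, what is the most efficient way to do this?

Extra information: a string will never contain more than one dict key.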

Answer 1

This seems like a good approach:

input_strings = ["hello i am from denmark",
                 "that was in the united states",
                 "nothing here"]
dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}

output_strings = []

for string in input_strings:
    for key, value in dict_x.items():
        if key in string:
            output_strings.append(value)
            break
    else:
        output_strings.append(string)
print(output_strings)

Answer 2

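Something like this would work. Note that it converts the string to the value of the first matching key it encounters. If a string can contain more than one key, you may want to modify the logic to suit your use case.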

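strings = ["hello i am from denmark", "that was in the united states", "nothing here"]
dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}

converted = []
for string in strings:
    updated_string = string  # keep the original unless a key matches
    for key, value in dict_x.items():
        if key in string:
            updated_string = value
            break  # stop at the first matching key
    converted.append(updated_string)
print(converted)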
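
To illustrate the "first matching key" caveat, here is a minimal sketch with hypothetical data in which one string contains two keys. The helper name `first_match` is made up for this example. It relies on dicts iterating in insertion order (Python 3.7+), so the result depends on the order in which the keys were added:

```python
# Hypothetical data: the string contains two keys at once.
dict_x = {"denmark": "dk", "germany": "ger"}
text = "from denmark to germany"

def first_match(string, mapping):
    """Return the value of the first key (in dict order) found in string."""
    for key, value in mapping.items():
        if key in string:
            return value
    return string  # no key matched, keep the original

result = first_match(text, dict_x)
# Same keys, inserted in the opposite order.
reordered = first_match(text, {"germany": "ger", "denmark": "dk"})
print(result, reordered)
```

Since the question states that a string never contains more than one key, this should not matter for the original data.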

Answer 3

Try:

str_list = ["hello i am from denmark", "that was in the united states", "nothing here"]

dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}

for k, v in dict_x.items():
    for i in range(len(str_list)):
        if k in str_list[i]:
            str_list[i] = v

print(str_list)

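This iterates over the key-value pairs in the dict and checks whether the key is in the string. If it is, the string is replaced with the value.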
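
One thing to watch with this in-place variant: because the list is overwritten while the outer loop is still walking the keys, a value that has just been written is itself tested against the remaining keys. The sample data in the question does not trigger this, but a sketch with a hypothetical mapping (one value happens to contain another key) shows the effect:

```python
# Hypothetical mapping in which one value happens to contain another key.
dict_x = {"denmark": "visit norway", "norway": "no"}
str_list = ["hello i am from denmark"]

for k, v in dict_x.items():
    for i in range(len(str_list)):
        if k in str_list[i]:
            # The replaced value stays in the list and is checked
            # against the keys that come later in the dict.
            str_list[i] = v

print(str_list)
```

Writing results to a separate list, as the other answers do, avoids this.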

Answer 4

Assuming:

lst = ["hello i am from denmark", "that was in the united states", "nothing here"]
dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}

you can do:

res = [dict_x.get(next((k for k in dict_x if k in my_str), None), my_str) for my_str in lst]

which returns:

print(res)  # -> ['dk', 'us', 'nothing here']

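The cool thing about this (apart from it being a Python ninja's favorite weapon, the list comprehension) is the get with a default of my_str, combined with next given a default of None: when no key matches, next returns None instead of raising StopIteration, and that None triggers the get default.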
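
Unpacked into separate steps, the one-liner works roughly as follows. This is only a sketch; the function name `lookup` is made up for illustration and the dict is shortened:

```python
dict_x = {"denmark": "dk", "united states": "us"}

def lookup(my_str):
    # next() returns the first key contained in my_str; its second argument
    # (None) is returned instead of raising StopIteration when nothing matches.
    key = next((k for k in dict_x if k in my_str), None)
    # dict.get() returns its second argument when the key is missing; None is
    # not a key here, so an unmatched string comes back unchanged.
    return dict_x.get(key, my_str)

hit = lookup("hello i am from denmark")
miss = lookup("nothing here")
print(hit, miss)
```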

Answer 5

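You can subclass dict and use a list comprehension.

In terms of performance, I would recommend trying a few different methods and seeing what works best.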

class dict_contains(dict):
    def __getitem__(self, value):
        key = next((k for k in self.keys() if k in value), None)
        return self.get(key)

str1 = "hello i am from denmark"
str2 = "that was in the united states"
str3 = "nothing here"

lst = [str1, str2, str3]

dict_x = dict_contains({"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"})

res = [dict_x[i] or i for i in lst]

# ['dk', 'us', "nothing here"]
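
To compare methods as suggested, a small timeit harness may help. The sketch below times the nested-loop approach against a single compiled regex alternation built from the keys. The regex variant is not taken from any of the answers; it is just one more candidate to measure. The function names and the repeated sample workload are assumptions for illustration, and the timings will depend on your data.

```python
import re
import timeit

dict_x = {"denmark": "dk", "germany": "ger", "norway": "no", "united states": "us"}
# Hypothetical workload: the three sample strings repeated many times.
str_list = ["hello i am from denmark",
            "that was in the united states",
            "nothing here"] * 1000

def with_loops(strings, mapping):
    out = []
    for s in strings:
        for key, value in mapping.items():
            if key in s:
                out.append(value)
                break
        else:
            out.append(s)  # no key matched
    return out

def with_regex(strings, mapping):
    # Assumes mapping is non-empty. Longest keys first, so that a longer key
    # wins over a shorter key that is its prefix.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    out = []
    for s in strings:
        m = pattern.search(s)
        out.append(mapping[m.group(0)] if m else s)
    return out

loops_result = with_loops(str_list, dict_x)
regex_result = with_regex(str_list, dict_x)

for fn in (with_loops, with_regex):
    seconds = timeit.timeit(lambda: fn(str_list, dict_x), number=5)
    print(f"{fn.__name__}: {seconds:.4f} s")
```

Both functions should return the same list for data in which each string contains at most one key, so the only thing left to compare is the measured time.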

Efficiently identifying whether part of a string is in list/dict keys?
