Question

My e-commerce data contains key/value pairs, for example:

row1: "ideal for":"women", "color":"blue"
row2: "ideal for": "women", "color":"red"
row3: "ideal for": "men", "color":"blue"

I need to create a new dictionary that maps each key to an array of its associated values, for example:

{"ideal for": ["women","men"], "color": ["red", "blue"]}

When I try to append values to the keys in the new dictionary, I can't figure out how to do it so that the values don't repeat.

df.apply(lambda row: prep_text(row['product_specifications']), axis=1)
tag_info = df['product_specifications']
tag_info.replace('', np.nan, inplace=True)
tag_info.dropna(inplace=True)
tags_dict = dict()
for row in tag_info:
     for key, value in row.items():
         if key not in tags_dict:
             tags_dict[key] = [value]
         elif value not in tags_dict.values():
             tags_dict[key].append(value)

Right now, I get a new dictionary like this:

{"ideal for": ["women","women","men"], "color":["blue", "red", "blue"]}

What can I do so that the values don't repeat?

Answer 1

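The elements of tags_dict.values() are lists of strings, not strings, so a string value is never found among them, the condition "value not in tags_dict.values()" is always true, and every value gets appended. You should check the list for that particular key instead: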

 elif value not in tags_dict[key]:
     tags_dict[key].append(value)
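To make the fix concrete, here is a small self-contained sketch. It assumes each row of product_specifications has already been parsed into a Python dict (the question's prep_text helper is not shown), and it uses the three sample rows from the question as a plain list instead of a DataFrame. It also shows why the original check never filtered anything.

```python
# Sketch: assumes each row has already been parsed into a dict
# (prep_text from the question is not shown); sample rows from the question.
rows = [
    {"ideal for": "women", "color": "blue"},
    {"ideal for": "women", "color": "red"},
    {"ideal for": "men", "color": "blue"},
]

tags_dict = {}
for row in rows:
    for key, value in row.items():
        if key not in tags_dict:
            # First time this key appears: start a new list for it.
            tags_dict[key] = [value]
        elif value not in tags_dict[key]:
            # Check membership in this key's own list,
            # not in tags_dict.values().
            tags_dict[key].append(value)

print(tags_dict)
# {'ideal for': ['women', 'men'], 'color': ['blue', 'red']}

# Why the original condition always passed: values() yields lists,
# and a plain string is never equal to a list.
print("women" in tags_dict.values())  # False
```

The lists keep values in the order they first appear in the data. Note that "value not in some_list" scans the whole list, which is fine for a handful of tag values per key.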

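Alternatively, you can use a set instead of a list for the values in tags_dict. A set holds only one copy of each value, so adding a value that is already present has no effect. However, the values in a set are unordered.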

  if key not in tags_dict:
      tags_dict[key] = {value}
  else:
      tags_dict[key].add(value)
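A sketch of the set approach, again assuming the rows are already dicts and reusing the question's sample data. It uses collections.defaultdict(set) so that the "if key not in tags_dict" branch disappears, then sorts each set to get stable lists. The variable names tag_sets and tag_ordered are illustrative only. If you need the original order of appearance rather than sorted order, a dict can serve as an "ordered set", because dict keys are unique and keep insertion order (Python 3.7+). Both variants require the values to be hashable, which strings are.

```python
from collections import defaultdict

# Sample rows from the question, assumed already parsed into dicts.
rows = [
    {"ideal for": "women", "color": "blue"},
    {"ideal for": "women", "color": "red"},
    {"ideal for": "men", "color": "blue"},
]

# defaultdict(set) creates an empty set the first time a key is used,
# and set.add() silently ignores values that are already present.
tag_sets = defaultdict(set)
for row in rows:
    for key, value in row.items():
        tag_sets[key].add(value)

# Sets are unordered, so sort them if you need stable lists (e.g. for JSON).
tags_sorted = {key: sorted(values) for key, values in tag_sets.items()}
print(tags_sorted)
# {'ideal for': ['men', 'women'], 'color': ['blue', 'red']}

# Order-preserving alternative: use dict keys as an "ordered set".
tag_ordered = defaultdict(dict)
for row in rows:
    for key, value in row.items():
        tag_ordered[key][value] = None  # the stored value is irrelevant
tags_in_order = {key: list(values) for key, values in tag_ordered.items()}
print(tags_in_order)
# {'ideal for': ['women', 'men'], 'color': ['blue', 'red']}
```

Both variants avoid the linear "value not in list" scan, since membership checks on sets and dict keys are constant time on average.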

How to append values to keys in a Python dictionary based on DataFrame rows with duplicate key-value pairs
