Question

My DataFrame looks like this:

    v1    v2    v3
    a    b     a,b
    b    a     b,a
    c    a     c,a

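I'm trying to loop over the v3 column to build a dictionary that counts unique string combinations. Inside the loop I need to check both the existing combination and its reverse, so that the two are counted as the same (i.e. a,b is the same as b,a).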

Here is my code:

import pandas as pd
df = pd.read_excel("filename.xlsx")

combine_count = {}
col = df['v3']
for entry in col:
    if entry in combine_count.keys():
        combine_count[entry] += 1
    elif entry not in combine_count.keys():
        reverse = ','.join(entry.split(',')[::-1])
        if reverse in combine_count.keys():
            combine_count[entry] += 1
    else:
        combine_count[entry] = 1

print(combine_count)

The output is an empty dictionary {}. How can I collect the correct keys and values?
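To see why the result is empty, here is the question's loop run unchanged over a plain list that stands in for df['v3'] (an assumption, so that no Excel file is needed):

```python
# Plain list standing in for df['v3'] (assumed: same values as the sample table)
col = ["a,b", "b,a", "c,a"]

combine_count = {}
for entry in col:
    if entry in combine_count.keys():
        combine_count[entry] += 1
    elif entry not in combine_count.keys():
        # "a,b" -> "b,a"
        reverse = ','.join(entry.split(',')[::-1])
        if reverse in combine_count.keys():
            combine_count[entry] += 1
    else:
        # Unreachable: the if/elif above already cover every case
        combine_count[entry] = 1

print(combine_count)  # prints {}
```

Every entry falls into the elif branch, its reverse is never found because nothing has been stored yet, and the else that would store it can never run.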

Answer 1

Why not use collections.Counter and frozenset:

>>> from collections import Counter
>>> cnts = Counter(frozenset(item.split(',')) for item in df['v3'])
>>> cnts
Counter({frozenset({'a', 'b'}): 2, frozenset({'a', 'c'}): 1})

Counter can be used like any dictionary, and frozenset ensures that order doesn't matter, only the contents.
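A small sketch of both points, using a plain list of the sample v3 values in place of the DataFrame column (an assumption for the sake of a self-contained example):

```python
from collections import Counter

# frozenset ignores element order, so both spellings compare equal
assert frozenset("a,b".split(",")) == frozenset("b,a".split(","))

values = ["a,b", "b,a", "c,a"]  # sample values from the v3 column
cnts = Counter(frozenset(item.split(",")) for item in values)

# Counter behaves like a dict: look keys up, iterate over items(), etc.
print(cnts[frozenset({"a", "b"})])  # 2
print(cnts[frozenset({"x", "y"})])  # 0: a missing key counts as zero
```

One caveat: a set also drops repeated items, so a value like a,a becomes frozenset({'a'}).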

You can also convert it back to a plain dictionary with string keys:

>>> {','.join(sorted(key)): count for key, count in cnts.items()}
{'a,b': 2, 'a,c': 1}

Answer 2

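There are two logic problems to fix in your code: 1) the last else is not indented correctly; as originally written it can never run, because the if and elif already cover every possible case (a key either is in the dictionary or it isn't); 2) when reverse is in combine_count.keys(), you should increment the count for reverse, because it is reverse, not entry, that is in the dictionary.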

combine_count = {}
col = df['v3']
for entry in col:
    if entry in combine_count.keys():
        combine_count[entry] += 1

    elif entry not in combine_count.keys():
        reverse = ','.join(entry.split(',')[::-1])

        if reverse in combine_count.keys():
            combine_count[reverse] += 1          # entry to reverse

        else:                                    # indentation here
            combine_count[entry] = 1

dict(combine_count)
# {'a,b': 2, 'c,a': 1}
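The same two fixes can also be folded into a shorter loop with dict.get. This is a sketch that uses a plain list in place of df['v3'] (assumed to hold the sample values):

```python
values = ["a,b", "b,a", "c,a"]  # stand-in for df['v3']

combine_count = {}
for entry in values:
    reverse = ','.join(entry.split(',')[::-1])
    # Count under the reversed spelling if it is already stored, otherwise under entry
    key = reverse if reverse in combine_count else entry
    combine_count[key] = combine_count.get(key, 0) + 1

print(combine_count)  # {'a,b': 2, 'c,a': 1}
```

The first spelling seen becomes the stored key, which is why c,a (not a,c) appears in the result.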

Also, since you're using pandas, here is another pandas/numpy approach:

import numpy as np
import pandas as pd

# here use maximum and minimum to sort your key before doing any count
(np.minimum(df.v1, df.v2) + "," + np.maximum(df.v1, df.v2)).value_counts().to_dict()

# {'a,b': 2, 'a,c': 1}
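The min/max idea does not depend on numpy itself: putting each pair in a fixed order before joining gives the same canonical key. A plain-Python sketch, with the (v1, v2) rows hardcoded from the sample table as an assumption:

```python
from collections import Counter

rows = [("a", "b"), ("b", "a"), ("c", "a")]  # (v1, v2) from the sample table

# min/max fixes the order of each pair, so ("b", "a") and ("a", "b") give the same key
counts = Counter(min(v1, v2) + "," + max(v1, v2) for v1, v2 in rows)
print(dict(counts))  # {'a,b': 2, 'a,c': 1}
```

Like the numpy version, this assumes exactly two items per combination; for longer combinations, sorting the split parts (as with sorted in Answer 1) generalizes.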

Answer 3

I believe what you're looking for is:

import pandas as pd
df = pd.read_excel("filename.xlsx")

combine_count = {}
col = df['v3']
for entry in col.index:  # iterate over index labels so that col[entry] is the value
    if col[entry] in combine_count.keys():
        combine_count[col[entry]] += 1
    elif col[entry] not in combine_count.keys():
        reverse = ','.join(col[entry].split(',')[::-1])
        if reverse in combine_count.keys():
            combine_count[reverse] += 1
        else:        
            combine_count[col[entry]] = 1

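What you want to check is whether a value (e.g. a,b) exists, not a key. For the purpose of the comparison the key is irrelevant, at least as far as I understand your intent. So you need to check col[entry] rather than entry.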

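Fix that, indent the code properly as shown here, and you should be all set. This returns a dictionary of each value and its count, with mirrored combinations (such as b,a for a,b) counted under the same key.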

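To collect the keys, you can use the values in this dictionary to build, for example, a list of all the keys (index labels) in col associated with each value.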
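One way to read that suggestion is to group the row labels by normalized combination. A sketch, assuming a plain dict from row label to v3 value stands in for the Series (the name rows_by_combo is hypothetical):

```python
from collections import defaultdict

# Hypothetical stand-in for df['v3']: row label -> value
col = {0: "a,b", 1: "b,a", 2: "c,a"}

rows_by_combo = defaultdict(list)
for label, value in col.items():
    # Sort the parts so that "a,b" and "b,a" share one key
    key = ",".join(sorted(value.split(",")))
    rows_by_combo[key].append(label)

print(dict(rows_by_combo))  # {'a,b': [0, 1], 'a,c': [2]}
```

The length of each list is the same count the dictionary above reports.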
