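Count sublists in a list that have the same values

Time: 2021-05-03 18:35:42

Tags: python pandas counter

How can I count the sublists in a list that contain the same values (order doesn't matter)?

I tried: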

from collections import Counter

Input = [
    [
        'Test123', 'heyhey123', 'another_unique_value',
    ],
    [
        'Test123', 'heyhey123', 'another_unique_value',
    ],
    [
        'heyhey123',
    ],
    [
        'Test123', 'heyhey123',
    ],
    [
        'another_unique_value', 'heyhey123', 'Test123'
    ]
]

Counter(str(e) for e in Input)

Output:

Counter({
    "['Test123', 'heyhey123', 'another_unique_value']": 2,
    "['heyhey123']": 1,
    "['Test123', 'heyhey123']": 1,
    "['another_unique_value', 'heyhey123', 'Test123']": 1,
})

Clearly, this takes the order of the values within each sublist into account. How can I count the sublists so that order doesn't matter?

The output I want is:

Counter({
    "['Test123', 'heyhey123', 'another_unique_value']": 3,
    "['heyhey123']": 1,
    "['Test123', 'heyhey123']": 1,
})

2 answers:

Answer 0: (score: 0)

I think you're close.

Counter(str(set(e)) for e in Input)
returns

Counter({"{'heyhey123', 'Test123', 'another_unique_value'}": 3,
     "{'heyhey123'}": 1,
     "{'heyhey123', 'Test123'}": 1})

I believe this is almost the same as what you're looking for :)
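
A variation on the same idea, offered only as a minimal sketch: `frozenset` is hashable, so it can be used as a `Counter` key directly, without converting through `str`. The names `data` and `counts` are illustrative and do not come from the original post. Like `set`, this collapses duplicate values inside a sublist.

```python
from collections import Counter

# Illustrative data mirroring the question's Input (name `data` is an assumption)
data = [
    ['Test123', 'heyhey123', 'another_unique_value'],
    ['Test123', 'heyhey123', 'another_unique_value'],
    ['heyhey123'],
    ['Test123', 'heyhey123'],
    ['another_unique_value', 'heyhey123', 'Test123'],
]

# frozenset is hashable and ignores order, so sublists with the
# same distinct values end up under the same key
counts = Counter(frozenset(sub) for sub in data)

# Sort each key only for display; the keys themselves are unordered
for key, n in counts.items():
    print(sorted(key), n)
```

The keys here are sets rather than strings, so looking up a combination works regardless of the order in which its values are written.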

Answer 1: (score: 0)

You can replace

Counter(str(e) for e in Input)

with

Counter(tuple(sorted(e)) for e in Input)

which gives the output:

Counter({('Test123', 'another_unique_value', 'heyhey123'): 3,
         ('heyhey123',): 1,
         ('Test123', 'heyhey123'): 1})

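Another option is to use set(e) to ignore the order of the elements in each list, but this has the drawback of ignoring duplicates: ['Test123', 'heyhey123', 'another_unique_value'] would be treated as the same as ['Test123', 'heyhey123', 'another_unique_value', 'another_unique_value']. In addition, a set is unhashable, so it has to be converted (for example with str) before it can be used as a Counter key, and that conversion does not guarantee a consistent element order.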
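
The duplicate case described above can be checked with a small sketch. The names `a`, `b`, `by_sorted` and `by_set` are hypothetical, and `frozenset` is used here in place of `str(set(e))` only so that the set-based key is hashable.

```python
from collections import Counter

# Two sublists that differ only by a repeated value
a = ['Test123', 'heyhey123', 'another_unique_value']
b = ['Test123', 'heyhey123', 'another_unique_value', 'another_unique_value']

# Sorted tuples keep duplicates, so a and b get different keys
by_sorted = Counter(tuple(sorted(e)) for e in [a, b])

# Sets drop duplicates, so a and b collapse into one key
by_set = Counter(frozenset(e) for e in [a, b])

print(len(by_sorted))  # 2
print(len(by_set))     # 1
```

Which behavior is preferable depends on whether a repeated value should make a sublist count as different.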