Counting how many users share the same combination of string values in Python

Time: 2019-08-06 08:53:45

Tags: python pandas

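I want to know how many users share the same combination of string values. The data is in a pandas DataFrame. The order of the values does not matter, so each pair should be counted only once (x-y is the same as y-x).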

user_id    value
1            x
1            y   
2            x
2            y
2            z   
3            x
3            z

Expected output:

Combination   # of users
x-y             2
x-z             2
y-z             1

1 answer:

Answer 0 (score: 2)

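Build the pairwise combinations within each user_id group, flatten the resulting lists with chain.from_iterable, and count the occurrences with Counter: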

from itertools import combinations, chain
from collections import Counter

import pandas as pd

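# build every pair of values within each user and join it as 'a-b'
s = df.groupby('user_id')['value'].apply(lambda x: list(map('-'.join, combinations(x, 2))))
# if the values are not already in a consistent order within each user,
# sort each pair so that x-y and y-x are counted as the same combination
#s = (df.groupby('user_id')['value']
#       .apply(lambda x: ['-'.join(sorted(y)) for y in combinations(x, 2)]))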

# flatten the per-user lists of pairs and count each combination
d = Counter(chain.from_iterable(s))

df = pd.DataFrame({'Combination': list(d.keys()),
                   'user': list(d.values())})
print(df)
  Combination  user
0         x-y     2
1         x-z     2
2         y-z     1
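
For reference, the same idea can be written as a self-contained script. This is only a sketch under a few assumptions: the sample DataFrame is rebuilt from the question's table, the helper name count_shared_pairs and the output column name users are made up for illustration, and repeated values within one user are treated as a single value (the question does not say how duplicates should be handled). Each user's values are sorted before pairing, so x-y and y-x always land under the same key.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 2, 3, 3],
    "value": ["x", "y", "x", "y", "z", "x", "z"],
})


def count_shared_pairs(frame, user_col="user_id", value_col="value"):
    """Count, for each unordered pair of values, how many users have both.

    Hypothetical helper, not part of pandas.
    """
    counts = Counter()
    for _, values in frame.groupby(user_col)[value_col]:
        # sorted(set(...)) drops repeats within a user (an assumption) and
        # fixes the order, so "x-y" and "y-x" share one key.
        unique_values = sorted(set(values))
        counts.update("-".join(pair) for pair in combinations(unique_values, 2))
    # Sort the (combination, count) pairs for a stable row order.
    return pd.DataFrame(sorted(counts.items()), columns=["Combination", "users"])


result = count_shared_pairs(df)
print(result)
```

Users with a single value contribute no pairs, so they simply do not appear in the result.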