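I want to know how many users share the same combination of string values. The data is in a pandas DataFrame. The order of the values does not matter, so each pair should be counted only once (x-y is the same as y-x).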
user_id value
1 x
1 y
2 x
2 y
2 z
3 x
3 z
Expected output:
Combination # of users
x-y 2
x-z 2
y-z 1
Answer (score: 2)
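Build the combinations for each user group, flatten them with chain.from_iterable, and count them with Counter. Sorting each user's values first makes x-y and y-x produce the same key: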
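import pandas as pd
from itertools import combinations, chain
from collections import Counter

# sample data from the question
df = pd.DataFrame({'user_id': [1, 1, 2, 2, 2, 3, 3],
                   'value': ['x', 'y', 'x', 'y', 'z', 'x', 'z']})

# per user: drop duplicate values, sort so x-y and y-x give the same key,
# then build every 2-value combination joined with '-'
s = (df.groupby('user_id')['value']
       .apply(lambda x: ['-'.join(y) for y in combinations(sorted(set(x)), 2)]))
# flatten the per-user lists and count how many users have each combination
d = Counter(chain.from_iterable(s))
out = pd.DataFrame({'Combination': list(d.keys()),
                    'user': list(d.values())})
print(out)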
Combination user
0 x-y 2
1 x-z 2
2 y-z 1
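As an alternative that stays inside pandas instead of using itertools, a self-merge on user_id can generate the pairs. This is a sketch, not part of the original answer: count_pairs is a hypothetical helper name, and the approach assumes the values are strings, so that keeping only rows where value_a < value_b removes both self-pairs and the mirrored y-x duplicates.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'user_id': [1, 1, 2, 2, 2, 3, 3],
                   'value': ['x', 'y', 'x', 'y', 'z', 'x', 'z']})


def count_pairs(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: count users per unordered value pair."""
    # One row per (user, value), so repeated values are not double-counted
    u = df[['user_id', 'value']].drop_duplicates()
    # Pair every value of a user with every value of the same user
    m = u.merge(u, on='user_id', suffixes=('_a', '_b'))
    # Keep a < b only: drops x-x self-pairs and the mirrored y-x rows
    m = m.loc[m['value_a'] < m['value_b']]
    # Build the "a-b" key, then count distinct users per key
    return (m.assign(Combination=m['value_a'] + '-' + m['value_b'])
             .groupby('Combination')['user_id']
             .nunique()
             .reset_index(name='user'))


result = count_pairs(df)
print(result)
```

Because groupby sorts its keys, the rows come out ordered by Combination, and nunique counts each user at most once per pair even if the input contains duplicate rows.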