我有两个数据框
df1
+----+-------+
| | Key |
|----+-------|
| 0 | 30 |
| 1 | 31 |
| 2 | 32 |
| 3 | 33 |
| 4 | 34 |
| 5 | 35 |
+----+-------+
df2
+----+-------+--------+
| | Key | Test |
|----+-------+--------|
| 0 | 30 | Test4 |
| 1 | 30 | Test5 |
| 2 | 30 | Test6 |
| 3 | 31 | Test4 |
| 4 | 31 | Test5 |
| 5 | 31 | Test6 |
| 6 | 32 | Test3 |
| 7 | 33 | Test3 |
| 8 | 33 | Test3 |
| 9 | 34 | Test1 |
| 10 | 34 | Test1 |
| 11 | 34 | Test2 |
| 12 | 34 | Test3 |
| 13 | 34 | Test3 |
| 14 | 34 | Test3 |
| 15 | 35 | Test3 |
| 16 | 35 | Test3 |
| 17 | 35 | Test3 |
| 18 | 35 | Test3 |
| 19 | 35 | Test3 |
+----+-------+--------+
我想计算每个Test
列出每个Key
的次数。
+----+-------+-------+-------+-------+-------+-------+-------+
| | Key | Test1 | Test2 | Test3 | Test4 | Test5 | Test6 |
|----+-------|-------|-------|-------|-------|-------|-------|
| 0 | 30 | | | | 1 | 1 | 1 |
| 1 | 31 | | | | 1 | 1 | 1 |
| 2 | 32 | | | 1 | | | |
| 3 | 33 | | | 2 | | | |
| 4 | 34 | 2 | 1 | 3 | | | |
| 5 | 35 | | | 5 | | | |
+----+-------+-------+-------+-------+-------+-------+-------+
我尝试过的事情
使用join和groupby,我首先获得了每个Key
的计数,而与Test
无关。
result_df = df1.join(df2.groupby('Key').size().rename('Count'), on='Key')
+----+-------+---------+
| | Key | Count |
|----+-------+---------|
| 0 | 30 | 3 |
| 1 | 31 | 3 |
| 2 | 32 | 1 |
| 3 | 33 | 2 |
| 4 | 34 | 6 |
| 5 | 35 | 5 |
+----+-------+---------+
我尝试将Key
与Test
分组
result_df = df1.join(df2.groupby(['Key', 'Test']).size().rename('Count'), on='Key')
但这会返回错误
ValueError: len(left_on) must equal the number of levels in the index of "right"
答案 0 :(得分:3)
使用crosstab
pd.crosstab(df2.Key,df2.Test).reindex(df1.Key).replace({0:''})
答案 1 :(得分:0)
这是groupby和pivot的另一种解决方案。使用此解决方案,您根本不需要df1。
# | create some dummy data
tests = ['Test' + str(i) for i in range(1,7)]
df = pd.DataFrame({'Test': np.random.choice(tests, size=100), 'Key': np.random.randint(30, 35, size=100)})
df['Count Variable'] = 1
# | group & count aggregation
df = df.groupby(['Key', 'Test']).count()
df = df.pivot(index="Key", columns="Test", values="Count Variable").reset_index()