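I have the table below and need to create "Relevant" and "Non-Relevant" columns for each ID, each holding the summed Length for that experience type.
The table looks like this: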
+----+--------------+--------+
| ID | Experience | Length |
+----+--------------+--------+
| 1 | Relevant | 2 |
| 1 | Non-Relevant | 1 |
| 4 | Relevant | 3 |
| 4 | Relevant | 4 |
| 4 | Non-Relevant | 0 |
| 5 | Relevant | 1 |
| 5 | Relevant | 1 |
+----+--------------+--------+
This is the output I want to get:
+----+----------+--------------+
| ID | Relevant | Non-Relevant |
+----+----------+--------------+
| 1 | 2 | 1 |
| 4 | 7 | 0 |
| 5 | 2 | 0 |
+----+----------+--------------+
Answer 0 (score: 1)
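import pandas as pd
# Abbreviated sample data: 'r' stands for Relevant, 'n' for Non-Relevant
df = pd.DataFrame({'id': [1, 1, 4, 4, 4, 5, 5], 'exp': list('rnrrnrr'), 'len': [2, 1, 3, 4, 0, 1, 1]})
# One row per id, one column per exp value; missing combinations are filled with 0
pd.pivot_table(df, index='id', values='len', columns='exp', aggfunc='sum', fill_value=0)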
Documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html
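For reference, here is a minimal sketch of the same pivot_table approach applied to the column names from the question (ID, Experience, Length) instead of the abbreviated ones. The variable name result and the final column reordering are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "ID": [1, 1, 4, 4, 4, 5, 5],
    "Experience": ["Relevant", "Non-Relevant", "Relevant", "Relevant",
                   "Non-Relevant", "Relevant", "Relevant"],
    "Length": [2, 1, 3, 4, 0, 1, 1],
})

# Sum Length per (ID, Experience); combinations that do not occur become 0
result = pd.pivot_table(
    df, index="ID", columns="Experience", values="Length",
    aggfunc="sum", fill_value=0,
)

# Put the columns in the order shown in the desired output and make ID a regular column
result = result[["Relevant", "Non-Relevant"]].reset_index()
result.columns.name = None  # drop the leftover "Experience" axis label

print(result)
```

The reset_index call is optional; without it, ID stays as the index of the result.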
Answer 1 (score: 1)
To create the DataFrame:
import pandas as pd
ID = [1,1,4,4,4,5,5]
Experience = ['Relevant', 'Non-Relevant', 'Relevant', 'Relevant', 'Non-Relevant',
              'Relevant', 'Relevant']
length = [2,1,3,4,0,1,1]
dictionary = {'ID' : ID,
              'Experience' : Experience,
              'Length' : length}
df = pd.DataFrame(dictionary)  # build the DataFrame used in the next step
Group it, then unstack:
df.groupby(by=['ID','Experience']).sum().unstack()['Length'].fillna(0)
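One thing to be aware of with the group-then-unstack step: the missing (5, Non-Relevant) combination first becomes NaN, which generally forces a float dtype before fillna(0) runs. A possible variation, sketched below, passes fill_value=0 to unstack so the sums stay integers. The name wide is an illustrative choice.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    "ID": [1, 1, 4, 4, 4, 5, 5],
    "Experience": ["Relevant", "Non-Relevant", "Relevant", "Relevant",
                   "Non-Relevant", "Relevant", "Relevant"],
    "Length": [2, 1, 3, 4, 0, 1, 1],
})

# Select the Length column before summing so the result is a Series,
# then unstack Experience into columns, filling absent combinations with 0
wide = df.groupby(["ID", "Experience"])["Length"].sum().unstack(fill_value=0)

print(wide)
```

Selecting "Length" before calling sum also avoids the extra column level that the ['Length'] lookup removes in the original one-liner.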