My DataFrame looks like this:
class  passed  failed  extra_teaching
A11         1       2             0.5
A12         2       1             0.7
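I want to "unravel" the DataFrame, dropping the class information but keeping the extra_teaching value, so that I end up with one row per student and a pass column that is 1 if the student passed and 0 if they failed.
So the DataFrame should look like this: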
pass  extra_teaching
   1             0.5
   0             0.5
   0             0.5
   1             0.7
   1             0.7
   0             0.7
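I don't know how to do this in pandas other than by using iterrows() and manually appending rows to a new DataFrame. Does anyone have a neater way?
Update:
I tried this, which seems to work although it isn't very elegant: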
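import pandas as pd
temp = []
df = df.set_index('class')
for idx in df.index:
    row = df.loc[idx]
    # Fields shared by every student in this class
    t = {'class': idx, 'extra_teaching': row['extra_teaching']}
    # Append a fresh dict per student; appending t itself would store the
    # same object repeatedly, so a later t['pass'] = 0 would overwrite the 1s
    for i in range(int(row['passed'])):
        temp.append({**t, 'pass': 1})
    for i in range(int(row['failed'])):
        temp.append({**t, 'pass': 0})
df_exploded = pd.DataFrame(temp)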
Answer 0 (score: 1)
Try:
def teaching_results(x):
    # x is the sub-frame for one class (a single row in the sample data)
    n_passed = int(x.passed.iloc[0])
    n_failed = int(x.failed.iloc[0])
    # 1 for each student who passed, followed by 0 for each who failed
    passed = n_passed * [1] + n_failed * [0]
    extra_teaching = (n_passed + n_failed) * [x.extra_teaching.iloc[0]]
    class_code = x['class'].iloc[0]
    return pd.DataFrame({'pass': passed, 'extra_teaching': extra_teaching, 'class': class_code})
df.groupby('class', as_index=False).apply(teaching_results)
This gives:
    class  extra_teaching  pass
0 0   A11             0.5     1
  1   A11             0.5     0
  2   A11             0.5     0
1 0   A12             0.7     1
  1   A12             0.7     1
  2   A12             0.7     0
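If the class column really is not needed, the same result can probably be had without groupby or an explicit loop by repeating values with NumPy. The following is only a sketch, assuming passed and failed hold non-negative whole numbers; the function name explode_counts is made up for illustration and is not part of pandas.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'class': ['A11', 'A12'],
    'passed': [1, 2],
    'failed': [2, 1],
    'extra_teaching': [0.5, 0.7],
})

def explode_counts(frame):
    """Hypothetical helper: one output row per student, class dropped."""
    # Shape (n_classes, 2); column 0 is passed, column 1 is failed.
    counts = frame[['passed', 'failed']].to_numpy(dtype=int)
    # Flattened row by row, counts reads p1, f1, p2, f2, ... so repeating the
    # pattern 1, 0, 1, 0, ... by those counts yields the pass flags in order.
    flags = np.repeat(np.tile([1, 0], len(frame)), counts.ravel())
    # Each class contributes passed + failed rows with its extra_teaching value.
    extra = np.repeat(frame['extra_teaching'].to_numpy(), counts.sum(axis=1))
    return pd.DataFrame({'pass': flags, 'extra_teaching': extra})

out = explode_counts(df)
print(out)
```

For the sample data, the pass column comes out as 1, 0, 0, 1, 1, 0, which matches the layout asked for in the question. Because everything is done on whole arrays, this should scale better than building the rows one dict at a time, though I have not benchmarked it.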