I have a dataframe that aggregates people by location:
location_id | score | number_of_males | number_of_females
1 | 20 | 2 | 1
2 | 45 | 1 | 2
I want to create a new dataframe that undoes this aggregation, so that I get something like
location_id | score | number_of_males | number_of_females
1 | 20 | 1 | 0
1 | 20 | 1 | 0
1 | 20 | 0 | 1
2 | 45 | 1 | 0
2 | 45 | 0 | 1
2 | 45 | 0 | 1
or, even better:
location_id | score | sex
1 | 20 | male
1 | 20 | male
1 | 20 | female
2 | 45 | male
2 | 45 | female
2 | 45 | female
I was thinking of doing something like this:
import pandas as pd
aggregated_df = pd.DataFrame.from_csv(SOME_PATH)
unaggregated_df = df = pd.DataFrame(columns=['location_id', 'score', 'sex'])
for row in aggregated_df:
    for column in ['number_of_males', 'number_of_females']:
        for number_of_people in range(0, row[column]):
            if column == 'number_of_males':
                sex = 'male'
            else:
                sex = 'female'
            unaggregated_df.append([{'location_id': row['location_id'],
                                     'score': row['score'],
                                     'sex': sex}],
                                   ignore_index=True)
Even though pandas appears to support this, I can't get it to append the dictionaries. Is there a more pandthonic (the pandas version of pythonic) way to achieve this?
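For what it's worth, two things seem to stop the loop above from working: iterating over a DataFrame directly yields its column labels rather than its rows, and DataFrame.append returned a new frame instead of modifying the original (it was removed entirely in pandas 2.0). A minimal sketch of the same idea that collects the rows in a plain list and builds the frame once; the inline sample data stands in for the CSV file and is an assumption:

```python
import pandas as pd

# Sample data copied from the question (stands in for the CSV file).
aggregated_df = pd.DataFrame({
    'location_id': [1, 2],
    'score': [20, 45],
    'number_of_males': [2, 1],
    'number_of_females': [1, 2],
})

rows = []
# itertuples yields one namedtuple per row, unlike iterating the frame itself.
for row in aggregated_df.itertuples(index=False):
    for sex, count in (('male', row.number_of_males),
                       ('female', row.number_of_females)):
        # Emit one record per person counted in this column.
        for _ in range(count):
            rows.append({'location_id': row.location_id,
                         'score': row.score,
                         'sex': sex})

# Build the result in one go instead of appending row by row.
unaggregated_df = pd.DataFrame(rows, columns=['location_id', 'score', 'sex'])
print(unaggregated_df)
```

This is still a Python-level loop, so it is a fix for the attempt rather than a vectorised solution.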
Answer 0 (score: 2)
Here is a way to get the result using groupby:
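ids = ['location_id','score']
def foo(d):
    # each group holds a single row, so read its counts as plain ints
    n_males = int(d['number_of_males'].iloc[0])
    n_females = int(d['number_of_females'].iloc[0])
    return pd.Series(n_males*['male'] + n_females*['female'])
pd.melt(df.groupby(ids).apply(foo).reset_index(), id_vars=ids).drop(columns='variable')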
#Out[13]:
# location_id score value
#0 1 20 male
#1 2 45 male
#2 1 20 male
#3 2 45 female
#4 1 20 female
#5 2 45 female
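The question also asked for a variant with 0/1 indicator columns. A minimal sketch of that step, assuming the melted result has been stored in a frame called long_df (a hypothetical name; the data below is typed in from the output above):

```python
import pandas as pd

# The melted result from the answer, typed in by hand for illustration.
long_df = pd.DataFrame({
    'location_id': [1, 2, 1, 2, 1, 2],
    'score': [20, 45, 20, 45, 20, 45],
    'value': ['male', 'male', 'male', 'female', 'female', 'female'],
})

# One row per person, with a 0/1 flag in each of the original count columns.
wide_df = long_df[['location_id', 'score']].copy()
wide_df['number_of_males'] = (long_df['value'] == 'male').astype(int)
wide_df['number_of_females'] = (long_df['value'] == 'female').astype(int)

# Stable sort so each location's rows sit together.
wide_df = wide_df.sort_values('location_id', kind='mergesort').reset_index(drop=True)
print(wide_df)
```

Summing the two flag columns per location_id should give back the original counts, which is a quick sanity check.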
Answer 1 (score: 0)
I can get this far using only pandas functions:
print(df)
location_id score number_of_males number_of_females
1 20 2 1
2 45 1 2
Convert the two count columns into a single column:
df.set_index(['location_id','score']).stack().reset_index()
Out[102]:
location_id score level_2 0
0 1 20 number_of_males 2
1 1 20 number_of_females 1
2 2 45 number_of_males 1
3 2 45 number_of_females 2
But I would still have to iterate with a Python loop to multiply out the rows :(
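One possible way to finish this without a loop is to repeat each stacked row by its count using Index.repeat. A minimal sketch, assuming the sample frame from the question; the column names sex and count are names chosen here for illustration, not something pandas produces:

```python
import pandas as pd

df = pd.DataFrame({
    'location_id': [1, 2],
    'score': [20, 45],
    'number_of_males': [2, 1],
    'number_of_females': [1, 2],
})

# Same stacking step as above, with the two generated columns renamed.
stacked = df.set_index(['location_id', 'score']).stack().reset_index()
stacked.columns = ['location_id', 'score', 'sex', 'count']
stacked['sex'] = stacked['sex'].map({'number_of_males': 'male',
                                     'number_of_females': 'female'})

# Repeat every index label `count` times, then select those rows with .loc.
repeated_index = stacked.index.repeat(stacked['count'])
unaggregated_df = (stacked.loc[repeated_index, ['location_id', 'score', 'sex']]
                   .reset_index(drop=True))
print(unaggregated_df)
```

Rows with a count of zero simply drop out, since their label is repeated zero times.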