I'm new to pandas, and I'd like to know whether there is a better way to accomplish the following.
Setup:
import pandas as pd
import numpy as np
x = np.arange(0, 1, .01)
y = np.random.binomial(10, x, 100)
bins = 50
df = pd.DataFrame({'x':x, 'y':y})
print(df.head())
   x  y
0 -1  1
1 38  1
2 56  0
3 42  0
4 41  0
I want to group the x values into equal-width bins and, for each bin, take the mean of x and y.
my_bins = pd.cut(x, bins=20)
data = df[['x', 'y']].groupby(my_bins).agg(['mean', 'size'])
print(data.head())
                        x                y
                     mean   size      mean   size
age
(-1.101, 4.05]  -1.000000  87990  0.768428  87990
(4.05, 9.1]           NaN      0       NaN      0
(9.1, 14.15]          NaN      0       NaN      0
(14.15, 19.2]   18.512286   1872  0.493590   1872
(19.2, 24.25]   22.768022   8906  0.496968   8906
That works well. But from here, how do I plot the mean of x against the mean of y? I know I can do something like:
data.columns = data.columns.droplevel() # remove the multiple levels that were created
data.columns = ['x_mean', 'x_size', 'y_mean', 'y_size'] # manually set new column names
data.plot.scatter(x='x_mean', y='y_mean') # plot
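But this feels wrong and clunky: I have to drop a column level (which removes useful structure from my data) and then rename the columns by hand. Is there a better way?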
Answer 0: (score: 2)
You can pass tuples as the x and y arguments to point at columns of the multi-level column index:
data.plot.scatter(x=('x', 'mean'), y=('y', 'mean'))
This way, you don't need to rename the columns in order to plot.
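For reference, here is a minimal end-to-end sketch of the tuple approach, built on the synthetic setup from the question rather than the asker's real data. The seeded generator `rng`, the explicit `observed=False`, and the `Agg` backend are assumptions added so the example is reproducible and runs without a display; they are not part of the original post.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no window is needed

import numpy as np
import pandas as pd

# Synthetic data as in the question (seeded here for reproducibility).
rng = np.random.default_rng(0)
x = np.arange(0, 1, .01)
y = rng.binomial(10, x, 100)
df = pd.DataFrame({'x': x, 'y': y})

# Bin x into 20 equal-width intervals and aggregate both columns per bin.
my_bins = pd.cut(x, bins=20)
data = df[['x', 'y']].groupby(my_bins, observed=False).agg(['mean', 'size'])

# Each column of `data` is addressed by a (top level, sub level) tuple,
# so the MultiIndex can stay intact for plotting.
ax = data.plot.scatter(x=('x', 'mean'), y=('y', 'mean'))

# The same tuples work for ordinary column selection.
x_means = data[('x', 'mean')]
```

Because the column levels are left in place, `data[('x', 'size')]` and `data[('y', 'size')]` remain available afterwards, and the figure can be written out with `ax.figure.savefig(...)` if needed.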