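I am trying to take a single ID from a large data frame, bin its prices into ranges, count the rows in each range, and compute each bin's mean. I cannot find a way to get the price range back out of new_df to calculate the bin mean. I even tried splitting and stacking the price range, but I still cannot access it. My code is below. Can anyone suggest a fix?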
Sample data frame
Id price price_range
11111333 30.0 (0.0, 50.0]
11111333 34.0 (0.0, 50.0]
11111333 80.0 (50.0, 100.0]
11111333 25.0 (0.0, 50.0]
11111333 13.0 (0.0, 50.0]
11111333 17.0 (0.0, 50.0]
11111333 42.0 (0.0, 50.0]
11111333 20.0 (0.0, 50.0]
11111333 210.0 (200.0, 250.0]
22222111 30.0 (0.0, 50.0]
22222111 134.0 (100.0, 150.0]
22222111 1080.0 (1050.0, 1100.0]
22222111 25.0 (0.0, 50.0]
22222111 413.0 (400.0, 450.0]
22222111 117.0 (100.0, 150.0]
22222111 12.0 (0.0, 50.0]
22222111 60.0 (50.0, 100.0]
22222111 110.0 (100.0, 150.0]
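import numpy as np
import pandas as pd
# generate bin edges in steps of 50 (the sample data's column is "price", not "Volume")
x_range = np.arange(0, df["price"].max() + 50, 50)
# add new column price_range holding the bin each price falls into
df["price_range"] = pd.cut(df["price"], bins=x_range)
# get value counts of price_range; the interval labels become the index of new_df
new_df = df["price_range"].value_counts().to_frame("range_cnt")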
new_df
range_cnt
(0.0, 50.0] 7
(50.0, 100.0] 1
(200.0, 250.0] 1
#split price range_cnt
out=new_df["range_cnt"].str.split(',\s+', expand=True).stack()
(0.0, 50.0] 0 7
(50.0, 100.0] 0 1
(200.0, 250.0] 0 1
dtype: object
#When I try to access the first row, I get only 7 instead of (0.0, 50.0]
out[1]
0 7
dtype: object
Below is the expected format
Id price_range count mean
11111333 (0.0, 50.0] 7 25
(50.0, 100.0] 1 75
(200.0, 250.0] 1 225
22222111 (0.0, 50.0] 3 25
(50.0, 100.0] 1 75
(100.0, 150.0] 3 125
(400.0, 450.0] 1 425
(1050.0, 1100.0] 1 1075
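Answer 0 (score: 1)
Here is one way to do it. The price ranges are the index of new_df, and each label is an Interval with left and right attributes, so the bin midpoint can be computed from the index directly: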
new_df['mean']=new_df.index.map(lambda x : (x.left+x.right)/2)
new_df
Out[121]:
price_range mean
(100, 150] 2 125.0
(150, 200] 1 175.0
(50, 100] 1 75.0
(0, 50] 0 25.0
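To get the per-Id layout the question asks for, the same midpoint idea can be combined with a groupby. The following is a minimal sketch, not part of the original answer. It assumes the sample data from the question, with the column named price, and it treats "mean" as the bin midpoint, which is what the expected output shows. The name result is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Id": [11111333] * 9 + [22222111] * 9,
    "price": [30.0, 34.0, 80.0, 25.0, 13.0, 17.0, 42.0, 20.0, 210.0,
              30.0, 134.0, 1080.0, 25.0, 413.0, 117.0, 12.0, 60.0, 110.0],
})

# Bin edges in steps of 50 that cover every price
bins = np.arange(0, df["price"].max() + 50, 50)
df["price_range"] = pd.cut(df["price"], bins=bins)

# Count rows per (Id, price_range); observed=True skips bins with no rows
result = (
    df.groupby(["Id", "price_range"], observed=True)
      .size()
      .reset_index(name="count")
)

# Bin midpoint, equivalent to (left + right) / 2 in the answer above
result["mean"] = [interval.mid for interval in result["price_range"]]

print(result)
```

If the actual average of the prices in each bin is wanted rather than the midpoint, replacing .size() with an aggregation over the price column (for example a named aggregation with "size" and "mean") should give that instead.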