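I have a data frame with three columns: Year, Product and Price. I want to compute the minimum Price for each year, excluding zeros. I would also like the result to include the Product value from the same row as that minimum.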
Data:
Year Product Price
2000 Grapes 0
2000 Apple 220
2000 pear 185
2000 Watermelon 172
2001 Orange 0
2001 Muskmelon 90
2001 Pear 165
2001 Watermelon 99
Desired output, in a new data frame:
Year Minimum Price Product
2000 172 Watermelon
2001 90 Muskmelon
Answer 0 (score: 1)
First, filter out the rows where Price is 0 using boolean indexing:
df1 = df[df['Price'] != 0]
Then use DataFrameGroupBy.idxmin to get the index of the minimum Price in each group, and select those rows with loc:
df2 = df1.loc[df1.groupby('Year')['Price'].idxmin()]
Alternatively, combine sort_values with drop_duplicates:
df2 = df1.sort_values(['Year', 'Price']).drop_duplicates('Year')
print (df2)
Year Product Price
3 2000 Watermelon 172
5 2001 Muskmelon 90
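For anyone who wants to run this end to end, here is a minimal self-contained sketch built on the sample data from the question. The names by_idxmin, by_sort and result are only illustrative, and the final rename/reorder step is my assumption about how to match the column layout asked for in the question; it is not part of the snippets above.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Year": [2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001],
    "Product": ["Grapes", "Apple", "pear", "Watermelon",
                "Orange", "Muskmelon", "Pear", "Watermelon"],
    "Price": [0, 220, 185, 172, 0, 90, 165, 99],
})

# Drop the zero-price rows first
df1 = df[df["Price"] != 0]

# Approach 1: row label of the minimum Price per Year, then select with loc
by_idxmin = df1.loc[df1.groupby("Year")["Price"].idxmin()]

# Approach 2: sort by Year and Price, keep the first row of each Year
by_sort = df1.sort_values(["Year", "Price"]).drop_duplicates("Year")

# Assumed final step: rename and reorder columns to match the desired output
result = (
    by_idxmin.rename(columns={"Price": "Minimum Price"})
    [["Year", "Minimum Price", "Product"]]
    .reset_index(drop=True)
)
print(result)
```

On this data both approaches select the same two rows, so which one to use is mostly a matter of taste.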
If a group can have several rows tied for the minimum and you need all of them:
print (df)
Year Product Price
0 2000 Grapes 0
1 2000 Apple 220
2 2000 pear 172
3 2000 Watermelon 172
4 2001 Orange 0
5 2001 Muskmelon 90
6 2001 Pear 165
7 2001 Watermelon 99
df1 = df[df['Price'] != 0]
df = df1[df1['Price'].eq(df1.groupby('Year')['Price'].transform('min'))]
print (df)
Year Product Price
2 2000 pear 172
3 2000 Watermelon 172
5 2001 Muskmelon 90
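To make the difference with ties concrete, here is a small sketch comparing the transform('min') filter with idxmin on the tied data above. The variable names (nonzero, group_min, all_minima, first_only) are just for illustration.

```python
import pandas as pd

# Tied sample data: pear and Watermelon both cost 172 in 2000
df = pd.DataFrame({
    "Year": [2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001],
    "Product": ["Grapes", "Apple", "pear", "Watermelon",
                "Orange", "Muskmelon", "Pear", "Watermelon"],
    "Price": [0, 220, 172, 172, 0, 90, 165, 99],
})

nonzero = df[df["Price"] != 0]

# Broadcast each Year's minimum back onto every row of that Year
group_min = nonzero.groupby("Year")["Price"].transform("min")

# Keep every row whose Price equals its group's minimum (ties included)
all_minima = nonzero[nonzero["Price"].eq(group_min)]

# idxmin returns only the first occurrence of the minimum in each group
first_only = nonzero.loc[nonzero.groupby("Year")["Price"].idxmin()]

print(all_minima)
print(first_only)
```

So idxmin is fine when one row per year is enough, while the transform comparison is the one to reach for when tied rows must all be kept.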
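Edit: if some years contain only zero prices, replace 0 with NaN and label those years as having no data: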
print (df)
Year Product Price
0 2000 Grapes 0
1 2000 Apple 220
2 2000 pear 185
3 2000 Watermelon 172
4 2001 Orange 0
5 2001 Muskmelon 90
6 2002 Pear 0
7 2002 Watermelon 0
import numpy as np
df['Price'] = df['Price'].replace(0, np.nan)
df2 = df.sort_values(['Year', 'Price']).drop_duplicates('Year')
df2['Product'] = df2['Product'].mask(df2['Price'].isnull(), 'No data')
print (df2)
Year Product Price
3 2000 Watermelon 172.0
5 2001 Muskmelon 90.0
6 2002 No data NaN
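A possible variation on the edit, sketched here under the assumption that you would rather not overwrite the Price column of the original df. The intermediate name tmp is hypothetical; the pandas calls are the same ones used above plus DataFrame.assign.

```python
import numpy as np
import pandas as pd

# Sample data where every Price in 2002 is zero
df = pd.DataFrame({
    "Year": [2000, 2000, 2000, 2000, 2001, 2001, 2002, 2002],
    "Product": ["Grapes", "Apple", "pear", "Watermelon",
                "Orange", "Muskmelon", "Pear", "Watermelon"],
    "Price": [0, 220, 185, 172, 0, 90, 0, 0],
})

# Work on a modified copy so the zeros in df itself are left untouched
tmp = df.assign(Price=df["Price"].replace(0, np.nan))

# sort_values puts NaN last by default, so a NaN row is kept for a Year
# only when that Year has no non-zero price at all
df2 = tmp.sort_values(["Year", "Price"]).drop_duplicates("Year").copy()

# Label the years that had nothing but zeros
df2["Product"] = df2["Product"].mask(df2["Price"].isna(), "No data")
print(df2)
```

Note that Price becomes float here, because NaN cannot be stored in an integer column.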