Suppose I have a DataFrame with the following values:
id product1sold product2sold product3sold
1 2 3 3
2 0 0 5
3 3 2 1
How can I add "most_sold" and "least_sold" columns that contain, for each ID, a list of all the best-selling and worst-selling products? It should look like this.
id product1 product2 product3 most_sold least_sold
1 2 3 3 [product2, product3] [product1]
2 0 0 5 [product3] [product1, product2]
3 3 2 1 [product1] [product3]
Answer 0 (score: 2)
Use list comprehensions over the product columns, testing each row against its minimum and maximum values:
# select all columns except the first (id)
df1 = df.iloc[:, 1:]
cols = df1.columns.to_numpy()
df['most_sold'] = [cols[x].tolist() for x in df1.eq(df1.max(axis=1), axis=0).to_numpy()]
df['least_sold'] = [cols[x].tolist() for x in df1.eq(df1.min(axis=1), axis=0).to_numpy()]
print (df)
id product1sold product2sold product3sold most_sold \
0 1 2 3 3 [product2sold, product3sold]
1 2 0 0 5 [product3sold]
2 3 3 2 1 [product1sold]
least_sold
0 [product1sold]
1 [product1sold, product2sold]
2 [product3sold]
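To make the masking step above easier to follow, here is a minimal, self-contained sketch. It assumes the DataFrame is built by hand from the sample values in the question; the names `is_max` and `most_sold` are introduced only for illustration.

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({
    "id": [1, 2, 3],
    "product1sold": [2, 0, 3],
    "product2sold": [3, 0, 2],
    "product3sold": [3, 5, 1],
})

# Product columns only (everything except the first column)
df1 = df.iloc[:, 1:]
cols = df1.columns.to_numpy()

# Boolean mask: True wherever a cell equals the maximum of its row
is_max = df1.eq(df1.max(axis=1), axis=0)
print(is_max)

# Each row of the mask selects the matching column names
most_sold = [cols[row].tolist() for row in is_max.to_numpy()]
print(most_sold)
```

Because the comparison is against the row maximum rather than a single position, ties naturally produce more than one column name per row.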
If performance is not important, you can use DataFrame.apply:
df1 = df.iloc[:, 1:]
f = lambda x: x.index[x].tolist()
df['most_sold'] = df1.eq(df1.max(axis=1), axis=0).apply(f, axis=1)
df['least_sold'] = df1.eq(df1.min(axis=1), axis=0).apply(f, axis=1)
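The apply-based variant can also be wrapped in a small helper, sketched below. The function name `add_extreme_columns` and its `id_col` parameter are hypothetical and not part of the original answer; the sketch also assumes every non-id column is a numeric product column.

```python
import pandas as pd

def add_extreme_columns(df, id_col="id"):
    """Return a copy of df with 'most_sold' and 'least_sold' list columns."""
    out = df.copy()
    values = out.drop(columns=[id_col])  # product columns only

    # For a boolean row, keep the labels of the columns that are True
    pick = lambda row: row.index[row.to_numpy()].tolist()

    out["most_sold"] = values.eq(values.max(axis=1), axis=0).apply(pick, axis=1)
    out["least_sold"] = values.eq(values.min(axis=1), axis=0).apply(pick, axis=1)
    return out

# Sample data taken from the question
df = pd.DataFrame({
    "id": [1, 2, 3],
    "product1sold": [2, 0, 3],
    "product2sold": [3, 0, 2],
    "product3sold": [3, 5, 1],
})

result = add_extreme_columns(df)
print(result)
```

Dropping the id column by name instead of by position is a design choice made here so the helper does not depend on column order.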
Answer 1 (score: -1)
You can do the following.
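# Note: yourDataFrame should contain only the product columns (no id column).
# idxmin/idxmax return a single column label per row (the first match), so ties are not listed.
minValueCol = yourDataFrame.idxmin(axis=1)
maxValueCol = yourDataFrame.idxmax(axis=1)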
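As a quick check of how this behaves on the question's data, the sketch below applies idxmin and idxmax to the product columns. Moving `id` into the index is an assumption made here so that it is not compared with the sales figures.

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({
    "id": [1, 2, 3],
    "product1sold": [2, 0, 3],
    "product2sold": [3, 0, 2],
    "product3sold": [3, 5, 1],
})

# Keep id out of the comparison by making it the index
products = df.set_index("id")

# One label per row: the first column holding the row maximum / minimum
max_col = products.idxmax(axis=1)
min_col = products.idxmin(axis=1)

print(max_col.tolist())  # ['product2sold', 'product3sold', 'product1sold']
print(min_col.tolist())  # ['product1sold', 'product1sold', 'product3sold']
```

For id 1 and id 2 the tied products are missing from the result, so this approach does not produce the lists requested in the question.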