Question

I have a CSV file containing columns such as "ID", "Category", "Unit_Price" and "Qty".

ID Category  Unit_Price  Qty
1    Apple       5        4
2    Grape       8        6
3    Apple       5        2
4   Orange       6        7
5     Pear       4        4

What I need is to (1) return the "ID" with the highest price (Unit_Price * Qty) without using pandas or numpy, and (2) return the "Category" with the highest price.

I tried to do both using pandas, like this:

# (1) return the ID with the maximum price
# idxmax() on a Series takes no axis=1 argument
myindex = (df['Unit_Price']*df['Qty']).idxmax()
df.loc[myindex, 'ID']

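# (2) return the Category with the maximum price
df['Amount'] = df['Unit_Price']*df['Qty']  # the 'Amount' column must exist before sorting by it
df2 = df.reset_index().groupby(['Category'])
df2.sum().sort_values(by='Amount',ascending=False)[:1].index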

Any pointers or hints? Is there a more efficient way to do this?

Answer 1

You can use a for loop or a list comprehension to build a list of tuples (Unit_Price*Qty, ID, Category), and then call max() on it.

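Unit_Price*Qty has to be the first element of the tuple: tuples are compared element by element, so max() then picks the tuple with the largest value.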

data = '''1    Apple       5        4
2    Grape       8        6
3    Apple       5        2
4   Orange       6        7
5     Pear       4        4'''

data = [[item for item in row.split(' ') if item] for row in data.split('\n') ]

# ---------

val, idx, cat = max((int(row[2])*int(row[3]), row[0], row[1]) for row in data)

print(idx, cat, val)

Result:

2 Grape 48
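As a side note, the same lookup can be written with the key argument of max(), so the position of the fields no longer matters. This is only a minimal sketch, assuming the rows are already parsed into lists of strings as above; the helper name total_price is made up for illustration.

```python
# Rows parsed as in the answer above: [ID, Category, Unit_Price, Qty]
rows = [
    ['1', 'Apple', '5', '4'],
    ['2', 'Grape', '8', '6'],
    ['3', 'Apple', '5', '2'],
    ['4', 'Orange', '6', '7'],
    ['5', 'Pear', '4', '4'],
]

def total_price(row):
    # Hypothetical helper: Unit_Price * Qty for one row
    return int(row[2]) * int(row[3])

# key= tells max() what to compare; the whole row is returned
best = max(rows, key=total_price)

print(best[0], best[1], total_price(best))  # 2 Grape 48
```

With key=, ties are resolved by position: max() returns the first row that reaches the maximum.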

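But this returns only one element. If several items share the same maximum value, that is not enough. You would need a dictionary that uses the price as the key to remember all the matching items.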

I created two elements with the highest price: IDs 2 and 6.

data = '''1    Apple       5        4
2    Grape       8        6
3    Apple       5        2
4   Orange       6        7
5     Pear       4        4
6    Grape       8        6'''

data = [[item for item in row.split(' ') if item] for row in data.split('\n') ]

# ---------

results = dict()

for row in data:
    val = int(row[2])*int(row[3])
    idx = row[0]
    cat = row[1]
    if val not in results:
        results[val] = []
    results[val].append( (idx, cat) )

max_val = max(results.keys())

print(max_val, results[max_val])

Result:

48 [('2', 'Grape'), ('6', 'Grape')]
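The second part of the question can also be read as asking for the category with the highest summed amount, which is what the groupby attempt in the question suggests. Under that assumption (it is an interpretation, not something the question states outright), the same dictionary idea works with the category as the key. A rough sketch on the original five rows:

```python
# Original five rows from the question: [ID, Category, Unit_Price, Qty]
rows = [
    ['1', 'Apple', '5', '4'],
    ['2', 'Grape', '8', '6'],
    ['3', 'Apple', '5', '2'],
    ['4', 'Orange', '6', '7'],
    ['5', 'Pear', '4', '4'],
]

# Accumulate Unit_Price * Qty per category
totals = {}
for _id, category, unit_price, qty in rows:
    totals[category] = totals.get(category, 0) + int(unit_price) * int(qty)

# Pick the category whose accumulated total is largest
best_category = max(totals, key=totals.get)

print(best_category, totals[best_category])  # Grape 48
```

Here the two Apple rows add up to 30, which is still below Grape's 48.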

EDIT: the same with pandas takes only two lines

data = '''1    Apple       5        4
2    Grape       8        6
3    Apple       5        2
4   Orange       6        7
5     Pear       4        4
6    Grape       8        6'''

data = [[item for item in row.split(' ') if item] for row in data.split('\n') ]

import pandas as pd

df = pd.DataFrame(data, columns=['ID', 'Category', 'Unit_Price', 'Qty'])
df['Unit_Price'] = df['Unit_Price'].map(int)
df['Qty'] = df['Qty'].map(int)

# ---------

df['price'] = df['Unit_Price']*df['Qty']
print( df[ df['price'].max() == df['price'] ][['ID', 'Category']] )

Result:

  ID Category
1  2    Grape
5  6    Grape

Answer 2

Try this:

data = []
with open('myfile.txt') as f:
    for line in f:
        # split each line on whitespace into [ID, Category, Unit_Price, Qty]
        data.append(line.strip().split())

# skip the header row, sort by Unit_Price * Qty, and take the last (largest) row
result = sorted(data[1:], key=lambda x: float(x[2])*int(x[3]))[-1]
_id, category, price, qty = result

print(f'id: {_id}, category: {category}, unit price: {price}, qty: {qty}, total price: {float(price)*int(qty)}')
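If the file is really comma-separated rather than whitespace-separated, the standard csv module may be a better fit than split(). The sketch below assumes a header row with the column names from the question; the io.StringIO object is only a stand-in for open('myfile.csv', newline=''), and that file name is hypothetical.

```python
import csv
import io

# Stand-in for a real file; the comma-separated layout is an assumption
sample = io.StringIO(
    "ID,Category,Unit_Price,Qty\n"
    "1,Apple,5,4\n"
    "2,Grape,8,6\n"
    "3,Apple,5,2\n"
    "4,Orange,6,7\n"
    "5,Pear,4,4\n"
)

# DictReader uses the header row as keys, so columns are accessed by name
reader = csv.DictReader(sample)
best = max(reader, key=lambda r: float(r['Unit_Price']) * int(r['Qty']))

print(best['ID'], best['Category'])  # 2 Grape
```

Using max() instead of sorted(...)[-1] avoids sorting the whole list when only the largest row is needed.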
