Question

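I've just started using Python, and I need to display the highest price and the company that has it. I got the data from a CSV file with many columns describing some cars. I'm only interested in two of them: price and company.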

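I need to display the highest price and the company that has it. Any advice? Here is what I've tried; I don't know how to get the company as well, not just the highest price.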

import pandas as pd
df = pd.read_csv("Automobile_data.csv")
for x in df['price']:
    if x == df['price'].max():
        print(x)

Answer 1

Use Series.max for the price, then create an index from the company column with DataFrame.set_index and get the company name with Series.idxmax:

import pandas as pd

df = pd.DataFrame({
    'company': list('abcdef'),
    'price': [7, 8, 9, 4, 2, 3],
})

print (df)
  company  price
0       a      7
1       b      8
2       c      9
3       d      4
4       e      2
5       f      3

print(df['price'].max())
9
print(df.set_index('company')['price'].idxmax())
c
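
A variation on the above, offered as a sketch rather than as part of the original answer: Series.idxmax on the price column returns the row label of the first maximum, and DataFrame.loc with that label returns the whole row, so set_index is not needed. The sample data is the same made-up frame as above, and the names top_label and top_row are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'company': list('abcdef'),
    'price': [7, 8, 9, 4, 2, 3],
})

# Label of the first row holding the maximum price (default RangeIndex here).
top_label = df['price'].idxmax()

# Select that row; the result is a Series indexed by column name.
top_row = df.loc[top_label]

print(top_row['company'], top_row['price'])
```

Like idxmax itself, this only finds the first row when several rows share the maximum.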

Another idea is to use Series.agg:

s = df.set_index('company')['price'].agg(['max','idxmax'])
print (s['max'])
9
print (s['idxmax'])
c

If duplicated maximum values are possible and you need every company with the highest price, use boolean indexing with DataFrame.loc, which returns a Series:

df = pd.DataFrame({
    'company': list('abcdef'),
    'price': [7, 8, 9, 4, 2, 9],
})

print (df)
  company  price
0       a      7
1       b      8
2       c      9
3       d      4
4       e      2
5       f      9

print(df['price'].max())
9

#only first value
print(df.set_index('company')['price'].idxmax())
c

#all maximum values
s = df.loc[df['price'] == df['price'].max(), 'company']
print (s)
2    c
5    f
Name: company, dtype: object

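If you need a DataFrame rather than a Series, select both columns. With the first sample data (a single maximum) this returns one row:

out = df.loc[df['price'] == df['price'].max(), ['company','price']]
print (out)
  company  price
2       c      9

With the second sample data (duplicated maximum), the same expression returns both rows:
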
out = df.loc[df['price'] == df['price'].max(), ['company','price']]
print (out)
  company  price
2       c      9
5       f      9
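
As an alternative sketch (not part of the original answer), DataFrame.nlargest with keep='all' also returns every row tied for the highest price. The frame below repeats the duplicated sample data, and the variable name top is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'company': list('abcdef'),
    'price': [7, 8, 9, 4, 2, 9],
})

# keep='all' retains every row tied for the largest price,
# so the result can have more than one row.
top = df.nlargest(1, 'price', keep='all')

print(top[['company', 'price']])
```

With the default keep='first', only one of the tied rows would be returned.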

Answer 2

That is not how pandas is meant to be used. Pandas was made so that you can avoid explicit loops.

import pandas as pd
df = pd.read_csv("Automobile_data.csv")

max_price = df[df['price'] == df['price'].max()]
print(max_price)

That is all you need to do. If you only want the price and the company:

print(max_price[['company','price']])

Explanation: we create a boolean filter that is True where the price equals the maximum price, and then use it as a mask to select the rows we need.
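
The two steps in that explanation can be sketched separately. The small frame below is a made-up stand-in for Automobile_data.csv (the company names and prices are assumptions for illustration only), so the example runs without the file.

```python
import pandas as pd

# Hypothetical stand-in for Automobile_data.csv; only the two relevant columns.
df = pd.DataFrame({
    'company': ['audi', 'bmw', 'toyota'],
    'price': [13950, 41315, 5348],
})

# Step 1: the element-wise comparison gives a boolean Series (the mask).
mask = df['price'] == df['price'].max()
print(mask.tolist())

# Step 2: indexing with the mask keeps only the rows where it is True.
max_price = df[mask]
print(max_price[['company', 'price']])
```

Because the mask is True for every row that equals the maximum, tied rows are all kept.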

Answer 3

In addition to Jezrael's complete answer, I suggest using groupby as follows:

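import pandas as pd

df = pd.DataFrame({
    'company': list('abcdef'),
    'price': [7, 8, 9, 4, 2, 3],
})

# groupby sorts the group keys, so rows come out ordered by ascending price
sorted_df = df.groupby(['price']).max().reset_index()

# the last row therefore holds the highest price
desired_row = sorted_df.iloc[-1]

# access by column label; positional desired_row[0] is deprecated in recent pandas
price = desired_row['price']
company = desired_row['company']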

print('Maximum price is: ', price)
print('The company is: ', company)

The code above prints:

Maximum price is:  9
The company is:  c
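
One caveat, shown as a sketch with the duplicated sample data from the first answer: groupby(...).max() reduces each price group to a single company (the largest string in the group), so companies tied at the same price are lost. Collecting each group's companies into a list keeps them all. The names companies_by_price, top_price and top_companies are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'company': list('abcdef'),
    'price': [7, 8, 9, 4, 2, 9],
})

# Group by price and keep every company in each group as a list.
# groupby sorts the group keys, so the last entry is the highest price.
companies_by_price = df.groupby('price')['company'].agg(list)

top_price = companies_by_price.index[-1]
top_companies = companies_by_price.iloc[-1]

print('Maximum price is:', top_price)
print('Companies:', top_companies)
```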

Need to display the highest price and the company that has it
