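I am working with the norway_new_car_sales_by_model.csv dataset you can find here. I want to find the model whose sales fluctuated the most over the years, using the standard deviation of each model's total yearly sales. The expected output is: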
import pandas as pd
import numpy as np
data=pd.read_csv("norway_new_car_sales_by_model.csv",header=None,encoding="latin-1")
data.columns = ['Year','Month','Make','Model','Quantity','Pct']#give column name
data.drop(data.head(1).index, inplace=True) #drop first row
data[['Quantity']]=data[['Quantity']].astype(np.int64)
data.dropna(subset=['Quantity'], how='all', inplace = True)
maketotal_1 = data.pivot_table(values='Quantity',index=['Month','Model','Make'],aggfunc=np.std)
My problems are:
1) I cannot get rid of the NaN values, even though I have tried many snippets.
2) How do I get "Audi A4" and "Audi" out of the index?
Answer 0 (score: 2)
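I think you need the following.
First, remove the header=None parameter from read_csv, because the first row of the csv already contains the column names: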
data=pd.read_csv("norway_new_car_sales_by_model.csv",encoding="latin-1")
print (data.head())
   Year  Month        Make              Model  Quantity   Pct
0  2007      1  Volkswagen  Volkswagen Passat      1267  10.0
1  2007      1      Toyota        Toyota Rav4       819   6.5
2  2007      1      Toyota     Toyota Avensis       787   6.2
3  2007      1  Volkswagen    Volkswagen Golf       720   5.7
4  2007      1      Toyota     Toyota Corolla       691   5.4
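The effect of header=None can be checked on a small in-memory file. This is only a sketch: the two data rows are copied from the output above, and the comma-separated layout is an assumption about the real file.

```python
import io
import pandas as pd

# Small in-memory stand-in for the csv (layout assumed, rows taken from the output above).
csv_text = (
    "Year,Month,Make,Model,Quantity,Pct\n"
    "2007,1,Toyota,Toyota Rav4,819,6.5\n"
    "2007,1,Toyota,Toyota Avensis,787,6.2\n"
)

# header=None treats the header line as data, so the numeric columns are read as text.
with_header_none = pd.read_csv(io.StringIO(csv_text), header=None)

# The default uses the first line as column names and infers numeric dtypes.
with_default = pd.read_csv(io.StringIO(csv_text))

print(with_header_none.shape)  # one extra row: the header line
print(with_default.dtypes)     # Quantity is an integer column here
```

With the default header handling there is no need to rename the columns, drop the first row, or cast Quantity by hand.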
Then apply the pivot_table function with np.std:
maketotal_1=data.pivot_table(values='Quantity',index=['Month','Model','Make'],aggfunc=np.std)
print (maketotal_1.head())
                           Quantity
Month Model       Make
1     Audi A3     Audi    50.986109
      Audi A4     Audi    60.549704
      Audi A6     Audi          NaN
      Audi Q3     Audi          NaN
      BMW 2-serie BMW           NaN
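For the second question (getting "Audi A4" / "Audi" out of the index), one option is to select on the MultiIndex directly. A minimal sketch with made-up numbers; only the index layout (Month, Model, Make) mirrors the pivot table output, and the name demo is hypothetical:

```python
import pandas as pd

# Made-up values; only the index levels (Month, Model, Make) mirror the pivot table.
idx = pd.MultiIndex.from_tuples(
    [(1, "Audi A3", "Audi"), (1, "Audi A4", "Audi"), (2, "Audi A4", "Audi")],
    names=["Month", "Model", "Make"],
)
demo = pd.DataFrame({"Quantity": [50.0, 60.0, 70.0]}, index=idx)

# One exact row: pass the full index tuple to loc.
one_row = demo.loc[(1, "Audi A4", "Audi"), "Quantity"]

# All months for one model: take a cross-section on the 'Model' level.
audi_a4 = demo.xs("Audi A4", level="Model")

# The index levels themselves can be read as plain values.
models = demo.index.get_level_values("Model").unique().tolist()

print(one_row)
print(audi_a4)
print(models)
```

Calling reset_index, as done in the next step, is the other common route: it turns the levels into ordinary columns that can be filtered with boolean masks.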
Next, remove the NaN rows with dropna, and use reset_index to convert the MultiIndex levels into columns and create a unique default index:
df1 = maketotal_1.dropna().reset_index()
Finally, for each Make group, get the index of the maximum value with idxmax, then select those rows with loc:
df3 = df1.loc[df1.groupby('Make')['Quantity'].idxmax()]
print (df3)
     Month  Model              Make             Quantity
447     12  Audi A3            Audi           119.867427
415     11  BMW i3             BMW            460.936366
56       2  Ford Mondeo        Ford           169.889880
235      6  Honda CR-V         Honda          171.579671
457     12  Hyundai ix35       Hyundai         32.526912
348      9  Kia Sportage       Kia             55.154329
60       2  Mazda CX-5         Mazda          144.030957
14       1  Mercedes-Benz GLC  Mercedes-Benz  119.501046
160      4  Mitsubishi ASX     Mitsubishi     312.541197
391     10  Nissan Leaf        Nissan         225.322584
114      3  Opel Astra         Opel            85.182158
22       1  Peugeot 207        Peugeot         97.962578
168      4  Renault Zoe        Renault         53.740115
395     10  Skoda Octavia      Skoda          121.668767
122      3  Suzuki Vitara      Suzuki          85.559921
123      3  Tesla Model S      Tesla          510.400823
33       1  Toyota Corolla     Toyota         326.683333
179      4  Volkswagen Golf    Volkswagen     454.872681
485     12  Volvo V40          Volvo          183.919366
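Note that the question asks about the spread of yearly totals, whereas the pivot table used here measures the spread within each calendar month across years. A minimal sketch of the yearly variant, assuming the column names shown earlier; the helper name most_volatile_model and the sample figures are made up for illustration and are not part of the original answer:

```python
import pandas as pd

def most_volatile_model(data: pd.DataFrame) -> pd.Series:
    """Return the std of yearly total sales per model, largest first."""
    # Total units sold per model per year.
    yearly = data.groupby(["Make", "Model", "Year"])["Quantity"].sum()
    # Sample standard deviation (ddof=1) of those yearly totals per model;
    # models seen in only one year yield NaN and are dropped.
    spread = yearly.groupby(level=["Make", "Model"]).std().dropna()
    return spread.sort_values(ascending=False)

# Made-up sample rows, not taken from the real dataset.
sample = pd.DataFrame({
    "Year": [2007, 2007, 2008, 2008, 2007, 2008],
    "Month": [1, 2, 1, 2, 1, 1],
    "Make": ["Audi", "Audi", "Audi", "Audi", "BMW", "BMW"],
    "Model": ["Audi A4", "Audi A4", "Audi A4", "Audi A4", "BMW i3", "BMW i3"],
    "Quantity": [10, 20, 30, 60, 5, 7],
})

ranking = most_volatile_model(sample)
print(ranking.index[0])  # (Make, Model) pair with the largest spread
```

On the real file this would be called as most_volatile_model(data) after reading the csv with the default header handling.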
EDIT:
Citroen is missing from the result because np.std returns NaN for it:
print (maketotal_1[maketotal_1.index.get_level_values('Make') == 'Citroen '])
                                   Quantity
Month Model               Make
11    Citroen C4 Aircross Citroen       NaN
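These NaN values are consistent with a sample standard deviation (ddof=1, the pandas default for std), which is undefined when a group holds a single observation. A small sketch with made-up numbers:

```python
import pandas as pd

single = pd.Series([42])           # one observation, like a model seen in one year only
several = pd.Series([10, 20, 30])  # three observations

sample_std_single = single.std()            # ddof=1 by default: NaN for one value
population_std_single = single.std(ddof=0)  # population std of one value: 0.0
sample_std_several = several.std()          # 10.0

print(sample_std_single, population_std_single, sample_std_several)
```

So dropping the NaN rows removes every (Month, Model, Make) combination that appears only once; if those should be kept, filling them with 0 via fillna(0) is one possible alternative.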