I have a pandas DataFrame containing cars for sale and I'd like to get the most popular model for each brand, but I can't get it to work.
I have a pandas DataFrame with several columns (e.g. vehicle type, price, mileage, year, brand, model, etc.) and, for each car brand, I'd like to find which model occurs most often.
I've tried to use a groupby, like this:
popular_models = dataset.groupby('brand').model.value_counts().groupby(level=0).nlargest(1)
But it returns a pandas Series in which some of the data I want is stored in the index, and it also adds a repeated brand level that makes no sense to me.
I'd like to get a DataFrame containing 3 columns, like this:
(https://imgur.com/a/BkKBrv9)
However, I'm getting a pandas series like this:
(https://imgur.com/a/u8CSXY4)
Can someone please help me figure this out?
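Answer 0: (score: 1)
You have to group by the two columns you want to keep, then count the occurrences of the one you are interested in. Here is a sample input file: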
Brand Model
Acura RDX
Acura RDX
Acura RDX
Acura RDX
Acura RDX
Acura RDX
Acura RDX
Acura RDX
Acura RDX
Acura RDX
Beach Baby
Beach Baby
Beach Baby
Beach Baby
Beach Baby
Beach Baby
Beach Baby
Beach Baby
Beach Baby
Beach Baby
BMW 320i
BMW 320i
BMW 320i
BMW 320i
BMW 320i
BMW 320i
BMW 320i
BMW 550i
BMW 550i
BMW 550i
BMW 550i
BMW 550i
BMW 550i
BMW 550i
Cadillac Escalade
Cadillac Escalade
Cadillac Escalade
Chana Cargo
Chana Cargo
Chana Cargo
Chana Cargo
Chana Cargo
Chana Cargo
Chana Cargo
Chana Cargo
Chana Cargo
Chana Cargo
Chana Cargo
Chana Cargo
A simple pandas one-liner:
import pandas as pd
df = pd.read_table('fun.txt', header=0)
print(df.groupby(['Brand','Model'])['Model'].agg(['count']))
Output:
                    count
Brand    Model
Acura    RDX           10
BMW      320i           7
         550i           7
Beach    Baby          10
Cadillac Escalade       3
Chana    Cargo         12
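If you want to sort the values by frequency (largest to smallest) and keep only the most frequent model for each brand, change the one-liner to: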
groupby_df = (df.groupby(['Brand','Model'])['Model']
                .agg(['count'])
                .sort_values(by='count', ascending=False)
                .reset_index()
                .drop_duplicates('Brand', keep='first'))
This gives:
      Brand     Model  count
0     Chana     Cargo     12
1     Acura       RDX     10
2     Beach      Baby     10
3       BMW      320i      7
5  Cadillac  Escalade      3
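For anyone who wants to try this without creating fun.txt, here is a minimal sketch of the same idea. The in-memory rows are a stand-in I built to match the counts in the sample file, and the stable sort (kind='mergesort') is my own addition so that tied rows keep their grouped order; neither is part of the original answer.

```python
import pandas as pd

# Stand-in for fun.txt (assumption): same brand/model counts as the sample input.
rows = (
    [("Acura", "RDX")] * 10
    + [("Beach", "Baby")] * 10
    + [("BMW", "320i")] * 7
    + [("BMW", "550i")] * 7
    + [("Cadillac", "Escalade")] * 3
    + [("Chana", "Cargo")] * 12
)
df = pd.DataFrame(rows, columns=["Brand", "Model"])

groupby_df = (
    df.groupby(["Brand", "Model"])["Model"]
    .agg(["count"])                      # one row per (Brand, Model) pair
    # A stable sort is an addition here, so ties are ordered predictably.
    .sort_values(by="count", ascending=False, kind="mergesort")
    .reset_index()                       # turn the index levels back into columns
    .drop_duplicates("Brand", keep="first")
)
print(groupby_df)
```

BMW has two models tied at 7, so which one survives depends only on row order after sorting; drop_duplicates simply keeps whichever comes first.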
Answer 1: (score: 1)
One solution is to sort by the result of a groupby operation and then drop duplicates:
import pandas as pd

df = pd.DataFrame({'Brand': ['B1'] * 5 + ['B2'] * 5,
                   'Model': ['M1', 'M2', 'M1', 'M2', 'M3',
                             'N1', 'N1', 'N2', 'N3', 'N1']})

# Attach to every row the number of times its (Brand, Model) pair occurs.
df['Count'] = df.groupby(['Brand', 'Model'])['Model'].transform('count')

# Highest counts first, then keep one row per brand.
res = df.sort_values('Count', ascending=False)\
        .drop_duplicates('Brand')

print(res)

#   Brand Model  Count
# 5    B2    N1      3
# 0    B1    M1      2
Note that this drops ties: if several models share the highest count within a brand, only one of them is kept.
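If you would rather keep ties, one possible variation (not part of the answer above, just a sketch) is to compare each pair's count with the maximum count of its brand instead of dropping duplicates. The names counts, best and ties_kept are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Brand': ['B1'] * 5 + ['B2'] * 5,
                   'Model': ['M1', 'M2', 'M1', 'M2', 'M3',
                             'N1', 'N1', 'N2', 'N3', 'N1']})

# One row per (Brand, Model) pair with its number of occurrences.
counts = df.groupby(['Brand', 'Model']).size().reset_index(name='Count')

# Highest count within each brand, broadcast back to every row of that brand.
best = counts.groupby('Brand')['Count'].transform('max')

# Keep every pair whose count equals the brand's maximum, so ties survive.
ties_kept = counts[counts['Count'] == best].reset_index(drop=True)
print(ties_kept)
```

With this data, B1 keeps both M1 and M2 (two occurrences each), while B2 keeps only N1.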
Answer 2: (score: 0)
Here is one way to do it.
Set up the DataFrameGroupBy object:
df.groupby(["Brand", "Model"])
Use the GroupBy size function to compute the size of each subgroup (returned as a Series):
df.groupby(["Brand", "Model"]).size()
Convert back to a DataFrame while naming the column that holds the values computed by size:
df.groupby(["Brand", "Model"]).size().reset_index(name="Count")
Sort the DataFrame in descending order of Count:
df.groupby(["Brand", "Model"]).size().reset_index(name="Count").sort_values(by="Count", ascending=False)
Finally, drop the duplicate Brand values, keeping the first entry for each brand in the DataFrame.
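Putting these steps together might look like the sketch below. The final drop_duplicates call is my reading of the last step, since the code for it is not shown here; the helper name most_popular_models, the stable sort, and the sample rows are made up for illustration.

```python
import pandas as pd


def most_popular_models(frame, brand_col="Brand", model_col="Model"):
    """Return one row per brand: its most frequent model and that model's count.

    Hypothetical helper; ties are resolved by keeping the first row after sorting.
    """
    return (
        frame.groupby([brand_col, model_col])
        .size()                                   # Series of subgroup sizes
        .reset_index(name="Count")                # back to a DataFrame
        .sort_values(by="Count", ascending=False, kind="mergesort")
        .drop_duplicates(subset=brand_col, keep="first")
        .reset_index(drop=True)
    )


# Made-up rows for illustration only.
df = pd.DataFrame({
    "Brand": ["BMW", "BMW", "BMW", "Acura", "Acura", "Cadillac"],
    "Model": ["320i", "320i", "550i", "RDX", "RDX", "Escalade"],
})
result = most_popular_models(df)
print(result)
```

For the dataset in the question, the same helper could be called as most_popular_models(dataset, "brand", "model"); other columns such as price or mileage are ignored because only the two grouping columns are used.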