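I am struggling with two problems:
I have a sample dataset about cars. 20 respondents were asked about their favorite cars, and each could name up to five (columns "Answer 1" - "Answer 5"). How can I get the number of mentions for each car?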
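For the cars mentioned in columns `Answer 1` - `Answer 2`, each respondent noted the three biggest advantages of each car (e.g. columns `Adv car 1_1`, `Adv car 1_2`, `Adv car 1_3` relate to the first car mentioned, and `Adv car 2_1`, `Adv car 2_2`, `Adv car 2_3` relate to the second).
How can I find out how many times each advantage was named for each car?
I need information like this:
`Ferrari` - `Engine` 3 times, `Color` 5 times, `Price` 3 times, etc.
`Audi` - `Engine` 4 times, `Color` 3 times, `Price` 2 times, etc.
The second problem is related to SPSS and R variables, but I could not translate it to Python. It probably involves melting the dataframe, but my attempts were not very effective.
I would really appreciate your help. The CSV file is available to download.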
Answer 0 (score: 0)
    import pandas as pd

    df = pd.read_csv('cars.csv', sep=';')
    # Long format: one row per (respondent, named car)
    df_car = df.melt(id_vars='No', value_vars=['Answer 1', 'Answer 2', 'Answer 3', 'Answer 4', 'Answer 5'], value_name='Cars').drop('variable', axis=1)
    # Long format: one row per (respondent, named advantage)
    df_advantages = df.melt(id_vars='No', value_vars=['Adv car 1_1', 'Adv car 1_2', 'Adv car 1_3', 'Adv car 2_1',
                                                      'Adv car 2_2', 'Adv car 2_3', 'Adv car 3_1', 'Adv car 3_2',
                                                      'Adv car 3_3', 'Adv car 4_1', 'Adv car 4_2', 'Adv car 4_3',
                                                      'Adv Car 5_1', 'Adv car 5_2', 'Adv car 5_3'], value_name='Advantages').drop('variable', axis=1)
    # Merging on 'No' alone pairs each car with every advantage the same respondent gave; the pairs are then counted
    df_car.merge(df_advantages, on='No').groupby(['Cars', 'Advantages']).count()
Output:
                             No
    Cars       Advantages      
    Audi       Brand         20
               Color         29
               Engine         6
               Longevity     22
               Manufacturer  18
               Price         10
    Ferrari    Brand         14
               Color         19
               Engine         4
               Longevity     17
               Manufacturer  10
               Price          9
    Renault    Brand         17
               Color         21
               Engine         4
               Longevity     15
               Manufacturer  13
               Price         10
    Toyota     Brand         18
               Color         27
               Engine         8
               Longevity     17
               Manufacturer  12
               Price         13
    Volkswagen Brand         17
               Color         26
               Engine         6
               Longevity     20
               Manufacturer  13
               Price         12