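I have a pandas df that I need to sort by a column of text strings. I've tried three approaches. The first two are similar. The last one sorts, but it also produces a mysterious column.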
Here is a small test dataset:
raw_corpus #test data
    unique_ID  count  trigger_channel_cat
0       11530      1  Photo and Video
1       17176      1  Environment Control and Monitoring
2        6984      1  Security and Monitoring Systems
3       15696      1  Photo and Video
4       16103      3  Finance and Payments
5       18534      5  News and Information
6       11677    331  Social Networks
7         702      1  Contacts
8        7251      1  Business Tools
9       10609      1  Photo and Video
10       1703      2  Blogging
11      20567      1  Social Networks
12       8357      1  Social Networks
13       4313      1  Fitness and Wearables
14       8552      1  Contacts
15       7634      1  News and Information
16      13698      1  Social Networks
17      13940      4  Business Tools
18      19784      3  Location
19       3561      1  Task Management and To-Dos
Using value_counts doesn't work; it returns the categories ranked by how often they occur, not alphabetically:
raw_corpus_sorted=raw_corpus['trigger_channel_cat'].value_counts().index.tolist()
raw_corpus_sorted
['Social Networks',
'Photo and Video',
'Business Tools',
'Contacts',
'News and Information',
'Fitness and Wearables',
'Location',
'Security and Monitoring Systems',
'Task Management and To-Dos',
'Environment Control and Monitoring',
'Blogging',
'Finance and Payments']
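A minimal sketch of why that list is not alphabetical, using a subset of the test rows above (the subset is my choice for brevity): value_counts ranks labels by frequency, so its index is a frequency ranking. If all that's needed is the distinct categories in alphabetical order, sorting the unique values is enough.

```python
import pandas as pd

# Subset of the question's rows 0, 3, 6, 8, 9, 10, 11, 12, 16, 17
raw_corpus = pd.DataFrame(
    {"trigger_channel_cat": [
        "Photo and Video", "Photo and Video", "Social Networks",
        "Business Tools", "Photo and Video", "Blogging",
        "Social Networks", "Social Networks", "Social Networks",
        "Business Tools",
    ]},
    index=[0, 3, 6, 8, 9, 10, 11, 12, 16, 17],
)

# value_counts() orders labels by how often they occur (most frequent first)
by_frequency = raw_corpus["trigger_channel_cat"].value_counts().index.tolist()
print(by_frequency)  # ['Social Networks', 'Photo and Video', 'Business Tools', 'Blogging']

# Distinct labels in alphabetical order
alphabetical = sorted(raw_corpus["trigger_channel_cat"].unique())
print(alphabetical)  # ['Blogging', 'Business Tools', 'Photo and Video', 'Social Networks']
```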
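Another attempt, with a different call to value_counts, gives the correct number of instances for each category, but it doesn't sort the categories alphabetically: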
raw_corpus_sorted=raw_corpus['trigger_channel_cat'].value_counts(sort=True)
raw_corpus_sorted
Social Networks 4
Photo and Video 3
Business Tools 2
Contacts 2
News and Information 2
Fitness and Wearables 1
Location 1
Security and Monitoring Systems 1
Task Management and To-Dos 1
Environment Control and Monitoring 1
Blogging 1
Finance and Payments 1
Name: trigger_channel_cat, dtype: int64
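The `sort` parameter of value_counts refers to the counts, not the labels: `sort=True` is already the default and orders by frequency, and `ascending=True` only flips that frequency order. A short check on the same kind of subset (chosen for illustration):

```python
import pandas as pd

labels = pd.Series(
    ["Photo and Video", "Photo and Video", "Social Networks", "Business Tools",
     "Photo and Video", "Blogging", "Social Networks", "Social Networks",
     "Social Networks", "Business Tools"],
    name="trigger_channel_cat",
)

# sort=True (the default) orders the result by count, highest first
desc = labels.value_counts(sort=True)
print(desc.tolist())  # [4, 3, 2, 1]

# ascending=True reverses the count order; the labels are still not alphabetized
asc = labels.value_counts(ascending=True)
print(asc.tolist())  # [1, 2, 3, 4]
```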
Sorting with sort_values() works! But what is that first column?
#this one works - but what is that first column?
raw_corpus_sorted=raw_corpus['trigger_channel_cat'].sort_values()
raw_corpus_sorted
10 Blogging
17 Business Tools
8 Business Tools
14 Contacts
7 Contacts
1 Environment Control and Monitoring
4 Finance and Payments
13 Fitness and Wearables
18 Location
15 News and Information
5 News and Information
0 Photo and Video
9 Photo and Video
3 Photo and Video
2 Security and Monitoring Systems
11 Social Networks
6 Social Networks
16 Social Networks
12 Social Networks
19 Task Management and To-Dos
Name: trigger_channel_cat, dtype: object
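The first column in that output is the index. A sketch with four of the test rows shows that sort_values keeps each value's original row label, that those labels can be used to look the rows back up, and that `reset_index(drop=True)` renumbers them if they aren't wanted:

```python
import pandas as pd

# Four of the question's rows, keeping their original index labels
raw_corpus = pd.DataFrame(
    {
        "unique_ID": [11530, 11677, 7251, 1703],
        "trigger_channel_cat": [
            "Photo and Video", "Social Networks", "Business Tools", "Blogging",
        ],
    },
    index=[0, 6, 8, 10],
)

# Sorting one column returns a Series; each value keeps the index label of its row
sorted_cat = raw_corpus["trigger_channel_cat"].sort_values()
print(sorted_cat.index.tolist())  # [10, 8, 0, 6]

# The index points back to the original rows, e.g. to recover their IDs
ids_in_order = raw_corpus.loc[sorted_cat.index, "unique_ID"].tolist()
print(ids_in_order)  # [1703, 7251, 11530, 11677]

# If the old labels are not wanted, renumber from 0
renumbered = sorted_cat.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```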
Answer (score: 1)
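Call sort_values on the whole DataFrame rather than on the single column. The "mysterious" first column is the index: sort_values keeps each row's original index label, and sorting the DataFrame keeps the other columns aligned with it:
raw_corpus_sorted=raw_corpus.sort_values('trigger_channel_cat')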
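A sketch of sorting the full frame, using six of the test rows. One caveat, going by the pandas docs: the default single-key sort (`kind='quicksort'`) is not guaranteed to be stable, so rows that share a category can come out in any relative order. Adding a second key pins that down; using `unique_ID` as the tie-breaker here is only an illustrative choice.

```python
import pandas as pd

raw_corpus = pd.DataFrame(
    {
        "unique_ID": [11530, 11677, 7251, 1703, 20567, 13940],
        "count": [1, 331, 1, 2, 1, 4],
        "trigger_channel_cat": [
            "Photo and Video", "Social Networks", "Business Tools",
            "Blogging", "Social Networks", "Business Tools",
        ],
    },
    index=[0, 6, 8, 10, 11, 17],
)

# Sorting the DataFrame moves whole rows, so unique_ID and count stay with their category
by_cat = raw_corpus.sort_values("trigger_channel_cat")

# unique_ID as a second key (illustrative) fixes the order within each category
by_cat_then_id = raw_corpus.sort_values(["trigger_channel_cat", "unique_ID"])
print(by_cat_then_id)
```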
Update, now that the data has been added:
df.sort_values('trigger_channel_cat')
Out[1086]:
unique_ID count trigger_channel_cat
10 1703 2 Blogging
17 13940 4 Business Tools
8 7251 1 Business Tools
14 8552 1 Contacts
1 17176 1 Environment Control and
4 16103 3 Finance and Payments
13 4313 1 Fitness and Wearables
18 19784 3 Location
15 7634 1 News and Information
5 18534 5 News and Information
0 11530 1 Photo and Video
9 10609 1 Photo and Video
3 15696 1 Photo and Video
2 6984 1 Security and Monitoring
12 8357 1 Social Networks
6 11677 331 Social Networks
16 13698 1 Social Networks
11 20567 1 Social Networks
19 3561 1 Task Management and To-
7 702 1 acts
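In that output `acts` lands after `Task Management and To-` because the default string ordering compares code points, and every uppercase letter sorts before lowercase `a`. If case should be ignored, the `key` argument of sort_values (available in pandas 1.1 and later) can sort on a lowercased copy. A small sketch with labels taken from the output above:

```python
import pandas as pd

df = pd.DataFrame(
    {"trigger_channel_cat": ["Social Networks", "acts", "Blogging", "Task Management and To-"]}
)

# Default comparison is by code point: uppercase letters sort before "a"
default_order = df.sort_values("trigger_channel_cat")["trigger_channel_cat"].tolist()
print(default_order)  # ['Blogging', 'Social Networks', 'Task Management and To-', 'acts']

# key= (pandas >= 1.1) sorts on a lowercased copy without changing the stored values
case_insensitive = df.sort_values(
    "trigger_channel_cat", key=lambda col: col.str.lower()
)["trigger_channel_cat"].tolist()
print(case_insensitive)  # ['acts', 'Blogging', 'Social Networks', 'Task Management and To-']
```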
For value_counts, you can chain sort_index:
df['trigger_channel_cat'].value_counts(sort=True).sort_index()
Out[1088]:
Blogging 1
Business Tools 2
Contacts 1
Environment Control and 1
Finance and Payments 1
Fitness and Wearables 1
Location 1
News and Information 2
Photo and Video 3
Security and Monitoring 1
Social Networks 4
Task Management and To- 1
acts 1
Name: trigger_channel_cat, dtype: int64
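If the goal is "most frequent first, but ties in alphabetical order", a sketch is to turn the counts into a small DataFrame and sort on both columns. The column names `category` and `n` are hypothetical; renaming them explicitly avoids depending on how a given pandas version names the value_counts result.

```python
import pandas as pd

# Labels from the question's rows 0, 3, 5, 7, 8, 9, 10, 14, 15, 17
labels = pd.Series(
    ["Photo and Video", "Photo and Video", "News and Information", "Contacts",
     "Business Tools", "Photo and Video", "Blogging", "Contacts",
     "News and Information", "Business Tools"],
    name="trigger_channel_cat",
)

# Alphabetical by category (the sort_index approach)
alpha = labels.value_counts().sort_index()

# Most frequent first, ties broken alphabetically
ranked = (
    labels.value_counts()
    .rename("n")                 # hypothetical column name for the counts
    .rename_axis("category")     # hypothetical column name for the labels
    .reset_index()
    .sort_values(["n", "category"], ascending=[False, True])
    .reset_index(drop=True)
)
print(ranked)
```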