I have an input DataFrame:
import pandas

df1 = pandas.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Mallory", "Bob", "Bob", "Mallory", "Alice"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland", "Portland", "Seattle", "Seattle"]})
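I want to group by Name, but not by unique values: only consecutive rows with the same name should form a group. The output should therefore be: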
["Alice","Bob","Mallory","Bob","Mallory", "Alice"]
I can't find an efficient way to do this. Is there a way to do it without iterating over all the rows?
Answer (score: 1)
You can do the following:
df1.groupby((df1['Name'] != df1['Name'].shift()).cumsum()).first()
Yields:
         Name      City
Name
1       Alice   Seattle
2         Bob   Seattle
3     Mallory  Portland
4         Bob  Portland
5     Mallory   Seattle
6       Alice   Seattle
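The grouping key is what makes this work. A minimal sketch of the intermediate step (reusing the question's df1; the names `changed` and `run_id` are just illustrative) shows that comparing each name with the previous one and taking a cumulative sum assigns one ID per run of consecutive identical names:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Mallory", "Bob", "Bob", "Mallory", "Alice"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland", "Portland", "Seattle", "Seattle"],
})

# True wherever the name differs from the previous row's name.
# The first row is compared against NaN (introduced by shift), so it is always True.
changed = df1["Name"] != df1["Name"].shift()

# Summing the booleans cumulatively turns each "new run starts here" flag into a run ID:
# consecutive rows with the same name end up with the same ID.
run_id = changed.cumsum()

print(run_id.tolist())
```

Grouping by run_id and taking .first() then returns one row per run, which is why the result index above goes from 1 to 6.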
If you only need the 'Name' column:
df1.groupby((df1['Name'] != df1['Name'].shift()).cumsum())['Name'].first().values
Yields:
['Alice' 'Bob' 'Mallory' 'Bob' 'Mallory' 'Alice']
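If only the names are needed, a groupby is arguably not required at all. As an alternative (a different technique from the answer above, offered as a sketch), the same "name changed" mask can be used directly as a boolean filter that keeps the first row of each run, with no explicit loop over rows:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Mallory", "Bob", "Bob", "Mallory", "Alice"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland", "Portland", "Seattle", "Seattle"],
})

# Keep only rows whose name differs from the row directly above,
# i.e. the first row of every run of consecutive identical names.
first_of_run = df1["Name"] != df1["Name"].shift()
names = df1.loc[first_of_run, "Name"].tolist()

print(names)
```

Using .tolist() returns a plain Python list matching the format in the question, whereas .values returns a NumPy array.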