Question

The DataFrame I'm working with has three columns, each based on a different "best places to live" list, named list1, list2 and list3.

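Desired output:

I want to return another column, Series or groupby that shows each city's overall rank once its position in all of the lists is taken into account. Edinburgh would be at the top, and the other cities would be ranked by how close they are to the top of each column. To clarify, Edinburgh is ranked first in both list2 and list3.

It would look something like this: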

1 Edinburgh
2 Hart
3 Orkney, London, Solihull
4 Rutland, Bristol, Hertfordshire
5 Wychavon, Newcastle, Northumberland

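Essentially, I want to see each city's overall rank after considering all of the lists, and to learn how to do this with pandas.

What have I tried?

I was hoping for a simple way to rank with something like places2live.rank(), but I can't see how to use it on string values.

Data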

    list1      list2        list3
0   Hart       Edinburgh    Edinburgh
1   Orkney     London       Solihull
2   Rutland    Bristol      Hertfordshire
3   Wychavon   Newcastle    Northumberland
4   Winchester Manchester   South Lanarkshire
5   Wokingham  Glasgow      Berkshire
6   Waverley   Leeds        Darlington
7   Craven     Cardiff      North Lanarkshire
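
For anyone who wants to reproduce this, the table above can be rebuilt roughly as follows. This is only a sketch: the name places2live comes from the attempt mentioned above, and the df alias is added on the assumption that the answer's df refers to the same frame.

```python
import pandas as pd

# Rebuild the three "best places to live" lists exactly as printed above
places2live = pd.DataFrame({
    "list1": ["Hart", "Orkney", "Rutland", "Wychavon",
              "Winchester", "Wokingham", "Waverley", "Craven"],
    "list2": ["Edinburgh", "London", "Bristol", "Newcastle",
              "Manchester", "Glasgow", "Leeds", "Cardiff"],
    "list3": ["Edinburgh", "Solihull", "Hertfordshire", "Northumberland",
              "South Lanarkshire", "Berkshire", "Darlington", "North Lanarkshire"],
})

# Assumed alias: the answer refers to the frame as df
df = places2live
print(df)
```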

Answer 1

Here is one way to do it:

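import numpy as np
import pandas as pd

# Every distinct city across all three lists, each starting with a score of 1
cities = pd.Index(np.unique(df.values))
ranks = pd.Series([1] * len(cities), index=cities)

# Average the running score with the city's row position in each list;
# cities missing from a list keep their current score
for column in df:
    ranks = ((ranks + df.reset_index().set_index(column)['index'])/2).fillna(ranks)

# Group cities that share a score and number the groups from 1
city_ranks = ranks.reset_index().groupby(0)['index'].apply(list).reset_index(drop=True)
city_ranks.index += 1
print(city_ranks)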

[Output]

1                                    [Edinburgh]
2                                         [Hart]
3                     [London, Orkney, Solihull]
4              [Bristol, Hertfordshire, Rutland]
5          [Newcastle, Northumberland, Wychavon]
6    [Manchester, South Lanarkshire, Winchester]
7                [Berkshire, Glasgow, Wokingham]
8                  [Darlington, Leeds, Waverley]
9           [Cardiff, Craven, North Lanarkshire]
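
To make the scoring in the loop above easier to follow, here is a rough sketch of the same idea written out step by step. It is not the answer's code: running_score is a hypothetical helper introduced only for illustration, and the data is rebuilt so the snippet runs on its own. Each city starts at 1 and its score is averaged with its 0-based row position in every list it appears in, so Edinburgh goes 1 → 0.5 → 0.25 while Hart stops at 0.5. Because the scores are numeric, rank() can then be applied to them even though the original columns hold strings.

```python
import pandas as pd

# Same data as in the question, rebuilt here so the sketch runs on its own
df = pd.DataFrame({
    "list1": ["Hart", "Orkney", "Rutland", "Wychavon",
              "Winchester", "Wokingham", "Waverley", "Craven"],
    "list2": ["Edinburgh", "London", "Bristol", "Newcastle",
              "Manchester", "Glasgow", "Leeds", "Cardiff"],
    "list3": ["Edinburgh", "Solihull", "Hertfordshire", "Northumberland",
              "South Lanarkshire", "Berkshire", "Darlington", "North Lanarkshire"],
})


def running_score(positions):
    """Hypothetical helper: start at 1, then average with each position in turn."""
    score = 1.0
    for position in positions:
        score = (score + position) / 2
    return score


# Collect each city's 0-based row position in every list it appears in
positions = {}
for column in df.columns:
    for position, city in enumerate(df[column]):
        positions.setdefault(city, []).append(position)

scores = pd.Series({city: running_score(p) for city, p in positions.items()})
scores.index.name = "city"

# rank() works here because the scores are numeric; "dense" gives tied
# cities the same rank with no gaps between consecutive ranks
overall = scores.rank(method="dense").astype(int).rename("rank")
grouped = overall.reset_index().groupby("rank")["city"].apply(sorted)
print(grouped)
```

One thing to keep in mind: this running average depends on column order, since later lists weigh more heavily than earlier ones. In this data only Edinburgh appears in more than one list, and at the same position both times, so the order makes no difference here.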

Ranking multiple string columns with Pandas
