Sorting the columns of a DataFrame alphabetically (pandas)

Date: 2016-09-10 19:06:52

Tags: pandas dataframe alphabetical-sort

I can't get the names in the first and second columns of my DataFrame sorted alphabetically.

The DataFrame looks like this:

          Boys       Females
Rank                        
1         Michael   Jennifer
2     Christopher    Jessica
3         Matthew     Amanda
4           Jason      Sarah
5           David    Melissa
6          Joshua        Amy
7           James     Nicole
8            John  Stephanie
9          Robert  Elizabeth
10         Daniel    Heather
11         Joseph   Michelle
12         Justin    Rebecca
13           Ryan   Kimberly
14          Brian    Tiffany

I would like it to look like this (boys' and females' names each sorted alphabetically):

 Rank     Boys                 Rank    Females
 14       Brian                  3        Amanda
  2       Christopher            6        Amy
 10       Daniel                 9        Elizabeth
  5       David                 10        Heather
  7       James                  1        Jennifer

I've played around with sort and sort_values, but the columns don't change. Here is my original code:

import pandas as pd

# read_html returns a list of DataFrames, one per table found in the page
df = pd.read_html("file:///C:/Python27/babyname999.html")

df2 = df[0]  # take the first DataFrame from that list

# rename the columns of the DataFrame
df2.rename(columns={'0': 'Rank', '1': 'Boys', '2': 'Females'}, inplace=True)
del df2['Unnamed: 0']

df2.set_index('Rank', inplace=True)  # set the index of the DataFrame to 'Rank'

I'm not getting anywhere. Any suggestions?

Thanks!

3 answers:

Answer 0 (score: 3)

Here is a working example of the sort.

import pandas as pd
from io import StringIO

data_file = StringIO(u"""Rank       Boys       Females
1         Michael   Jennifer
2     Christopher    Jessica
3         Matthew     Amanda
4           Jason      Sarah
5           David    Melissa
6          Joshua        Amy
7           James     Nicole
8            John  Stephanie
9          Robert  Elizabeth
10         Daniel    Heather
11         Joseph   Michelle
12         Justin    Rebecca
13           Ryan   Kimberly
14          Brian    Tiffany""")

# sep=r"\s+" splits on runs of whitespace (delim_whitespace is deprecated in recent pandas)
df = pd.read_table(data_file, sep=r"\s+")

boys = df[['Rank','Boys']].sort_values(['Boys']).rename(columns={'Rank': 'Rank_boys'})
females = df[['Rank','Females']].sort_values(['Females']).rename(columns={'Rank': 'Rank_females'})
result = pd.concat([boys.reset_index(drop=True), females.reset_index(drop=True)], axis=1)

The result will be:

    Rank_boys         Boys  Rank_females    Females
0          14        Brian             3     Amanda
1           2  Christopher             6        Amy
2          10       Daniel             9  Elizabeth
3           5        David            10    Heather
4           7        James             1   Jennifer
5           4        Jason             2    Jessica
6           8         John            13   Kimberly
7          11       Joseph             5    Melissa
8           6       Joshua            11   Michelle
9          12       Justin             7     Nicole
10          3      Matthew            12    Rebecca
11          1      Michael             4      Sarah
12          9       Robert             8  Stephanie
13         13         Ryan            14    Tiffany
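
In the question, Rank is the index rather than a regular column, so the code above needs one extra step. A minimal sketch of that adaptation, assuming a frame shaped like the asker's df2 (the four-row sample and the loop are illustrative, not taken from the original data file):

```python
import pandas as pd

# Hypothetical sample with Rank as the index, as in the question's df2.
df2 = pd.DataFrame(
    {"Boys": ["Michael", "Christopher", "Matthew", "Jason"],
     "Females": ["Jennifer", "Jessica", "Amanda", "Sarah"]},
    index=pd.Index([1, 2, 3, 4], name="Rank"),
)

flat = df2.reset_index()  # move Rank out of the index into a regular column

parts = []
for col in ["Boys", "Females"]:
    part = (
        flat[["Rank", col]]
        .sort_values(col)
        .rename(columns={"Rank": "Rank_" + col.lower()})
        .reset_index(drop=True)  # drop old row labels so concat pairs rows by position
    )
    parts.append(part)

result = pd.concat(parts, axis=1)
print(result)
```

The reset_index(drop=True) call is what keeps the two sorted halves from being realigned on their old row labels when they are concatenated.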

Answer 1 (score: 2)

IIUC (the expected/desired DF is not entirely clear), you can do it this way:

df = (pd.read_html("file:///C:/Python27/babyname999.html")[0]
        .rename(columns = {'0': 'Rank', '1': 'Boys', '2': 'Females'})
        .drop(columns='Unnamed: 0')
        .set_index('Rank')
)

Then:

In [86]: df['Rank_Boys'], df['Rank_Females'] = df.sort_values('Boys').index, df.sort_values('Females').index

In [87]: df
Out[87]:
           Boys    Females  Rank_Boys  Rank_Females
1       Michael   Jennifer         14             3
2   Christopher    Jessica          2             6
3       Matthew     Amanda         10             9
4         Jason      Sarah          5            10
5         David    Melissa          7             1
6        Joshua        Amy          4             2
7         James     Nicole          8            13
8          John  Stephanie         11             5
9        Robert  Elizabeth          6            11
10       Daniel    Heather         12             7
11       Joseph   Michelle          3            12
12       Justin    Rebecca          1             4
13         Ryan   Kimberly          9             8
14        Brian    Tiffany         13            14
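
Note that the two new columns are filled by position: row i holds the rank of the i-th name in alphabetical order, not the rank of the name that happens to sit on that row. A small sketch of how to read the names back from those ranks, using a hypothetical three-row sample rather than the full table:

```python
import pandas as pd

# Hypothetical sample with Rank as the index.
df = pd.DataFrame(
    {"Boys": ["Michael", "Christopher", "Brian"],
     "Females": ["Jennifer", "Jessica", "Amanda"]},
    index=pd.Index([1, 2, 3], name="Rank"),
)

# An Index carries no row labels of its own, so it is assigned by position.
df["Rank_Boys"] = df.sort_values("Boys").index
df["Rank_Females"] = df.sort_values("Females").index

# Look each rank up in the original index to recover the alphabetical names.
boys_sorted = df.loc[df["Rank_Boys"].tolist(), "Boys"].tolist()
females_sorted = df.loc[df["Rank_Females"].tolist(), "Females"].tolist()

print(boys_sorted)
print(females_sorted)
```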

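Answer 2 (score: 2)

The problem with sorting the columns of a DataFrame independently is that pandas takes the index of each independently sorted column and realigns on it, undoing your sort. You have to sort and then return the values of the resulting sorted Series instead... Enough talk, an example will explain more.

Suppose df is your example DataFrame. Then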

df.apply(lambda x: x.sort_values().values)
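
To see why the .values part matters, here is a minimal sketch on a hypothetical three-row sample (the data and the variable names realigned and sorted_cols are illustrative only). It compares returning the sorted Series with returning its bare values:

```python
import pandas as pd

# Hypothetical three-row sample, not the asker's full table.
df = pd.DataFrame(
    {"Boys": ["Michael", "Christopher", "Brian"],
     "Females": ["Jennifer", "Jessica", "Amanda"]},
    index=pd.Index([1, 2, 3], name="Rank"),
)

# Returning the sorted Series: the pieces are put back together by index
# label, so every name ends up next to its original Rank again.
realigned = df.apply(lambda col: col.sort_values())

# Returning the bare values: there is no index to align on, so the
# sorted order of each column is kept.
sorted_cols = df.apply(lambda col: col.sort_values().values)

print(realigned)
print(sorted_cols)
```

In sorted_cols each column is in alphabetical order, but the index still reads 1, 2, 3, so it no longer tells you the rank of the name on that row.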


Something closer to what you asked for:

lst = [df[c].sort_values().reset_index(name='Name') for c in df]
keys = df.columns
pd.concat(lst, axis=1, keys=keys)
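
As a rough illustration of the shape this produces, the sketch below runs the same three lines on a hypothetical three-row sample. It assumes the index is named Rank, as it is after set_index('Rank') in the question. The variable names pieces and result are illustrative.

```python
import pandas as pd

# Hypothetical sample with a named index, as after set_index('Rank').
df = pd.DataFrame(
    {"Boys": ["Michael", "Christopher", "Brian"],
     "Females": ["Jennifer", "Jessica", "Amanda"]},
    index=pd.Index([1, 2, 3], name="Rank"),
)

# One two-column frame (Rank, Name) per original column, each sorted by name.
pieces = [df[c].sort_values().reset_index(name="Name") for c in df]

# keys= adds an outer column level: (Boys, Rank), (Boys, Name), (Females, Rank), ...
result = pd.concat(pieces, axis=1, keys=df.columns)

print(result)
```

Each sorted name keeps its own Rank beside it, and the outer column level records which original column the pair came from.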
