Question

I have the following Pandas DataFrame:

Year        Bananas     Apples

2015 - 1    151235.0    NaN
2015 - 10   517326.0    NaN
2015 - 11   497511.0    NaN
2015 - 12   503372.0    NaN
2015 - 13   524244.0    NaN
2015 - 14   505785.0    11588.0
2015 - 15   493530.0    19170.0
2015 - 16   511167.0    18304.0
2015 - 17   605087.0    19030.0
2015 - 18   523477.0    20732.0
2015 - 19   410203.0    22032.0
2015 - 2    410268.0    NaN
2015 - 20   436890.0    21447.0
2015 - 21   412306.0    21957.0
2015 - 22   390683.0    23072.0

I want to use the "Year" column as the index of my DataFrame, but it does not sort correctly. As you can see, "2015 - 2" should come before "2015 - 10".

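All values in the "Year" column are strings in the format [year, week number]. I want to keep this format, because I have no information other than the year and the week number.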

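I tried sorting the values in ascending order with sort_values, but that did not solve the problem. I also tried setting the "Year" column as my index and calling sort_index, but that did not work either.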

I am new to Python and Pandas, so any help is much appreciated. Thanks.

Answer 1

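Older versions of pandas have no key parameter on the sort functions for supplying a custom sort key (pandas 1.1.0 added one to sort_values and sort_index). A workaround that does not depend on it is to add a helper column derived from "Year" and sort the data by that column.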

import pandas as pd

df = pd.DataFrame({
    'Year': ['2015 - 10', '2015 - 1', '2015 - 2'],
    'bla': [3, 1, 2]
})

# Helper column: parse 'YYYY - W' into [year, week] integers, which compare numerically
df['index'] = df['Year'].apply(lambda x: list(map(int, x.split(' - '))))
print(df)
df = df.sort_values('index')
print(df)
df = df.drop('index', axis=1)  # drop the helper column if you don't need it
print(df)

Output:

        Year  bla       index
0  2015 - 10    3  [2015, 10]
1   2015 - 1    1   [2015, 1]
2   2015 - 2    2   [2015, 2]
        Year  bla       index
1   2015 - 1    1   [2015, 1]
2   2015 - 2    2   [2015, 2]
0  2015 - 10    3  [2015, 10]
        Year  bla
1   2015 - 1    1
2   2015 - 2    2
0  2015 - 10    3
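
On pandas 1.1.0 or later, the same ordering can probably be reached without a helper column by passing a key function to sort_values or sort_index. The following is only a sketch and not part of the original answer: year_week_key is a hypothetical helper name, and it assumes every label has exactly the form 'YYYY - W' with a week number below 100.

```python
import pandas as pd

df = pd.DataFrame({
    'Year': ['2015 - 1', '2015 - 10', '2015 - 11', '2015 - 2'],
    'Bananas': [151235.0, 517326.0, 497511.0, 410268.0],
})


def year_week_key(labels):
    """Map 'YYYY - W' labels to integers (year * 100 + week) for sorting.

    Hypothetical helper; works on both a Series and an Index because
    both provide .map().
    """
    def to_number(label):
        year, week = label.split(' - ')
        return int(year) * 100 + int(week)

    return labels.map(to_number)


# Sort by the column while leaving the strings themselves unchanged.
by_column = df.sort_values('Year', key=year_week_key)

# Or use "Year" as the index, as the question asks, and sort the index.
by_index = df.set_index('Year').sort_index(key=year_week_key)

print(by_column)
print(by_index)
```

The key function only affects the comparison, so the "Year" strings keep their original format in both results.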

Pandas DataFrame index out of order
