I know this question may seem trivial, but I can't find a solution anywhere. I have a very large pandas DataFrame, df, that looks like this:
conference IF2013 AR2013
0 HOTMOBILE 16.333333 31.50
1 FOGA 13.772727 60.00
2 IEA/AIE 10.433735 28.20
3 IEEE Real-Time and Embedded Technology and App... 10.250000 29.00
4 Symposium on Computational Geometry 9.880342 35.00
5 WISA 9.693878 43.60
6 ICMT 8.750000 22.00
7 Haskell 8.703704 39.00
I want to add an extra column at the end, numbered 1, 2, 3, 4, and so on, so that it looks like this:
conference IF2013 AR2013 Ranking
0 HOTMOBILE 16.333333 31.50 1
1 FOGA 13.772727 60.00 2
2 IEA/AIE 10.433735 28.20 3
3 IEEE Real-Time and Embedded Technology and App... 10.250000 29.00 4
I can't figure out how to add an extra column that is filled with nothing but a sequence of consecutive numbers.
Answer 0 (score: 2)
I guess you are looking for the rank function:
df['rank'] = df['IF2013'].rank()
This way, your result will be independent of the index.
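Note that rank() ranks in ascending order by default and returns floats, so with the call above the largest IF2013 would receive the highest number rather than 1. A minimal sketch, assuming the goal is for the highest IF2013 to be ranked 1 (the small frame below is a stand-in built from a few rows of the question's data):

```python
import pandas as pd

# Stand-in for the question's data (a few rows only).
df = pd.DataFrame({
    "conference": ["HOTMOBILE", "FOGA", "IEA/AIE", "WISA"],
    "IF2013": [16.333333, 13.772727, 10.433735, 9.693878],
})

# ascending=False gives rank 1 to the largest value. rank() returns floats,
# so the cast to int is only safe when there are no ties or missing values.
df["Ranking"] = df["IF2013"].rank(ascending=False).astype(int)
print(df)
```

Because the ranking is computed from the values, it stays correct even if the rows are not already sorted by IF2013.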
Answer 1 (score: 1)
You can add the column using range:
df['Ranking'] = range(1, len(df) + 1)
Example:
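import pandas as pd
from io import StringIO
# Columns are separated by two or more spaces, because conference names contain single spaces.
data = """
conference                                         IF2013     AR2013
HOTMOBILE                                          16.333333  31.50
FOGA                                               13.772727  60.00
IEA/AIE                                            10.433735  28.20
IEEE Real-Time and Embedded Technology and App...  10.250000  29.00
Symposium on Computational Geometry                9.880342   35.00
WISA                                               9.693878   43.60
ICMT                                               8.750000   22.00
Haskell                                            8.703704   39.00
"""
# A regex separator needs the Python parsing engine; the raw string avoids an invalid-escape warning.
df = pd.read_csv(StringIO(data), sep=r'\s{2,}', engine='python')
# Number the rows 1..n in their current order.
df['Ranking'] = range(1, len(df) + 1)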
In [183]: df
Out[183]:
conference IF2013 AR2013 Ranking
0 HOTMOBILE 16.333333 31.5 1
1 FOGA 13.772727 60.0 2
2 IEA/AIE 10.433735 28.2 3
3 IEEE Real-Time and Embedded Technology and App... 10.250000 29.0 4
4 Symposium on Computational Geometry 9.880342 35.0 5
5 WISA 9.693878 43.6 6
6 ICMT 8.750000 22.0 7
7 Haskell 8.703704 39.0 8
Edit:
Benchmark:
In [202]: %timeit df['rank'] = range(1, len(df) + 1)
10000 loops, best of 3: 127 us per loop
In [203]: %timeit df['rank'] = df.AR2013.rank(ascending=False)
1000 loops, best of 3: 248 us per loop
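The %timeit lines above only run inside IPython. A rough way to repeat the comparison in a plain script is sketched below, using the standard timeit module; the synthetic frame, its size, and the function names with_range and with_rank are assumptions made for illustration. Keep in mind that the two statements are not equivalent: range numbers the rows in their current order, while rank orders them by value.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic frame; the size and the random values are arbitrary choices.
df = pd.DataFrame({"IF2013": np.random.default_rng(0).random(10_000)})


def with_range():
    # Number the rows 1..n in their current order.
    df["rank"] = range(1, len(df) + 1)


def with_rank():
    # Rank the rows by value, largest first.
    df["rank"] = df["IF2013"].rank(ascending=False)


for fn in (with_range, with_rank):
    seconds = timeit.timeit(fn, number=100)
    print(f"{fn.__name__}: {seconds / 100 * 1e6:.1f} us per call")
```

Absolute timings depend on the machine, the pandas version, and the frame size, so they will differ from the figures quoted above.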
Answer 2 (score: 1)
You can try:
df['rank'] = df.index + 1
print(df)
# conference IF2013 AR2013 rank
#0 HOTMOBILE 16.333333 31.5 1
#1 FOGA 13.772727 60.0 2
#2 IEA/AIE 10.433735 28.2 3
#3 IEEE Real-Time and Embedded Technology and App... 10.250000 29.0 4
#4 Symposium on Computational Geometry 9.880342 35.0 5
#5 WISA 9.693878 43.6 6
#6 ICMT 8.750000 22.0 7
#7 Haskell 8.703704 39.0 8
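One caveat: df.index + 1 gives consecutive numbers only when the index is the default 0..n-1 range. After sorting or filtering, the index keeps its old labels. A minimal sketch, assuming a small stand-in frame, that rebuilds the index before numbering:

```python
import pandas as pd

# Stand-in frame that is not yet sorted by IF2013.
df = pd.DataFrame({
    "conference": ["FOGA", "HOTMOBILE", "ICMT"],
    "IF2013": [13.772727, 16.333333, 8.750000],
})

# Sorting keeps the original index labels, so the index is now 1, 0, 2.
df = df.sort_values("IF2013", ascending=False)

# Rebuild a 0..n-1 index so that index + 1 matches the row position.
df = df.reset_index(drop=True)
df["rank"] = df.index + 1
print(df)
```

Without the reset_index call, the same assignment would produce 2, 1, 3 here, which is why range(1, len(df) + 1) can be the safer choice when the index is not known to be the default one.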
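Or use rank with the parameter ascending=False:
# Rank by IF2013, highest first; rank() returns floats, so cast to int.
df['rank'] = df['IF2013'].rank(ascending=False).astype(int)
print(df)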
# conference IF2013 AR2013 rank
#0 HOTMOBILE 16.333333 31.5 1
#1 FOGA 13.772727 60.0 2
#2 IEA/AIE 10.433735 28.2 3
#3 IEEE Real-Time and Embedded Technology and App... 10.250000 29.0 4
#4 Symposium on Computational Geometry 9.880342 35.0 5
#5 WISA 9.693878 43.6 6
#6 ICMT 8.750000 22.0 7
#7 Haskell 8.703704 39.0 8
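The sample data has no duplicate values, so every row gets a distinct rank. With ties, the result depends on the method parameter of rank. The sketch below uses hypothetical scores that are not taken from the question to show the difference:

```python
import pandas as pd

# Hypothetical scores containing a tie (8.0 appears twice).
s = pd.Series([10.0, 8.0, 8.0, 5.0])

# Default method="average": tied values share the mean of their ranks.
print(s.rank(ascending=False).tolist())                  # [1.0, 2.5, 2.5, 4.0]

# method="min": tied values share the lowest rank and the next rank is skipped.
print(s.rank(ascending=False, method="min").tolist())    # [1.0, 2.0, 2.0, 4.0]

# method="dense": like "min", but no rank is skipped after a tie.
print(s.rank(ascending=False, method="dense").tolist())  # [1.0, 2.0, 2.0, 3.0]

# method="first": ties are broken, so every row gets a distinct rank.
first = s.rank(ascending=False, method="first")
print(first.tolist())
```

If a strictly consecutive 1, 2, 3, ... column is required even when values repeat, method="first" or the range approach is the closer fit.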