Question

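In R, setkey can be used to set a key on a data table, so that my table is automatically sorted when I use aggregate functions. The R command I use is: setkey(myData, "Customer")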

Is it also possible to use keys in Python / Pandas, and is there an equivalent of this R command? Thanks a lot.

Answer 1

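As far as I know, the setkey() function from R's data.table has no direct equivalent in Python. There are, however, several functions that cover the same ground. Pay attention to the inplace parameter of these functions: unless you pass inplace=True, the underlying data is not changed unless you explicitly assign the result back (e.g. `df = df.sort_values('a')`).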
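A minimal sketch of the inplace behaviour described above. The frame and its column names are hypothetical and only serve as an illustration:

```python
import pandas as pd

# Hypothetical example data, deliberately unsorted on column 'a'.
df = pd.DataFrame({'a': [2, 1, 3], 'b': ['x', 'y', 'z']})

# Without inplace=True the call returns a new, sorted frame;
# df itself keeps its original row order.
sorted_copy = df.sort_values('a')
unchanged_order = df['a'].tolist()

# With inplace=True the frame is modified directly and the call returns None.
df_inplace = df.copy()
result = df_inplace.sort_values('a', inplace=True)
```

Assigning the result back (`df = df.sort_values('a')`) is usually the easier form to read and chain.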

You can use the sort_values() function to sort the data on one or more columns.

import pandas as pd

df = pd.DataFrame({'a': [1,1,2,1,2,2,2],
                   'b': [1,1,0,2,4,1,5],
                   'c': [3,4,5,2,6,1,7]})

>>> df
   a  b  c
0  1  1  3
1  1  1  4
2  2  0  5
3  1  2  2
4  2  4  6
5  2  1  1
6  2  5  7

>>> df.sort_values(['a', 'b'])
   a  b  c
0  1  1  3
1  1  1  4
3  1  2  2
2  2  0  5
5  2  1  1
4  2  4  6
6  2  5  7

If you want to aggregate over one column or several columns, you can use the groupby() function. This is similar to the by operator in R's data.table.

>>> df.groupby(['a', 'b'])['c'].max()
a  b
1  1    4
   2    2
2  0    5
   1    1
   4    6
   5    7
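Regarding the automatic sorting mentioned in the question: groupby() sorts the group keys by default (sort=True), so the aggregated result comes back ordered even if the frame was not sorted first. A short sketch on the same data; the variable names summary and flat are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 1, 2, 2, 2],
                   'b': [1, 1, 0, 2, 4, 1, 5],
                   'c': [3, 4, 5, 2, 6, 1, 7]})

# Several aggregations of 'c' per value of 'a'; the group keys are
# returned in sorted order because sort=True is the default.
summary = df.groupby('a')['c'].agg(['min', 'max', 'sum'])

# as_index=False keeps the grouping column as a regular column
# instead of moving it into the index.
flat = df.groupby('a', as_index=False)['c'].max()
```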

You can also use the set_index() function to set the index to one or more columns.

>>> df.set_index('a')
   b  c
a      
1  1  3
1  1  4
2  0  5
1  2  2
2  4  6
2  1  1
2  5  7

# once the index is set, you reference rows on the new index.

>>> df.set_index('a', inplace=True)
>>> df.loc[1]  # label-based lookup; the older .ix indexer was removed in pandas 1.0
   b  c
a      
1  1  3
1  1  4
1  2  2
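Combining set_index() with sort_index() is probably the closest approximation of a keyed table, although it is not a true equivalent of setkey(). A sketch under that assumption, with the illustrative name keyed for the resulting frame:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 1, 2, 2, 2],
                   'b': [1, 1, 0, 2, 4, 1, 5],
                   'c': [3, 4, 5, 2, 6, 1, 7]})

# Use the key columns as a MultiIndex and sort the rows by that index.
keyed = df.set_index(['a', 'b']).sort_index()

# All rows with a == 2, selected through the first index level.
rows_a2 = keyed.loc[2]

# All rows matching the full key pair (a, b) == (1, 1).
rows_1_1 = keyed.loc[(1, 1)]
```

Sorting the index is optional for correctness, but label lookups on a sorted index tend to be faster, and slicing a MultiIndex by label requires it.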
