Question

Why am I able to rename a row in a pandas Series with ('a', 'b') but not with (1.0, 2.0)? Why does the type of the values in the tuple matter?

import pandas as pd
df = pd.DataFrame({'a': [1,2,3,4,5], 'b':[1,1,1,1,1,]}).set_index('a')

df.rename(index={1:(1,2)})
*** ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
df.rename(index={1:('1','2')})
        b
a
(1, 2)  1
2       1
3       1
4       1
5       1

I would really like to be able to keep these as ints/floats.

Answer 1

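I'm not sure why this can't be done with rename, but you can build the labels, including the int or float tuple, in a list and then assign that list to the index.

This works with Pandas 0.14.1: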

idx = [(1, 2), 2, 3, 4, 5]
df.index = idx
>>> df
        b
(1, 2)  1
2       1
3       1
4       1
5       1
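
If only a few labels need to change, the replacement list can be built from the existing index instead of being typed out by hand. The following is a minimal sketch of that idea, not a tested replacement for rename; the `new_labels` mapping is a hypothetical name introduced here for illustration.

```python
import pandas as pd

# Same frame as in the question: integer index 1..5 named 'a'.
df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [1, 1, 1, 1, 1]}).set_index('a')

# Hypothetical mapping from old labels to new ones (illustrative only).
new_labels = {1: (1, 2)}

# Build the full list of labels in plain Python, then assign it in one step.
# Labels without an entry in the mapping are kept unchanged.
df.index = [new_labels.get(label, label) for label in df.index]

# The tuple is stored as-is, so its elements stay integers.
first = df.index[0]
print(first, type(first[0]))
```

Note that assigning a plain list replaces the index object, so the index name 'a' is dropped, as in the output above.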

EDIT: Here is a timing comparison with a 500k-row DataFrame.

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1,2,3,4,5]*100000, 'b':[1,1,1,1,1,]*100000})

# Create 100k random numbers in the range of the index.
rn = np.random.random_integers(0, 499999, 100000)

# Normal lookup using `loc`.
>>> %%timeit -n 3
some_list = []
[some_list.append(df.loc[a]) for a in rn]
3 loops, best of 3: 6.63 s per loop

# Normal lookup using 'xs' (used only for getting values, not setting them).
>>> %%timeit -n 3
some_list = []
[some_list.append(df.xs(a)) for a in rn]
3 loops, best of 3: 4.46 s per loop

# Set the index to tuple pairs and lookup using 'xs'.
idx = [(a, a + 1) for a in np.arange(500000)]
df.index = idx

>>> %%timeit -n 3
some_list = []
[some_list.append(df.xs((a, a + 1))) for a in rn]
3 loops, best of 3: 4.64 s per loop

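As you can see, the performance difference when looking up values from the DataFrame is negligible: 4.64 s with 'xs' on the tuple index versus 4.46 s with 'xs' on the integer index.

Note that you can't use 'loc' with a tuple index: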

>>> df.loc[(1, 2)]
KeyError: 'the label [1] is not in the [index]'
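
One way to sidestep label-based lookup here is to find the position of the tuple label yourself and then select by position. This is only a sketch under the assumption that the labels are unique; it uses a linear scan over the index, so for many repeated lookups the 'xs' call timed above is likely the better fit.

```python
import pandas as pd

# Small frame with a mixed index: one tuple label and four integer labels.
df = pd.DataFrame({'b': [1, 1, 1, 1, 1]})
df.index = [(1, 2), 2, 3, 4, 5]

# Find the position of the tuple label with a plain Python list search.
# list.index raises ValueError if the label is missing.
target = (1, 2)
pos = list(df.index).index(target)

# Select the row by position, which never interprets the tuple as a key.
row = df.iloc[pos]
print(row['b'])
```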
