Question

After switching to the following DataFrame, my code has become noticeably slower:

df = pd.DataFrame(data=products, columns=['pk', 'product_name', 'category_name', 'brand_name'])
df.set_index(['pk'], inplace=True)

This is the only place where I use the DataFrame. 'pk' is an integer.

            category = self.product_list.iloc[int(prod)-1]['category_name']
            brand = self.product_list.iloc[int(prod)-1]['brand_name']

What am I doing wrong here?

Answer 1

You can use iat:

print(product_list.category_name.iat[int(prod)-1])
print(product_list.brand_name.iat[int(prod)-1])
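
As a rough, self-contained sketch (Python 3) of why this works as a drop-in replacement: `iloc[pos]` builds an entire row as a Series before the field is looked up by label, whereas `iat[pos]` on a single column reads one scalar by integer position. The sample rows and the `prod` value below are invented for illustration, and the sketch assumes, as the question's code does, that the pk values run 1..n in row order.

```python
import pandas as pd

# Hypothetical sample data standing in for `products`
products = [
    (1, "apple", "fruit", "acme"),
    (2, "bread", "bakery", "bolt"),
    (3, "cider", "drinks", "crest"),
]
product_list = pd.DataFrame(
    products, columns=["pk", "product_name", "category_name", "brand_name"]
).set_index("pk")

prod = "2"           # hypothetical input value
pos = int(prod) - 1  # assumes pk values are 1..n in row order

# Original approach: materialise the whole row, then pick one field by label
category_slow = product_list.iloc[pos]["category_name"]

# iat: select the column first, then read a single scalar by position
category_fast = product_list["category_name"].iat[pos]
brand_fast = product_list["brand_name"].iat[pos]

print(category_slow, category_fast, brand_fast)
```

Both approaches return the same value; only the amount of intermediate work differs.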

Timings (string index):

Sample:

product_list = pd.DataFrame({'brand_name': {'r': 'r', 'g': 't', 'w': 'i'}, 
                             'category_name': {'r': 's', 'g': 'f', 'w': 'a'}})
print(product_list)
  brand_name category_name
g          t             f
r          r             s
w          i             a

In [242]: %timeit product_list.iloc[int(prod)-1]['category_name']
The slowest run took 8.27 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 82.7 µs per loop

In [243]: %timeit product_list.brand_name.iat[int(prod)-1]
The slowest run took 16.01 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 9.96 µs per loop

Integer index:

product_list = pd.DataFrame({'brand_name': {0: 't', 1: 'r', 2: 'i'}, 
                             'category_name': {0: 'f', 1: 's', 2: 'a'}})
print(product_list)
  brand_name category_name
0          t             f
1          r             s
2          i             a

In [250]: %timeit product_list.iloc[int(prod)-1]['category_name']
The slowest run took 8.24 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 84.7 µs per loop

In [251]: %timeit product_list.brand_name.iat[int(prod)-1]
The slowest run took 24.17 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 9.86 µs per loop
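
The `%timeit` lines above only run inside IPython. A minimal sketch of the same comparison as a plain Python 3 script, using the standard-library `timeit` module, might look like the following. The `prod` value, the helper names `via_iloc` and `via_iat`, and the loop count are assumptions chosen for illustration; absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Same integer-indexed sample frame as above
product_list = pd.DataFrame({
    "brand_name": {0: "t", 1: "r", 2: "i"},
    "category_name": {0: "f", 1: "s", 2: "a"},
})
prod = "2"  # hypothetical input value


def via_iloc():
    # Builds the whole row as a Series, then looks up one label
    return product_list.iloc[int(prod) - 1]["category_name"]


def via_iat():
    # Reads a single scalar by position from one column
    return product_list.category_name.iat[int(prod) - 1]


n = 2000  # arbitrary number of repetitions
t_iloc = timeit.timeit(via_iloc, number=n) / n
t_iat = timeit.timeit(via_iat, number=n) / n

print(f"iloc: {t_iloc * 1e6:.1f} µs per call")
print(f"iat:  {t_iat * 1e6:.1f} µs per call")
```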
