Question

Is there a way to improve this Python code? I'm using the Pandas library with Python 3.4:

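import bisect
import pandas as pd

bd_data = pd.DataFrame(list(bd_data))
column = list(bd_data[numeric])
# Row by row: find each value's interval, then write one cell at a time
for i in range(0, len(column)):
    pos = bisect.bisect_left(intervalsArray, int(column[i]))
    bd_data.ix[i, 'colorCluster'] = colorsPalette[pos]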

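I'm trying to assign colorCluster a color from colorsPalette based on the position of each number within a list of intervals. Processing 16,000 rows takes about 6 seconds, which is far too slow. I suspect I'm not using Pandas the way it is meant to be used, particularly here: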

bd_data.ix[i,'colorCluster']

I currently do the same thing in R (through rpy2) in under a second with this line:

dataToAnalyse$colorCluster <- colorsPalette[findInterval(dataToAnalyse$numeric, intervals)+1]
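For comparison, a vectorized counterpart of that R line can be sketched with NumPy's `searchsorted`. The intervals, palette and sample values below are hypothetical stand-ins for `intervalsArray`, `colorsPalette` and the numeric column. Note that R's `findInterval` counts the boundaries that are less than or equal to each value, which corresponds to `side="right"` (and to `bisect.bisect_right`), whereas the Python loop above uses `bisect_left`, so the two versions disagree for values that sit exactly on a boundary.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one more color than there are boundaries,
# so every interval position has a color.
intervals = [10, 20, 30]
palette = np.array(["blue", "green", "yellow", "red"])

df = pd.DataFrame({"numeric": [5, 10, 15, 25, 40]})

# findInterval(x, vec) counts boundaries <= x, which is what
# searchsorted(..., side="right") returns. R adds 1 because its vectors are
# 1-based; NumPy indexing is 0-based, so no +1 is needed here.
pos = np.searchsorted(intervals, df["numeric"].to_numpy(), side="right")

# One vectorized lookup instead of one assignment per row.
df["colorCluster"] = palette[pos]
print(df)
```

Using `side="left"` instead reproduces the `bisect_left` behaviour of the original loop.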

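I'm sure there is a way to speed up the Python version, since many people say that processing in this language is faster than in R (though not always). Also, please suggest a better title for this question; I'm not fluent in pandas terminology.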

Answer 1

You can change:

bd_data.ix[i,'colorCluster'] = colorsPalette[pos]

to use DataFrame.set_value:

bd_data.set_value(i, 'colorCluster', colorsPalette[pos])
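Note that `DataFrame.set_value` was deprecated in pandas 0.21 and removed in pandas 1.0 (as was `.ix`). On current versions the scalar setter `.at` plays the same role. A minimal sketch, with hypothetical sample data in place of the question's:

```python
import bisect
import pandas as pd

# Hypothetical sample inputs standing in for the question's data.
intervalsArray = [10, 20, 30]
colorsPalette = ["blue", "green", "yellow", "red"]
bd_data = pd.DataFrame({"numeric": [5, 10, 15, 25, 40]})

# Create the target column first so every scalar write hits an existing cell.
bd_data["colorCluster"] = None

# .at is the scalar replacement for set_value. This is still one Python-level
# assignment per row, so it is cheaper per call than .ix/.loc but not vectorized.
for i, value in zip(bd_data.index, bd_data["numeric"]):
    pos = bisect.bisect_left(intervalsArray, int(value))
    bd_data.at[i, "colorCluster"] = colorsPalette[pos]

print(bd_data)
```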

Answer 2

I'm fairly new to Python myself, but wouldn't a list comprehension be faster in this case?

bd_data['colorCluster'] = [colorsPalette[bisect.bisect_left(intervalsArray,int(column_iter))] for column_iter in column]

Edit: would apply be faster?

bd_data['colorCluster'] = bd_data[numeric].apply(lambda x: colorsPalette[bisect.bisect_left(intervalsArray, int(x))])
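Whether either variant actually helps is easy to measure. The sketch below times the list comprehension, a single-column `apply`, and a fully vectorized `np.searchsorted` lookup on 16,000 rows of hypothetical random data. Absolute timings depend on the machine and the pandas/NumPy versions, so no figures are given here; the vectorized version avoids per-row Python function calls entirely, which is where the loop in the question spends its time.

```python
import bisect
import timeit

import numpy as np
import pandas as pd

# Hypothetical data sized like the question (16,000 rows).
rng = np.random.default_rng(0)
intervalsArray = [10, 20, 30]
colorsPalette = ["blue", "green", "yellow", "red"]
bd_data = pd.DataFrame({"numeric": rng.integers(0, 40, size=16_000)})


def with_comprehension(df):
    # Build the whole column in plain Python, to be assigned once.
    return [colorsPalette[bisect.bisect_left(intervalsArray, int(v))]
            for v in df["numeric"]]


def with_apply(df):
    # Series.apply calls the lambda once per element of the single column.
    return df["numeric"].apply(
        lambda v: colorsPalette[bisect.bisect_left(intervalsArray, int(v))])


def with_searchsorted(df):
    # side="left" matches bisect.bisect_left for every value.
    values = df["numeric"].astype(int).to_numpy()
    pos = np.searchsorted(intervalsArray, values, side="left")
    return np.asarray(colorsPalette)[pos]


for fn in (with_comprehension, with_apply, with_searchsorted):
    seconds = timeit.timeit(lambda: fn(bd_data), number=10) / 10
    print(f"{fn.__name__}: {seconds * 1000:.2f} ms per call")
```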

Improving performance when setting a Pandas column
