Question

I am reading data from a CSV like this:

    for chunk in pd.read_csv(file, chunksize=50000, names=col_names, header=0, dtype=dtype):
        chunk['derived_field_1'] = [1 if x == 'High' else -1 for x in chunk['indicator']]

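The above works; it is based on a single condition. I want to do the same based on conditions on two fields, which gives 8 combinations of values in total. For example: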

    chunk['derived_field_2'] = [chunk['column_1'] if ((x == 'Red' for x in chunk['Color']) and (y == 'High' for y in chunk['Indicator'])) else
                          chunk['column_2'] if ((x == 'Green' for x in chunk['Color']) and (y == 'Low' for y in chunk['Indicator'])) else 0]

I want to do the above and continue with the other 6 conditions in the same way. This fails; the two for loops do not work. I get this error:

raise ValueError('Length of values does not match length of ' 'index')
ValueError: Length of values does not match length of index

Does anyone know the cause of this error?

Answer 1
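
A likely cause of the ValueError can be sketched on a small frame. The frame below is a hypothetical stand-in for one chunk (the column names follow the question, the values are made up). A generator expression such as `(x == 'Red' for x in chunk['Color'])` is an object, and objects are truthy, so the condition never looks at the rows. The square brackets then wrap one conditional expression, which yields a list with a single element rather than one value per row, and that length is what pandas compares against the index.

```python
import pandas as pd

# Hypothetical stand-in for one chunk; column names follow the question.
chunk = pd.DataFrame({
    "Color": ["Red", "Green", "Blue"],
    "Indicator": ["High", "Low", "High"],
    "column_1": [10, 20, 30],
    "column_2": [1, 2, 3],
})

# Each parenthesised part is a generator object, not a per-row comparison.
# Generator objects are truthy, so `and` returns the second one, also truthy.
condition = (x == "Red" for x in chunk["Color"]) and (y == "High" for y in chunk["Indicator"])
always_true = bool(condition)

# The brackets wrap a single conditional expression, so the list holds one
# element (the whole column_1 Series) instead of one value per row.
values = [chunk["column_1"] if condition else 0]
n_values = len(values)
n_rows = len(chunk)
```

Here `n_values` is 1 while `n_rows` is 3, which is the kind of mismatch the error message describes.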

You can use numpy.where for a vectorized solution:

    import numpy as np

    # Red/High rows take column_1, Green/Low rows take column_2, all other rows get 0.
    chunk['derived_field_2'] = np.where(
        (chunk['Color'] == "Red") & (chunk["Indicator"] == "High"), chunk["column_1"],
        np.where((chunk['Color'] == "Green") & (chunk["Indicator"] == "Low"), chunk["column_2"], 0))
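
With all 8 combinations, nesting `np.where` calls gets hard to read. One alternative, not part of the original answer, is `numpy.select`, which takes a list of conditions and a matching list of choices. The sketch below uses a small hypothetical frame in place of a real chunk and shows only the two combinations given in the question; the remaining six would be appended to both lists in the same way.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for one chunk; column names follow the question.
chunk = pd.DataFrame({
    "Color": ["Red", "Green", "Red", "Blue"],
    "Indicator": ["High", "Low", "Low", "High"],
    "column_1": [10, 20, 30, 40],
    "column_2": [1, 2, 3, 4],
})

# One boolean mask per combination; the first matching condition wins.
conditions = [
    (chunk["Color"] == "Red") & (chunk["Indicator"] == "High"),
    (chunk["Color"] == "Green") & (chunk["Indicator"] == "Low"),
]
# The value to take where the condition at the same position is True.
choices = [chunk["column_1"], chunk["column_2"]]

# Rows that match no condition fall back to the default of 0.
chunk["derived_field_2"] = np.select(conditions, choices, default=0)
```

Keeping conditions and choices in parallel lists makes it easier to check that every combination has exactly one entry.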
