I am reading data from a CSV like this:
for chunk in pd.read_csv(file, chunksize=50000, names=col_names, header=0, dtype=dtype):
    chunk['derived_field_1'] = [1 if x == 'High' else -1 for x in chunk['indicator']]
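The above works, and it is based on a single condition. I want to do the same thing based on conditions on two fields, which gives 8 combinations of values in total. As an example: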
chunk['derived_field_2'] = [chunk['column_1'] if ((x == 'Red' for x in chunk['Color']) and (y == 'High' for y in chunk['Indicator'])) else
chunk['column_2'] if ((x == 'Green' for x in chunk['Color']) and (y == 'Low' for y in chunk['Indicator'])) else 0]
I want to do the above and continue with the other 6 conditions in the same way. This fails; the two for loops do not work. I get this error:
raise ValueError('Length of values does not match length of ' 'index')
ValueError: Length of values does not match length of index
Does anyone know the cause of this error?
Answer 0 (score: 2)
You can use numpy.where for a vectorized solution:
import numpy as np

# Nested np.where: the first matching condition picks the column, otherwise fall back to 0
chunk['derived_field_2'] = np.where(
    (chunk['Color'] == "Red") & (chunk["Indicator"] == "High"), chunk["column_1"],
    np.where((chunk['Color'] == "Green") & (chunk["Indicator"] == "Low"), chunk["column_2"], 0))
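With all 8 combinations, nesting np.where gets hard to read. One possible alternative is numpy.select, which takes a list of conditions and a matching list of choices. The sketch below is only an illustration: the small DataFrame is a hypothetical stand-in for one chunk, and only the two combinations given in the question are filled in.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for one chunk from pd.read_csv
chunk = pd.DataFrame({
    "Color": ["Red", "Green", "Red", "Blue"],
    "Indicator": ["High", "Low", "Low", "High"],
    "column_1": [10, 20, 30, 40],
    "column_2": [1, 2, 3, 4],
})

# One boolean mask per combination; the first mask that is True for a row wins
conditions = [
    (chunk["Color"] == "Red") & (chunk["Indicator"] == "High"),
    (chunk["Color"] == "Green") & (chunk["Indicator"] == "Low"),
    # the remaining combinations would be added here
]

# The value to take for each condition, in the same order
choices = [
    chunk["column_1"],
    chunk["column_2"],
]

# Rows that match no condition get the default value 0
chunk["derived_field_2"] = np.select(conditions, choices, default=0)
print(chunk["derived_field_2"].tolist())  # [10, 2, 0, 0]
```

As for the error itself: in the question's code the outer brackets contain a single conditional expression rather than a loop over rows, and a generator expression such as (x == 'Red' for x in chunk['Color']) is always truthy. The result is a one-element list, whose length does not match the length of the chunk's index.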