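I am using Pandas and Python to import a CSV and manipulate the data in the resulting DataFrame in order to create a new column.
Each row of the new column is derived from the values in the corresponding row of columns A and B. The DataFrame has more columns containing data, but they are not relevant to the code below.
The imported DataFrame has several thousand rows.
Columns A and B both contain numeric values between 0 and 99.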
import pandas as pd
import csv
df = pd.read_csv("import.csv", names=["Id", "Month", "Name", "ColA", "ColB"])
def f(row):
    if row['ColA'].isin([10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 48]) and row['ColB'].isin([30, 31, 32, 33, 34, 35, 57, 58]):
        val = row['ColA']
    elif row['ColB'].isin([10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 48]) and row['ColA'].isin([30, 31, 32, 33, 34, 35, 57, 58]):
        val = row['ColB']
    elif row['ColA'] > row['ColB']:
        val = row['ColA']
    elif row['ColA'] < row['ColB']:
        val = row['ColB']
    else:
        val = row['ColA']
    return val
df['NewColumnName'] = df.apply(f, axis=1)
df.to_csv("export.csv", encoding='utf-8')
Running the code above returns the error:
AttributeError: ("'float' object has no attribute 'isin'", 'occurred at index 0')
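Clearly .isin() cannot be used this way. Any suggestions on how to solve this?
Edit: Using Jezrael's approach, with an additional column subject to the same conditions, the code looks like this: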
m1 = (df['ColA'].isin(L1) & df['ColB'].isin(L2)) | (df['ColA'] > df['ColB'])
m2 = (df['ColB'].isin(L1) & df['ColA'].isin(L2)) | (df['ColA'] < df['ColB'])
m3 = (df['ColC'].isin(L1) & df['ColB'].isin(L2)) | (df['ColC'] > df['ColB'])
m4 = (df['ColB'].isin(L1) & df['ColC'].isin(L2)) | (df['ColC'] < df['ColB'])
m5 = (df['ColC'].isin(L1) & df['ColA'].isin(L2)) | (df['ColC'] > df['ColA'])
m6 = (df['ColA'].isin(L1) & df['ColC'].isin(L2)) | (df['ColC'] < df['ColA'])
df['NewColumnName'] = np.select([m1, m2, m3, m4, m5, m6], [df['ColA'], df['ColB'], df['ColC'], df['ColA'], df['ColB'], df['ColC']], default=df['ColA'])
Answer 0 (score: 3)
In pandas it is best to avoid loops, so it is better to use numpy.select with conditions chained using & for AND and | for OR.
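As an illustration of that idea, here is a minimal sketch rather than the answerer's exact code. It assumes the two value lists from the question (named L1 and L2, as in the question's edit) and uses a small made-up DataFrame in place of import.csv. The conditions are listed in the same order as the if/elif branches of f, since np.select takes the first condition that matches for each row.

```python
import numpy as np
import pandas as pd

# Value lists copied from the question; the names L1/L2 follow the question's edit.
L1 = [10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 48]
L2 = [30, 31, 32, 33, 34, 35, 57, 58]

# Hypothetical sample data standing in for import.csv.
df = pd.DataFrame({"ColA": [10, 30, 50, 5, 7],
                   "ColB": [57, 48, 20, 60, 7]})

# One boolean mask per branch of f, in the same order as the if/elif chain.
m1 = df["ColA"].isin(L1) & df["ColB"].isin(L2)   # -> ColA
m2 = df["ColB"].isin(L1) & df["ColA"].isin(L2)   # -> ColB
m3 = df["ColA"] > df["ColB"]                     # -> ColA
m4 = df["ColA"] < df["ColB"]                     # -> ColB

# Rows matching no mask (ColA == ColB) fall back to ColA, like the else branch.
df["NewColumnName"] = np.select(
    [m1, m2, m3, m4],
    [df["ColA"], df["ColB"], df["ColA"], df["ColB"]],
    default=df["ColA"],
)
print(df)
```

Here Series.isin is called on whole columns, which is the way it is meant to be used, and no Python-level loop over rows is needed.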
Answer 1 (score: 2)
You need to use it like this:
df[df['ColA'].isin([10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 48])]
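This gives you the rows whose ColA value is in the list above. You are trying to do this value by value, but this method works on a whole column. If you want to check whether a single value is in the list, you can use numpy and write something like this inside the function:
if np.any(row['ColA'] == np.array([10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 48])):
    val = row['ColA']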
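For a row-wise version that stays close to the question's function, plain Python membership testing with `in` also works on scalar values. The following is only a sketch under assumptions: the constant names LIST_1 and LIST_2 and the sample DataFrame are made up for illustration, and the last three branches of the original function are collapsed into a single comparison.

```python
import pandas as pd

# Hypothetical names for the two value lists used in the question.
LIST_1 = [10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 48]
LIST_2 = [30, 31, 32, 33, 34, 35, 57, 58]

def f(row):
    a, b = row["ColA"], row["ColB"]
    # `in` works on a scalar, unlike Series.isin.
    if a in LIST_1 and b in LIST_2:
        return a
    if b in LIST_1 and a in LIST_2:
        return b
    # The remaining branches pick the larger value; ties return ColA.
    return a if a >= b else b

# Made-up sample data standing in for import.csv.
df = pd.DataFrame({"ColA": [10.0, 30.0, 50.0, 5.0],
                   "ColB": [57.0, 48.0, 20.0, 60.0]})
df["NewColumnName"] = df.apply(f, axis=1)
print(df)
```

With only a few thousand rows, apply with axis=1 is usually fast enough, although column-wise masks generally scale better.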