Pandas: using .isin() returns an error: "AttributeError: 'float' object has no attribute 'isin'"

Time: 2018-02-18 14:39:48

Tags: python pandas csv dataframe

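I am importing a CSV with Pandas and Python, and manipulating the data in the imported dataframe in order to create a new column.

Each row of the new column is derived from the values in the corresponding rows of columns A and B. The dataframe has more columns containing data, but they are not relevant to the code below.

The imported dataframe has a few thousand rows.

Columns A and B both contain numeric values between 0 and 99.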

import pandas as pd

import csv

df = pd.read_csv("import.csv", names=["Id", "Month", "Name", "ColA", "ColB" ])

def f(row):
    if row['ColA'].isin([10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 48]) and row['ColB'].isin([30, 31, 32, 33, 34, 35, 57, 58]):
        val = row['ColA']
    elif row['ColB'].isin([10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 48]) and  row['ColA'].isin([30, 31, 32, 33, 34, 35, 57, 58]):
        val = row['ColB']
    elif row['ColA'] > row['ColB']:
        val = row['ColA']
    elif row['ColA'] < row['ColB']:
        val = row['ColB']
    else: 
        val = row['ColA']
    return val            

df['NewColumnName'] = df.apply(f, axis=1)   

df.to_csv("export.csv", encoding='utf-8')

Running the code above returns the error:

AttributeError: ("'float' object has no attribute 'isin'", 'occurred at index 0')

Clearly .isin() cannot be used this way. Any suggestions on how to solve this?

Edit: Using Jezrael's approach to add another column subject to the same conditions, the code looks like this:

m1 = (df['ColA'].isin(L1) & df['ColB'].isin(L2)) | (df['ColA'] > df['ColB'])
m2 = (df['ColB'].isin(L1) & df['ColA'].isin(L2)) | (df['ColA'] < df['ColB'])
m3 = (df['ColC'].isin(L1) & df['ColB'].isin(L2)) | (df['ColC'] > df['ColB'])
m4 = (df['ColB'].isin(L1) & df['ColC'].isin(L2)) | (df['ColC'] < df['ColB'])
m5 = (df['ColC'].isin(L1) & df['ColA'].isin(L2)) | (df['ColC'] > df['ColA'])
m6 = (df['ColA'].isin(L1) & df['ColC'].isin(L2)) | (df['ColC'] < df['ColA'])



df['NewColumnName'] = np.select([m1, m2, m3, m4, m5, m6], [df['ColA'], df['ColB'], df['ColC'], df['ColA'], df['ColB'], df['ColC'],], default=df['ColA'])

2 answers:

Answer 0: (score: 3)

In pandas it is best to avoid loops, so it is better to use numpy.select and chain the isin conditions with & for AND and | for OR.
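The answer's own code did not survive on this page, so the following is only a minimal sketch of the approach it describes, not the answerer's original. It assumes the ColA and ColB column names from the question; the list names L1 and L2 are borrowed from the question's edit, and the small sample frame is made up for illustration. The conditions are listed in the same order as the question's if/elif chain, since numpy.select takes the first condition that is true for each row.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the imported CSV.
df = pd.DataFrame({"ColA": [10, 30, 50, 7, 5], "ColB": [30, 10, 20, 9, 5]})

L1 = [10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 48]
L2 = [30, 31, 32, 33, 34, 35, 57, 58]

# Each condition is a boolean Series covering the whole column at once.
conditions = [
    df["ColA"].isin(L1) & df["ColB"].isin(L2),
    df["ColB"].isin(L1) & df["ColA"].isin(L2),
    df["ColA"] > df["ColB"],
    df["ColA"] < df["ColB"],
]
# The value to take when the condition at the same position matches first.
choices = [df["ColA"], df["ColB"], df["ColA"], df["ColB"]]

# Rows matching no condition (ColA == ColB) fall back to ColA.
df["NewColumnName"] = np.select(conditions, choices, default=df["ColA"])
print(df)
```

Keeping the two isin conditions separate from the greater-than and less-than comparisons preserves the priority of the original branches, which matters for rows where both an isin condition and a comparison are true.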

Answer 1: (score: 2)

You need to use it like this:

df[df['ColA'].isin([10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 48])]

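This gives you the rows whose ColA value is in the list above. You are trying to apply it to a single value, but this method works on a whole column. If you want to check whether a single value is in the list, you can write something like this with numpy inside the function: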

if np.any(row['ColA'] == np.array([10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 48])):
    val = row['ColA']
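For a per-row function, Python's built-in in operator may be a simpler alternative to the numpy comparison, because row['ColA'] is a scalar rather than a Series. The sketch below is an illustration under that assumption and not part of the original answer; the sample data and the list names L1 and L2 are hypothetical.

```python
import pandas as pd

# Hypothetical sample data standing in for the imported CSV.
df = pd.DataFrame({
    "ColA": [10.0, 30.0, 50.0, 7.0],
    "ColB": [30.0, 10.0, 20.0, 9.0],
})

L1 = [10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 48]
L2 = [30, 31, 32, 33, 34, 35, 57, 58]

def f(row):
    a, b = row["ColA"], row["ColB"]
    # a and b are scalars here, so test membership with "in"
    # instead of Series.isin().
    if a in L1 and b in L2:
        return a
    if b in L1 and a in L2:
        return b
    # Otherwise take the larger value; ties return ColA.
    return max(a, b)

df["NewColumnName"] = df.apply(f, axis=1)
print(df)
```

This keeps the row-by-row structure of the question's code, so it is likely to be slower on a few thousand rows than a vectorized version, but it removes the AttributeError.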