pandas:在Dataframe的每一行上应用函数

时间:2017-12-19 09:13:33

标签: python python-2.7 pandas dataframe apply

我有一个包含3列的pandas DataFrame:

|  val1  |  val2  |  val3  | 
|--------------------------|
|  Nike  |  NaN   |  NaN   |  
|  Men   | Adidas |  NaN   |  
| Puma   |  Red   |  Women | 

和3列表:

Brands = ['Adidas', 'Nike', 'Puma']
Gender = ['Men', 'Women']
Color=['Red', 'Blue', 'Green']

我尝试将函数应用于每一行以检查并将每个值放在一个新列中,具体取决于函数返回的布尔值。

|  val1  |  val2  |  val3  | brand | gender | color
|----------------------------------------------------
|  Nike  |  NaN   |  NaN   |  Nike  |  NaN   | NaN
|  Men   | Adidas |  NaN   | Adidas |  Men   | NaN
|  Puma  |  Red   |  Women | Puma   |  Women | Red   

我使用列表来说明我的问题,但在我的脚本中,我使用附魔库检查字典中是否存在值。

这是我已经尝试的内容:

ref_brands = enchant.request_pwl_dict("ref_brands.txt")
brands_checker = SpellChecker(ref_brands)

print brands_checker.check('Puma')
> True
print brands_checker.check('Men')
> False

[pyenchant tutorial][1]

def my_cust_check(x, checker):
    l = x.tolist()
    for e in iter(l):
        try:
             if checker.check(e.strip().encode('utf-8')) is True:
                return e.strip()
             else:
                return None
        except:
             return None

df_query_split['brand'] = df_query_split.apply(my_cust_check,checker=brand_checker, axis=1)
df_query_split['gender'] = df_query_split.apply(my_cust_check,checker=gender_checker, axis=1)
df_query_split['color'] = df_query_split.apply(my_cust_check,checker=color_checker, axis=1)

1 个答案:

答案 0 :(得分:0)

您可以使用:

df['brand'] = df[df.isin(Brands)].ffill(axis=1).iloc[:, -1]
df['gender'] = df[df.isin(Gender)].ffill(axis=1).iloc[:, -1]
df['color'] = df[df.isin(Color)].ffill(axis=1).iloc[:, -1]
print (df)
   val1    val2   val3   brand gender color
0  Nike     NaN    NaN    Nike    NaN   NaN
1   Men  Adidas    NaN  Adidas    Men   NaN
2  Puma     Red  Women    Puma  Women   Red

详情:

首先按DataFrame.isin进行比较:

print (df.isin(Brands))
    val1   val2   val3
0   True  False  False
1  False   True  False
2   True  False  False

提取True s的值:

print (df[df.isin(Brands)])
   val1    val2 val3
0  Nike     NaN  NaN
1   NaN  Adidas  NaN
2  Puma     NaN  NaN

NaN替换为fillna并使用正向填充(ffill):

print (df[df.isin(Brands)].ffill(axis=1))
   val1    val2    val3
0  Nike    Nike    Nike
1   NaN  Adidas  Adidas
2  Puma    Puma    Puma

iloc选择最后一栏:

print (df[df.isin(Brands)].ffill(1).iloc[:, -1])
0      Nike
1    Adidas
2      Puma
Name: val3, dtype: object