I have a pandas DataFrame with 3 columns:
| val1 | val2   | val3  |
|------|--------|-------|
| Nike | NaN    | NaN   |
| Men  | Adidas | NaN   |
| Puma | Red    | Women |
and 3 lists:
Brands = ['Adidas', 'Nike', 'Puma']
Gender = ['Men', 'Women']
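Color = ['Red', 'Blue', 'Green']
I am trying to apply a function to each row that checks every value and puts it in a new column, depending on the boolean the function returns.
| val1 | val2   | val3  | brand  | gender | color |
|------|--------|-------|--------|--------|-------|
| Nike | NaN    | NaN   | Nike   | NaN    | NaN   |
| Men  | Adidas | NaN   | Adidas | Men    | NaN   |
| Puma | Red    | Women | Puma   | Women  | Red   |
I used lists to illustrate my problem, but in my script I use the pyenchant library to check whether a value exists in a dictionary.
Here is what I have tried so far: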
ref_brands = enchant.request_pwl_dict("ref_brands.txt")
brands_checker = SpellChecker(ref_brands)
print brands_checker.check('Puma')
> True
print brands_checker.check('Men')
> False
[pyenchant tutorial][1]
def my_cust_check(x, checker):
    l = x.tolist()
    for e in iter(l):
        try:
            if checker.check(e.strip().encode('utf-8')) is True:
                return e.strip()
            else:
                return None
        except:
            return None
df_query_split['brand'] = df_query_split.apply(my_cust_check, checker=brand_checker, axis=1)
df_query_split['gender'] = df_query_split.apply(my_cust_check, checker=gender_checker, axis=1)
df_query_split['color'] = df_query_split.apply(my_cust_check, checker=color_checker, axis=1)
Answer 0 (score: 0)
You can use:
df['brand'] = df[df.isin(Brands)].ffill(axis=1).iloc[:, -1]
df['gender'] = df[df.isin(Gender)].ffill(axis=1).iloc[:, -1]
df['color'] = df[df.isin(Color)].ffill(axis=1).iloc[:, -1]
print (df)
val1 val2 val3 brand gender color
0 Nike NaN NaN Nike NaN NaN
1 Men Adidas NaN Adidas Men NaN
2 Puma Red Women Puma Women Red
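For a self-contained run, the three assignments above can be sketched as a loop. The construction of df is an assumption reconstructed from the table in the question, and value_cols is a hypothetical helper name; limiting the lookup to the original value columns is a small precaution so that the newly added columns are never searched.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question's table (an assumption).
df = pd.DataFrame({
    "val1": ["Nike", "Men", "Puma"],
    "val2": [np.nan, "Adidas", "Red"],
    "val3": [np.nan, np.nan, "Women"],
})

Brands = ["Adidas", "Nike", "Puma"]
Gender = ["Men", "Women"]
Color = ["Red", "Blue", "Green"]

# Hypothetical helper: only these columns are searched for matches.
value_cols = ["val1", "val2", "val3"]
values = df[value_cols]

for name, ref in [("brand", Brands), ("gender", Gender), ("color", Color)]:
    # Keep only cells found in the reference list, carry the match to the
    # right along each row, then read the last column.
    df[name] = values[values.isin(ref)].ffill(axis=1).iloc[:, -1]

print(df)
```

Rows without any match keep NaN in the new column, because there is nothing to forward fill.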
Details:
First, compare with DataFrame.isin:
print (df.isin(Brands))
val1 val2 val3
0 True False False
1 False True False
2 True False False
Extract the values where the mask is True:
print (df[df.isin(Brands)])
val1 val2 val3
0 Nike NaN NaN
1 NaN Adidas NaN
2 Puma NaN NaN
Replace the NaNs by forward filling along each row (ffill with axis=1):
print (df[df.isin(Brands)].ffill(axis=1))
val1 val2 val3
0 Nike Nike Nike
1 NaN Adidas Adidas
2 Puma Puma Puma
Select the last column with iloc:
print (df[df.isin(Brands)].ffill(axis=1).iloc[:, -1])
0 Nike
1 Adidas
2 Puma
Name: val3, dtype: object
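If the membership test is a function call rather than a list, as with the pyenchant checkers in the question, isin cannot be used directly. A minimal row-wise sketch under that assumption follows: first_match and checks are hypothetical names, and plain set membership stands in for a real checker's check method. Unlike the function in the question, the loop keeps going after a non-match instead of returning on the first value.

```python
import numpy as np
import pandas as pd

def first_match(row, check):
    """Return the first string in the row accepted by check, else NaN."""
    for value in row:
        # Skip NaN and other non-string cells before calling the checker.
        if isinstance(value, str) and check(value.strip()):
            return value.strip()
    return np.nan

# Sample data reconstructed from the question's table (an assumption).
df = pd.DataFrame({
    "val1": ["Nike", "Men", "Puma"],
    "val2": [np.nan, "Adidas", "Red"],
    "val3": [np.nan, np.nan, "Women"],
})
value_cols = ["val1", "val2", "val3"]

# Stand-ins for the real checkers: any callable returning a boolean works.
checks = {
    "brand": {"Adidas", "Nike", "Puma"}.__contains__,
    "gender": {"Men", "Women"}.__contains__,
    "color": {"Red", "Blue", "Green"}.__contains__,
}

for name, check in checks.items():
    # Extra keyword arguments to apply are forwarded to first_match.
    df[name] = df[value_cols].apply(first_match, axis=1, check=check)

print(df)
```

With a real checker, the bound method (for example brands_checker.check from the question) could be passed in place of the set membership callable. This runs a Python loop per row, so it is likely slower than the isin approach on large frames.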