I have a pandas DataFrame that contains some Unicode characters, and I want to create a new column whose values are dog, cat, or None.
Here is my DataFrame:
df = pd.DataFrame({'comment': ['Alice likes ?', 'Bob likes ?', 'Harry likes dog', 'Don likes cat!', 'this is a tree']})
How can I create a new column like this?
           comment  label
0    Alice likes ?    dog
1      Bob likes ?    dog
2  Harry likes dog    dog
3   Don likes cat!    cat
4   this is a tree   None
Note: I only have a few cat and dog emoji, so I can build the dictionaries by hand.
dict_dog = {'dog': ['dog', "?", "?"]}
dict_cat = {'cat': ['cat']}
After that, I am stuck on how to proceed.
Answer 0 (score: 1)
You can create the dicts like this:
dog = dict.fromkeys(['dog', "?", "?"],'dog')
cat= dict.fromkeys(['cat'],'cat')
Then we use the same logic as before with str.findall:
d = {**dog ,**cat}
df.comment.str.findall('|'.join(d.keys())).str[0].map(d)
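For reference, the steps in this answer can be put together as a self-contained sketch. The emoji from the original post did not survive (they appear as ?), so the two dog emoji used below, U+1F436 and U+1F415, are assumptions chosen only for illustration. The call to re.escape is also an addition, as a precaution in case a key contains regex metacharacters.

```python
import re
import pandas as pd

# Assumption: the original emoji were lost; these two stand in for them.
DOG_FACE = "\U0001F436"  # dog face emoji
DOG = "\U0001F415"       # dog emoji

df = pd.DataFrame({'comment': ['Alice likes ' + DOG_FACE,
                               'Bob likes ' + DOG,
                               'Harry likes dog',
                               'Don likes cat!',
                               'this is a tree']})

# Map every keyword or emoji to the label it should produce.
dog = dict.fromkeys(['dog', DOG_FACE, DOG], 'dog')
cat = dict.fromkeys(['cat'], 'cat')
d = {**dog, **cat}

# Build an alternation pattern; escape the keys so they match literally.
pattern = '|'.join(re.escape(k) for k in d)

# findall gives a list of matches per row, .str[0] takes the first match
# (NaN when the list is empty), and map translates the match to its label.
df['label'] = df['comment'].str.findall(pattern).str[0].map(d)
print(df)
```

Rows without any match end up as NaN here. If the string 'None' is wanted instead, chaining .fillna('None') onto the last expression is one option.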
Answer 1 (score: 1)
Try:
df['label']= np.where( df['comment'].str.contains('(dog| ?|?)'), 'dog','cat')
If there are more than two animals, you can nest np.where:
df['label']= (np.where( df['comment'].str.contains('(dog| ?|?)'),'dog',
(np.where(df['comment'].str.contains('cat'), 'cat','None'))))
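Nesting np.where gets hard to read as the number of animals grows. One alternative, which is not what this answer uses, is np.select: it takes a list of conditions and a matching list of choices. The sketch below is only an illustration; the patterns dict is a hypothetical helper, and the two dog emoji (U+1F436 and U+1F415) are assumptions, since the emoji in the original post were lost.

```python
import numpy as np
import pandas as pd

# Assumption: the original emoji were lost; these two stand in for them.
DOG_FACE = "\U0001F436"  # dog face emoji
DOG = "\U0001F415"       # dog emoji

df = pd.DataFrame({'comment': ['Alice likes ' + DOG_FACE,
                               'Bob likes ' + DOG,
                               'Harry likes dog',
                               'Don likes cat!',
                               'this is a tree']})

# Hypothetical helper: one regex per label. When several conditions are
# true for a row, np.select uses the first one in the list.
patterns = {
    'dog': 'dog|' + DOG_FACE + '|' + DOG,
    'cat': 'cat',
}

conditions = [df['comment'].str.contains(p) for p in patterns.values()]
choices = list(patterns.keys())

# Rows that match no pattern receive the default, the string 'None'.
df['label'] = np.select(conditions, choices, default='None')
print(df)
```

Adding another animal then only requires one more entry in patterns, rather than another level of nesting.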
Answer 2 (score: 1)
Here is another approach using regex and apply():
import re
decoder = {'dog': ['dog', "?", "?"], 'cat': ['cat']}
def check(c):
    c = list(map(lambda l: re.sub('[!@#$]', '', l), c.split(' ')))
    res_dog = [i for i in c if i in decoder['dog']]
    res_cat = [i for i in c if i in decoder['cat']]
    return 'dog' if res_dog else 'cat' if res_cat else None
# Apply function
df['label'] = df['comment'].apply(check)
Result:
           comment  label
0    Alice likes ?    dog
1      Bob likes ?    dog
2  Harry likes dog    dog
3   Don likes cat!    cat
4   this is a tree   None
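Answer 3 (score: 1)
This works for both uppercase and lowercase letters. Assigning with df.loc and a boolean mask is a straightforward way to fill a column in pandas. If a simple approach works, use it before reaching for a more complex one.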
import numpy as np
import pandas as pd
df = pd.DataFrame({'comment': ['Alice likes ?', 'Bob likes ?', 'Harry likes dog', 'Don likes cat!', 'this is a tree']})
df['comment'] = df['comment'].astype(str)
df['label'] = 'None'
df.loc[df.comment.str.lower().str.contains("dog"),'label'] = 'dog'
df.loc[df.comment.str.lower().str.contains("cat"),'label'] = 'cat'
df.loc[df.comment.str.contains("?"),'label'] = 'dog'
df.loc[df.comment.str.contains("?"),'label'] = 'dog'
print(df)
           comment  label
0    Alice likes ?    dog
1      Bob likes ?    dog
2  Harry likes dog    dog
3   Don likes cat!    cat
4   this is a tree   None
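The same idea can be written as a loop over a keyword dictionary, which avoids repeating one df.loc line per keyword. This is a minimal sketch, not part of the original answer: the keywords dict is a hypothetical helper, and the two dog emoji (U+1F436 and U+1F415) are assumptions because the original emoji were lost. It passes case=False and regex=False to str.contains so that matching is case-insensitive and every keyword is treated as a literal string.

```python
import pandas as pd

# Assumption: the original emoji were lost; these two stand in for them.
DOG_FACE = "\U0001F436"  # dog face emoji
DOG = "\U0001F415"       # dog emoji

df = pd.DataFrame({'comment': ['Alice likes ' + DOG_FACE,
                               'Bob likes ' + DOG,
                               'Harry likes DOG',
                               'Don likes cat!',
                               'this is a tree']})

# Hypothetical helper: label -> keywords that should produce that label.
keywords = {'dog': ['dog', DOG_FACE, DOG], 'cat': ['cat']}

df['label'] = 'None'
for label, words in keywords.items():
    for word in words:
        # Literal, case-insensitive substring test for this keyword.
        mask = df['comment'].str.contains(word, case=False, regex=False)
        df.loc[mask, 'label'] = label
print(df)
```

Because later assignments overwrite earlier ones, a comment that mentions both animals would receive the label of the last matching keyword, here cat.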