I have a DataFrame like the one below, and I want to classify each sentence according to whether it contains cat, dog, or neither (None).
df = pd.DataFrame({'comment': ['this is a dog', 'beautiful dog', 'nice cat!', 'this is a tree']})
How can I create a new column named "label" with the following values?
df['label'] = ['dog','dog','cat', None]
Required output:
comment label
0 this is a dog dog
1 beautiful dog dog
2 nice cat! cat
3 this is a tree None
Answer 0 (score: 2)
You can use str.findall and take the first match in each row:
df['label'] = df.comment.str.findall('|'.join(['cat','dog'])).str[0]
Out[10]:
0 dog
1 dog
2 cat
3 NaN
Name: comment, dtype: object
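As a self-contained variation on the same idea, here is a minimal sketch that uses str.extract instead of str.findall. This is not the answerer's code; it assumes each comment mentions at most one of the two animals, since only the first match per row is kept. Rows without a match come back as NaN rather than None.

```python
import pandas as pd

df = pd.DataFrame({'comment': ['this is a dog', 'beautiful dog',
                               'nice cat!', 'this is a tree']})

# The capture group (cat|dog) grabs the first occurrence of either word.
# expand=False returns a Series instead of a one-column DataFrame.
df['label'] = df['comment'].str.extract(r'(cat|dog)', expand=False)

# Rows with no match hold a missing value; test them with pd.isna
print(df)
```

If the column must hold a literal None rather than NaN, that conversion has to be done as a separate step.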
Answer 1 (score: 1)
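import re

def animal(comment):
    # Collect every occurrence of 'cat' or 'dog' in the comment
    x = re.findall('cat|dog', comment)
    if x:
        # Return the first match as a string, not the whole list
        return x[0]
    else:
        return None

df['label'] = df['comment'].apply(animal)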
Even if both cases can occur,