I have a DataFrame df_sentences and a list question_words, as shown below:
df_sentences:
sentence label
you will not forget this movie 0
will the novel ever die 1
why we drink alcohol 1
did trump win the election 1
ambiance is perfect 0
question_words = ['what', 'why', 'when', 'where', 'whose', 'which', 'whom', 'who', 'how',
'do', 'are', 'will', 'did', 'will', 'am', 'are', 'was', 'were', 'can', 'has', 'have']
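I want to check whether the first word of each entry in the sentence column appears in the list question_words, and store the result (1 or 0) in a new column ques_word.
Expected output: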
sentence label ques_word
you will not forget this movie 0 0
will the novel ever die 1 1
why we drink alcohol 1 1
did trump win the election 1 1
ambiance is perfect 0 0
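So far I have tried .str.contains('|'.join(question_words)).astype(int), but, as expected, it flags every sentence that contains any entry of question_words as a substring anywhere in the text, not only as the first word.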
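To make the problem concrete, here is a minimal sketch that rebuilds the sample data from the question and runs the attempted approach. The variable names `pattern` and `naive` are introduced here for illustration only. Every row is flagged, because 'will' occurs mid-sentence in the first row and 'am' is a substring of 'ambiance' in the last one.

```python
import pandas as pd

# Sample data copied from the question.
df_sentences = pd.DataFrame({
    "sentence": [
        "you will not forget this movie",
        "will the novel ever die",
        "why we drink alcohol",
        "did trump win the election",
        "ambiance is perfect",
    ],
    "label": [0, 1, 1, 1, 0],
})

question_words = ['what', 'why', 'when', 'where', 'whose', 'which', 'whom', 'who', 'how',
                  'do', 'are', 'will', 'did', 'will', 'am', 'are', 'was', 'were', 'can', 'has', 'have']

# The alternation pattern matches any listed word as a substring, anywhere in the sentence.
pattern = "|".join(question_words)
naive = df_sentences["sentence"].str.contains(pattern).astype(int)

print(naive.tolist())  # [1, 1, 1, 1, 1]
```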
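Answer 0 (score: 2)
df_sentences['sentence'].str.split(" ").str[0].isin(question_words).astype(int)
should do the job. Note the .str[0]: a plain [0] would select the first row of the Series rather than the first word of each sentence, and isin tests for an exact match where contains would again match substrings.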
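A runnable sketch of a vectorized first-word check, using the sample data from the question. It relies on `Series.str.split`, the `.str[0]` element accessor, and `Series.isin`; the intermediate name `first_word` is an illustrative choice, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df_sentences = pd.DataFrame({
    "sentence": [
        "you will not forget this movie",
        "will the novel ever die",
        "why we drink alcohol",
        "did trump win the election",
        "ambiance is perfect",
    ],
    "label": [0, 1, 1, 1, 0],
})

question_words = ['what', 'why', 'when', 'where', 'whose', 'which', 'whom', 'who', 'how',
                  'do', 'are', 'will', 'did', 'will', 'am', 'are', 'was', 'were', 'can', 'has', 'have']

# Split each sentence on whitespace, then take element 0 of every resulting list.
first_word = df_sentences["sentence"].str.split().str[0]

# isin performs exact membership tests, so 'ambiance' does not match 'am'.
df_sentences["ques_word"] = first_word.isin(question_words).astype(int)

print(df_sentences["ques_word"].tolist())  # [0, 1, 1, 1, 0]
```

Calling `str.split()` with no argument splits on any run of whitespace, which also tolerates leading spaces and tabs; whether that is preferable to splitting on a single space depends on how clean the data is.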
Answer 1 (score: 2)
If you need a fast solution, use a list comprehension.
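# Build a set once so each membership test is a constant-time lookup.
q_set = set(question_words)
# split(None, 1) splits on whitespace at most once; [0] is the first word.
df_sentences['ques_word'] = [
    1 if w.split(None, 1)[0] in q_set else 0 for w in df_sentences.sentence
]
df_sentences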
sentence label ques_word
0 you will not forget this movie 0 0
1 will the novel ever die 1 1
2 why we drink alcohol 1 1
3 did trump win the election 1 1
4 ambiance is perfect 0 0
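One caveat with the list comprehension: `w.split(None, 1)[0]` raises an IndexError for an empty or whitespace-only string, and fails on missing values. A guarded variant might look like the sketch below. The helper name `first_word_flags` is hypothetical, and lowercasing the first word is an added assumption that the question does not ask for.

```python
def first_word_flags(sentences, words):
    """Return 1 where the first word of a sentence is in `words`, else 0.

    Hypothetical helper: non-string and blank entries yield 0 instead of raising.
    """
    word_set = set(words)  # set lookup keeps each membership test cheap
    flags = []
    for s in sentences:
        # Non-strings (None, NaN) and blank strings produce an empty list.
        parts = s.split(None, 1) if isinstance(s, str) else []
        # Assumption: compare case-insensitively by lowercasing the first word.
        flags.append(int(bool(parts) and parts[0].lower() in word_set))
    return flags


sample = ["Will it rain", "", "   ", None, "you will not forget this movie"]
print(first_word_flags(sample, ["will", "why", "did"]))  # [1, 0, 0, 0, 0]
```

With a DataFrame, the result can be assigned directly, for example `df_sentences['ques_word'] = first_word_flags(df_sentences['sentence'], question_words)`.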