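I am trying to classify the text in a DataFrame using a list of words stored in an array. If a word from the list is found in the text, the next column should be filled with that word; otherwise it should be left empty.
My code so far: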
import pandas as pd
from collections import Counter

Product=['Fish','food','Product','Expensive','cheap','expensive','seafood','ice cream','delicious','taste','smell','selection','price','grilled']
df=pd.read_csv("text.csv")  # assumes the text is in a column named 'Text'
df['classify']=""
for i in range(len(df)):
    paragraph=df['Text'][i]
    count = Counter(paragraph.split())
    for key, val in count.items():
        key = key.rstrip('.,?!\n') # removing possible punctuation signs
        if key in Product:
            df.loc[i, 'classify']=key
Desired result:
Text                  Classify
"The food is bad"     food
"He parked the car"   none
Any help would be greatly appreciated!
Answer 0: (score: 0)
This should work:
import pandas as pd

Product=['Fish','food','Product','Expensive','cheap','expensive','seafood','ice cream','delicious','taste','smell','selection','price','grilled']
df=pd.DataFrame({'Text':["The food is bad", "He parked the car"]})

def classify(text):
    # Return the first entry of Product that appears as a whitespace-separated word in text
    words = text.split()
    for i in Product:
        if i in words:
            return i
    return None

# Apply the function to each value of the 'Text' column
df['classify']=df['Text'].apply(classify)
Output:
                Text classify
0    The food is bad     food
1  He parked the car     None
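Because the answer above splits the text on whitespace, a multi-word entry such as 'ice cream' can never match, and a word followed by punctuation ("food.") is missed. One possible way around this is a single regular expression with word boundaries. The following is a minimal sketch, not part of the original answer: the names `PRODUCT`, `_PATTERN` and `classify_text` are made up for illustration, and the case-insensitive matching is an assumption about what the asker wants. Note that it returns the leftmost match in the text (as it is written in the text), not the first match in list order.

```python
import re

# Same word list as in the question (the name PRODUCT is an assumption)
PRODUCT = ['Fish', 'food', 'Product', 'Expensive', 'cheap', 'expensive',
           'seafood', 'ice cream', 'delicious', 'taste', 'smell',
           'selection', 'price', 'grilled']

# Build one alternation, longest entries first, wrapped in word boundaries
# so that 'food' does not match inside 'seafood'.
_PATTERN = re.compile(
    r'\b(' + '|'.join(re.escape(w) for w in sorted(PRODUCT, key=len, reverse=True)) + r')\b',
    re.IGNORECASE,
)

def classify_text(text):
    """Return the leftmost listed word found in text, or None if there is none."""
    match = _PATTERN.search(text)
    return match.group(1) if match else None
```

With pandas it could be used in the same way as the function in the answer, for example `df['classify'] = df['Text'].apply(classify_text)`.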
Answer 1: (score: 0)
You should create a function like this:
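def classify(classification_list, text, data_id):
    # Writes into the global DataFrame df; the match is a case-insensitive substring test
    for check_word in classification_list:
        if check_word.lower() in text.lower():
            df.loc[data_id, 'classify'] = check_word
            break
    else:
        # for/else: runs only if the loop finished without hitting break
        df.loc[data_id, 'classify'] = None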
And the usage:
products=['Fish','food','Product','Expensive','cheap','expensive','seafood','ice cream','delicious','taste','smell','selection','price','grilled']
for data_id in range(len(df)):
    classify(products, df.loc[data_id, 'text'], data_id)
In the end, you will get a DataFrame like this:
>>> df
                text classify
0    The food is bad     food
1  He parked the car     None
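One thing to be aware of with the substring test used above: `'price' in 'priceless'` is true, so a row can be tagged with a word that does not actually occur on its own. If a text may also contain several listed words, it might be useful to collect all of them. The sketch below is only an illustration under those assumptions; `find_keywords` is a hypothetical helper, not something from the answer. It checks each entry as a whole word, ignores case, and skips entries that differ only by case (such as 'Expensive' and 'expensive').

```python
import re

def find_keywords(text, keywords):
    """Return every keyword that occurs in text as a whole word, in list order."""
    found = []
    seen = set()
    for word in keywords:
        key = word.lower()
        if key in seen:
            continue  # skip case-only duplicates such as 'Expensive'/'expensive'
        # \b...\b requires a word boundary on both sides of the keyword
        if re.search(r'\b' + re.escape(word) + r'\b', text, re.IGNORECASE):
            found.append(word)
            seen.add(key)
    return found

products = ['Fish', 'food', 'Product', 'Expensive', 'cheap', 'expensive',
            'seafood', 'ice cream', 'delicious', 'taste', 'smell',
            'selection', 'price', 'grilled']

print(find_keywords("Cheap seafood, great taste", products))
print(find_keywords("A priceless view", products))
```

The first call finds three whole-word matches, while the second finds none, because 'price' only appears inside 'priceless'.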