Edit: @rong @shaik moeed Here is the code that generates the DataFrame, along with the part that is giving me trouble:
import pandas as pd

temp = [[1, 'blblblblblb. The quaity of research was good. blblblblb'],
        [2, 'blblblblblb. The quaity of research was average. blblblblb'],
        [3, 'blblblblblb. The quaity of research was poor. blblblblb'],
        [4, 'blblblblblb. The quaity of research was good. blblblblb']
       ]
Data = pd.DataFrame(temp, columns=['ID', 'Report'])
Data['Sentence'] = Data['Report'].str.extract(r"([^.]*?The quaity of research was [^.]*\.)")
Quality_dic = dict([(1, 'excellent'), (2, 'good'), (3, 'average'), (4, 'poor'), (5, 'unassessable')])
Data['Quality'] = [k for k, v in Quality_dic.items() if v in Data['Sentence'].str.split()]
Unfortunately, the suggested solutions still do not work.
Any ideas on how to solve this? Thank you all for your time and input.
Answer 0: (score: 0)
I created a df that matches your data and implemented exactly what you asked for.
In Quality_dic, you used the same key for Good and Unassessable, so Good gets overwritten by Unassessable.
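To make the overwriting concrete, here is a minimal sketch of what happens when a dict is built from pairs that reuse a key. The names `pairs`, `overwritten`, and `fixed` are illustrative only, and giving Unassessable the key 5 is an assumption about what was intended.

```python
# Building a dict from pairs that reuse a key: the last pair for that key wins.
pairs = [(1, 'Excellent'), (2, 'Good'), (3, 'Average'), (4, 'Poor'), (2, 'Unassessable')]
overwritten = dict(pairs)
print(overwritten)  # {1: 'Excellent', 2: 'Unassessable', 3: 'Average', 4: 'Poor'}

# Giving 'Unassessable' its own key (5 is an assumed choice) keeps all five labels.
fixed = dict([(1, 'Excellent'), (2, 'Good'), (3, 'Average'), (4, 'Poor'), (5, 'Unassessable')])
print(len(overwritten), len(fixed))  # 4 5
```

With the duplicate key, a sentence containing "good" can never be matched, because the label Good is no longer in the dict.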
Try this:
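>>> import pandas as pd
>>> # Quality_dic as defined in the question, with a unique key for every label
>>> Quality_dic = dict([(1, 'excellent'), (2, 'good'), (3, 'average'), (4, 'poor'), (5, 'unassessable')])
>>> temp = [[1, 'blblblblblb. The quaity of research was good. blblblblb'],
...         [2, 'blblblblblb. The quaity of research was average. blblblblb'],
...         [3, 'blblblblblb. The quaity of research was poor. blblblblb'],
...         [4, 'blblblblblb. The quaity of research was good. blblblblb']
...        ]
>>> Data = pd.DataFrame(temp, columns=['ID', 'Report'])
>>> Data['Sentence'] = Data['Report'].str.extract(r"([^.]*?The quaity of research was [^.]*\.)")
>>> index_col = []
>>> for index, row in Data.iterrows():
...     # Strip the period, split into words, and keep the key of the first label found
...     index_col.append([k for k, v in Quality_dic.items() if v.lower() in row['Sentence'].replace('.', '').split()][0])
...
>>> Data["index_col"] = index_col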
Output:
>>> Data
   ID  ...  index_col
0   1  ...          2
1   2  ...          3
2   3  ...          4
3   4  ...          2

[4 rows x 4 columns]
Note:
... means that some columns are hidden because there is not enough space to display them.
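As a possible alternative to looping with iterrows, the same lookup can be sketched with vectorized string methods. This is only a sketch under assumptions: the rating is taken to be the single word right after the fixed phrase, `label_to_key` and the `Quality` column name are illustrative, and Quality_dic uses the unique keys from the question.

```python
import pandas as pd

temp = [[1, 'blblblblblb. The quaity of research was good. blblblblb'],
        [2, 'blblblblblb. The quaity of research was average. blblblblb'],
        [3, 'blblblblblb. The quaity of research was poor. blblblblb'],
        [4, 'blblblblblb. The quaity of research was good. blblblblb']]
Data = pd.DataFrame(temp, columns=['ID', 'Report'])
Quality_dic = {1: 'excellent', 2: 'good', 3: 'average', 4: 'poor', 5: 'unassessable'}

# Invert the mapping so that a label looks up its numeric key.
label_to_key = {label: key for key, label in Quality_dic.items()}

# Capture the word that follows the fixed phrase; expand=False returns a Series.
labels = Data['Report'].str.extract(r"The quaity of research was (\w+)", expand=False).str.lower()

# Map each captured label to its key; labels missing from the dict become NaN.
Data['Quality'] = labels.map(label_to_key)
print(Data['Quality'].tolist())  # [2, 3, 4, 2]
```

Rows where the phrase is absent, or where the word is not one of the known labels, end up as NaN instead of raising an IndexError.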
Answer 1: (score: 0)
# every label needs its own key, otherwise a later pair overwrites an earlier one
quality_dic = dict([(1, 'Excellent'), (2, 'Good'), (3, 'Average'), (4, 'Poor'), (5, 'Unassessable')])
sentence = 'The quality of the research was Poor'  # note that 'Poor' here is capitalized
for rating in quality_dic:
    if quality_dic[rating] in sentence:
        print(quality_dic[rating])  # e.g. store it with df['Quality'] = quality_dic[rating]
# or if you want a one-liner that collects every label found in the sentence:
matches = [label for label in quality_dic.values() if label in sentence]
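The substring check above is case-sensitive and would also match a label inside a longer word. A minimal sketch of a stricter, case-insensitive variant follows; `find_rating` is a hypothetical helper, not part of pandas or of the original answer, and the dict assumes a unique key per label.

```python
def find_rating(sentence, quality_dic):
    """Return the first key whose label occurs as a whole word in `sentence`, else None."""
    # Lowercase every word and strip surrounding punctuation before comparing.
    words = {w.strip('.,;:!?').lower() for w in sentence.split()}
    for rating, label in quality_dic.items():
        if label.lower() in words:
            return rating
    return None


quality_dic = {1: 'Excellent', 2: 'Good', 3: 'Average', 4: 'Poor', 5: 'Unassessable'}
print(find_rating('The quality of the research was Poor.', quality_dic))  # 4
print(find_rating('The quality of the research was poor', quality_dic))   # 4
print(find_rating('No rating mentioned here', quality_dic))               # None
```

Assuming the sentences live in a DataFrame column, a function like this could be applied row by row with Series.apply.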