Extracting comments from ratings

Time: 2019-08-15 19:44:57

Tags: python pandas

Given:

import pandas as pd

survey = [('How much do you like apples?', 4),
          ('How much do you like oranges?', 5),
          ('How much do like bananas?', 5),
          ('Why do you like fruits?', "They are the best")]
labels = ['Question', 'Answer']

before = pd.DataFrame.from_records(survey, columns=labels)

It should look like this:

survey = [('How much do you like apples?', 4, "NaN"),
          ('How much do you like oranges?', 5, "NaN"),
          ('How much do like bananas?', 5, "NaN"),
          ('Why do you like fruits?', "NaN", "They are the best")]
labels = ['Question', 'Answer', 'Comments']

after = pd.DataFrame.from_records(survey, columns=labels)

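I am working with a large number of survey responses. The problem is that the 'Answer' column holds either a rating from 1 to 5 or a comment (a string). I am trying to split this column into an Answer column that contains only the numeric ratings (1-5) and a Comments column that contains only the strings. The new columns need to be created in the current df. Does anyone know of a function that could help me get started?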

Thanks.

1 Answer:

Answer 0: (score: 0)

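We can use pd.to_numeric with errors='coerce': values that cannot be parsed as numbers become NaN, which marks the rows that hold comments.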

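# Convert ratings to numbers; non-numeric answers (comments) become NaN
s = pd.to_numeric(before.Answer, errors='coerce')
# Keep the original text only where the conversion failed
before['Comments'] = before.Answer.where(s.isnull())
# Overwrite Answer with the numeric ratings
before['Answer'] = s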

Output

before
Out[199]: 
                        Question  Answer           Comments
0   How much do you like apples?     4.0                NaN
1  How much do you like oranges?     5.0                NaN
2      How much do like bananas?     5.0                NaN
3        Why do you like fruits?     NaN  They are the best
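
For reuse across several survey files, the same idea can be wrapped in a small helper. This is only a sketch: the function name split_answers and its parameters col and comment_col are hypothetical and not part of pandas. It assumes every non-numeric entry in the column is a comment. Note that a numeric-looking string such as "5" would be coerced to a rating rather than kept as a comment.

```python
import pandas as pd

survey = [('How much do you like apples?', 4),
          ('How much do you like oranges?', 5),
          ('How much do like bananas?', 5),
          ('Why do you like fruits?', "They are the best")]
df = pd.DataFrame.from_records(survey, columns=['Question', 'Answer'])


def split_answers(frame, col='Answer', comment_col='Comments'):
    """Hypothetical helper: split a mixed column into ratings and comments."""
    out = frame.copy()  # leave the caller's DataFrame untouched
    # Anything that cannot be parsed as a number becomes NaN
    numeric = pd.to_numeric(out[col], errors='coerce')
    # Keep the original value only where the numeric conversion failed
    out[comment_col] = out[col].where(numeric.isna())
    # Replace the mixed column with the numeric ratings
    out[col] = numeric
    return out


result = split_answers(df)
print(result)
```

The Answer column comes out as float rather than int because NaN forces a floating-point dtype in a regular numeric column.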