Question

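The columns of my data frame are survey questions and the rows are responses. The only response choices are (1 - Strongly Disagree, 2 - Disagree, 3 - Neutral, 4 - Agree, 5 - Strongly Agree). Each row holds one respondent's choices. Ideally I would like the columns to be the unique response choices, with a count of how many people chose each response for each question, plus a row total for each question.

Not sure how to get there. Any suggestions?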

My original data

I tried transposing, which successfully turned the questions into rows, but now I have hundreds of unique "rows" for each response.

The ultimate goal is to group by question, with the response choices listed under each question and the columns holding a sum for each response.

Answer 1

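This is a bit quick and dirty, but it might help.

Edit: updated to convert the printed output into a pandas DataFrame.

Set up an example data frame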

import pandas as pd

# Example survey data: one column per question, one row per respondent
df = pd.DataFrame({'question_1': ['1 - strongly agree', '1 - strongly agree', '2 - agree'],
                   'question_2': ['3 - neutral', '2 - agree', '2 - agree'],
                   'question_3': ['1 - strongly agree', '2 - agree', '3 - neutral'],
                   'question_4': ['4 - disagree', '5 - strongly disagree', '5 - strongly disagree'],
                   'question_5': ['3 - neutral', '2 - agree', '2 - agree']})

Get the value_counts() for each column

ls_flat = []
for col in df.columns:
    # value_counts() is computed once per column; items() yields (response, count) pairs
    for response, tally in df[col].value_counts().items():
        print(col, response, tally)
        ls_flat.append([col, response, tally])

Put this list into a data frame

df_flat = pd.DataFrame(ls_flat)

Rename the columns to something more meaningful

df_flat.columns = ['question', 'response', 'tally']

This creates a data frame with one row per question and response, holding the tally for that pair.
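
The same long-format table can also be built without an explicit loop. The following is only a sketch, using example data modelled on the frame above; the names df_long and df_flat_alt are introduced here for illustration and are not part of the original answer.

```python
import pandas as pd

# Example data modelled on the frame above (assumed for illustration).
df = pd.DataFrame({
    'question_1': ['1 - strongly agree', '1 - strongly agree', '2 - agree'],
    'question_2': ['3 - neutral', '2 - agree', '2 - agree'],
    'question_3': ['1 - strongly agree', '2 - agree', '3 - neutral'],
    'question_4': ['4 - disagree', '5 - strongly disagree', '5 - strongly disagree'],
    'question_5': ['3 - neutral', '2 - agree', '2 - agree'],
})

# Reshape to long format: one row per (question, response) pair.
df_long = df.melt(var_name='question', value_name='response')

# Count how often each response occurs within each question.
df_flat_alt = (
    df_long.groupby(['question', 'response'])
           .size()
           .reset_index(name='tally')
)
print(df_flat_alt)
```

This should hold the same tallies as the loop version, though the row order may differ because groupby sorts by question and response.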

Answer 2

Here are the final results using your code and a group-by!
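
The code behind this result is not included in the text, so the following is only a sketch of one way to get from a long tally table to the layout the question asks for: one row per question, one column per response, plus a row total. It assumes a df_flat with the columns question, response, and tally as built in Answer 1; the small sample table and the name summary are made up for illustration.

```python
import pandas as pd

# Hypothetical long-format tallies, shaped like df_flat from Answer 1.
df_flat = pd.DataFrame(
    [
        ['question_1', '1 - strongly agree', 2],
        ['question_1', '2 - agree', 1],
        ['question_2', '2 - agree', 2],
        ['question_2', '3 - neutral', 1],
    ],
    columns=['question', 'response', 'tally'],
)

# Group by question and response, sum the tallies,
# then move the responses into columns (missing combinations become 0).
summary = (
    df_flat.groupby(['question', 'response'])['tally']
           .sum()
           .unstack(fill_value=0)
)

# Add the total number of responses per question.
summary['total'] = summary.sum(axis=1)
print(summary)
```

Responses that nobody chose for a given question show up as 0 instead of NaN because of fill_value=0.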

Cleaning survey data and summarizing the tallied results in a Python data frame
