Flattening survey data with Pandas

Time: 2017-01-21 02:09:27

Tags: python pandas

I have to work with these survey results in Tableau, and I think the following preprocessing is best done with Python/Pandas.

Input:

  User      Day-1       Day-2   Day-3
    1       Good        Good    Bad
    2       Good        Ok      Ok
    3       Good        Ok      Ok
    4       Bad         Bad     Good
    5       Ok          Bad     Bad
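To try the answers below, you can rebuild the sample input as a DataFrame. This is a minimal sketch. It uses the name `df` because the answers use that name, and it types the values in by hand from the table above. A real survey would more likely be loaded from a file, for example with `pd.read_csv`.

```python
import pandas as pd

# Sample input from the question: one row per user, one column per day
df = pd.DataFrame({
    "User":  [1, 2, 3, 4, 5],
    "Day-1": ["Good", "Good", "Good", "Bad", "Ok"],
    "Day-2": ["Good", "Ok", "Ok", "Bad", "Bad"],
    "Day-3": ["Bad", "Ok", "Ok", "Good", "Bad"],
})
print(df)
```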

Expected output:

User Question       Answer
1   Day-1           Good
1   Day-2           Good
1   Day-3           Bad
2   Day-1           Good
2   Day-2           Ok
2   Day-3           Ok
3   Day-1           Good
3   Day-2           Ok
3   Day-3           Ok
4   Day-1           Bad
4   Day-2           Bad
4   Day-3           Good
5   Day-1           Ok
5   Day-2           Bad
5   Day-3           Bad

(This is a dummy sample. The real survey covers hundreds of days and has many different kinds of answers.)

Is there a straightforward way to do this?

2 answers:

Answer 0 (score: 3)

You can use pandas.melt to convert the data from wide to long format:

import pandas as pd

# df is the wide survey table; every column except "User" becomes a Question/Answer pair
pd.melt(df, id_vars="User", var_name="Question", value_name="Answer")

Out[246]:
    User Question Answer
0      1    Day-1   Good
1      2    Day-1   Good
2      3    Day-1   Good
3      4    Day-1    Bad
4      5    Day-1     Ok
5      1    Day-2   Good
6      2    Day-2     Ok
7      3    Day-2     Ok
8      4    Day-2    Bad
9      5    Day-2    Bad
10     1    Day-3    Bad
11     2    Day-3     Ok
12     3    Day-3     Ok
13     4    Day-3   Good
14     5    Day-3    Bad
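The melt output is grouped by day, not by user. If you need the per-user order shown in the question, one option is a stable sort on `User` alone, sketched below on the same sample data. It is tempting to sort on `Question` as well, but that is risky with hundreds of days: string order puts `Day-10` before `Day-2`. A stable sort (here `kind="mergesort"`) keeps melt's original column order within each user.

```python
import pandas as pd

df = pd.DataFrame({
    "User":  [1, 2, 3, 4, 5],
    "Day-1": ["Good", "Good", "Good", "Bad", "Ok"],
    "Day-2": ["Good", "Ok", "Ok", "Bad", "Bad"],
    "Day-3": ["Bad", "Ok", "Ok", "Good", "Bad"],
})

long_df = pd.melt(df, id_vars="User", var_name="Question", value_name="Answer")

# melt emits rows column by column (all Day-1 rows, then all Day-2 rows, ...).
# A stable sort on User regroups the rows per user and keeps the day order
# inherited from the original columns, with no string comparison of day names.
long_df = (long_df.sort_values("User", kind="mergesort")
                  .reset_index(drop=True))
print(long_df)
```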

Another option is to use stack():

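# Move User into the index so stack() pivots only the Day columns into rows
(df.set_index("User").stack()
   .rename_axis(("User", "Question"))   # name the two index levels
   .rename("Answer").reset_index())     # name the values, then turn the levels into columns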

Out[248]:
    User Question Answer
0      1    Day-1   Good
1      1    Day-2   Good
2      1    Day-3    Bad
3      2    Day-1   Good
4      2    Day-2     Ok
5      2    Day-3     Ok
6      3    Day-1   Good
7      3    Day-2     Ok
8      3    Day-3     Ok
9      4    Day-1    Bad
10     4    Day-2    Bad
11     4    Day-3   Good
12     5    Day-1     Ok
13     5    Day-2    Bad
14     5    Day-3    Bad
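The question's goal is to load the result into Tableau, so a likely last step is writing the long table to CSV. The helper below, `write_long_csv`, is a hypothetical name and not part of pandas. It wraps the stack() approach above and calls `DataFrame.to_csv` with `index=False`, which keeps the pandas row index out of the file as an extra column. The usage writes to an in-memory buffer; in practice you would pass a file name instead.

```python
import io
import pandas as pd

def write_long_csv(df, path_or_buf, id_col="User"):
    """Hypothetical helper: flatten wide survey data and write it as CSV."""
    long_df = (df.set_index(id_col).stack()          # one row per (user, day)
                 .rename_axis([id_col, "Question"])  # name the two index levels
                 .rename("Answer")                   # name the values
                 .reset_index())                     # index levels -> columns
    # index=False keeps the 0..n-1 row index out of the CSV
    long_df.to_csv(path_or_buf, index=False)
    return long_df

df = pd.DataFrame({
    "User":  [1, 2, 3, 4, 5],
    "Day-1": ["Good", "Good", "Good", "Bad", "Ok"],
    "Day-2": ["Good", "Ok", "Ok", "Bad", "Bad"],
    "Day-3": ["Bad", "Ok", "Ok", "Good", "Bad"],
})

buf = io.StringIO()  # stands in for a file path such as "survey_long.csv"
long_df = write_long_csv(df, buf)
print(buf.getvalue())
```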

Answer 1 (score: 1)

Using NumPy:

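import numpy as np
import pandas as pd

pd.DataFrame(dict(
        User=df.User.values.repeat(len(df.columns) - 1),  # each user once per question
        Question=np.tile(df.columns[1:], len(df.index)),  # the question list once per user
        Answer=df.values[:, 1:].ravel()                   # answers read row by row
        ))[['User', 'Question', 'Answer']]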

    User Question Answer
0      1    Day-1   Good
1      1    Day-2   Good
2      1    Day-3    Bad
3      2    Day-1   Good
4      2    Day-2     Ok
5      2    Day-3     Ok
6      3    Day-1   Good
7      3    Day-2     Ok
8      3    Day-3     Ok
9      4    Day-1    Bad
10     4    Day-2    Bad
11     4    Day-3   Good
12     5    Day-1     Ok
13     5    Day-2    Bad
14     5    Day-3    Bad
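The NumPy answer works because three orderings line up. `repeat` emits each user once per question. `tile` emits the full question list once per user. `ravel` reads the answer grid row by row (C order), which means one user at a time. The small sketch below uses the first two users of the sample to make the alignment visible.

```python
import numpy as np

users = np.array([1, 2])                       # first two users from the sample
days = np.array(["Day-1", "Day-2", "Day-3"])
answers = np.array([["Good", "Good", "Bad"],   # user 1
                    ["Good", "Ok", "Ok"]])     # user 2

user_col = users.repeat(len(days))             # -> 1 1 1 2 2 2
question_col = np.tile(days, len(users))       # -> Day-1 Day-2 Day-3 Day-1 Day-2 Day-3
answer_col = answers.ravel()                   # -> row by row: user 1's answers, then user 2's

for u, q, a in zip(user_col, question_col, answer_col):
    print(u, q, a)
```

If the answer grid were flattened column-wise instead (`ravel(order="F")`), the answers would no longer match the repeat/tile pairs. That is why the default C order matters here.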