I need to work with these survey results in Tableau, and I think the following preprocessing is best done with Python and pandas.
Input:
User Day-1 Day-2 Day-3
1 Good Good Bad
2 Good Ok Ok
3 Good Ok Ok
4 Bad Bad Good
5 Ok Bad Bad
Expected output:
User Question Answer
1 Day-1 Good
1 Day-2 Good
1 Day-3 Good
2 Day-1 Bad
2 Day-2 Ok
2 Day-3 Good
3 Day-1 Ok
3 Day-2 Ok
3 Day-3 Bad
4 Day-1 Bad
4 Day-2 Bad
4 Day-3 Ok
5 Day-1 Ok
5 Day-2 Good
5 Day-3 Bad
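(This is a dummy sample. The real survey covers hundreds of days and many different kinds of answers.)
Is there a straightforward way to do this?
Answer 0 (score: 3)
You can use pandas.melt to convert the data from wide format to long format: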
import pandas as pd
pd.melt(df, id_vars="User", var_name = "Question", value_name="Answer")
Out[246]:
User Question Answer
0 1 Day-1 Good
1 2 Day-1 Good
2 3 Day-1 Good
3 4 Day-1 Bad
4 5 Day-1 Ok
5 1 Day-2 Good
6 2 Day-2 Ok
7 3 Day-2 Ok
8 4 Day-2 Bad
9 5 Day-2 Bad
10 1 Day-3 Bad
11 2 Day-3 Ok
12 3 Day-3 Ok
13 4 Day-3 Good
14 5 Day-3 Bad
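The melt output above is grouped by question, whereas the expected output in the question is grouped by user. As a minimal sketch (the DataFrame below is rebuilt by hand from the sample input, and the name long_df is only illustrative), a stable sort on User regroups the rows while keeping the original column order of the days within each user. Sorting on Question as well would order the labels as strings, which would put Day-10 before Day-2 once there are hundreds of days.

```python
import pandas as pd

# Sample input from the question, rebuilt by hand for illustration.
df = pd.DataFrame({
    "User": [1, 2, 3, 4, 5],
    "Day-1": ["Good", "Good", "Good", "Bad", "Ok"],
    "Day-2": ["Good", "Ok", "Ok", "Bad", "Bad"],
    "Day-3": ["Bad", "Ok", "Ok", "Good", "Bad"],
})

long_df = (
    pd.melt(df, id_vars="User", var_name="Question", value_name="Answer")
    # mergesort is stable, so within each user the rows keep the
    # order in which the day columns appeared in the wide table.
    .sort_values("User", kind="mergesort")
    .reset_index(drop=True)
)

print(long_df)
```

The result can then be written out with long_df.to_csv(...) for Tableau to read.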
Another option is to use stack():
(df.set_index("User").stack()
.rename_axis(("User", "Question"))
.rename("Answer").reset_index())
Out[248]:
User Question Answer
0 1 Day-1 Good
1 1 Day-2 Good
2 1 Day-3 Bad
3 2 Day-1 Good
4 2 Day-2 Ok
5 2 Day-3 Ok
6 3 Day-1 Good
7 3 Day-2 Ok
8 3 Day-3 Ok
9 4 Day-1 Bad
10 4 Day-2 Bad
11 4 Day-3 Good
12 5 Day-1 Ok
13 5 Day-2 Bad
14 5 Day-3 Bad
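Answer 1 (score: 1)
Using numpy:
import numpy as np
import pandas as pd
# Repeat each user once per question, tile the question names once per user,
# and flatten the answer block row by row so the three arrays line up.
pd.DataFrame(dict(
User=df.User.values.repeat(len(df.columns) - 1),
Question=np.tile(df.columns[1:], len(df.index)),
Answer=df.values[:, 1:].ravel()
))[['User', 'Question', 'Answer']]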
User Question Answer
0 1 Day-1 Good
1 1 Day-2 Good
2 1 Day-3 Bad
3 2 Day-1 Good
4 2 Day-2 Ok
5 2 Day-3 Ok
6 3 Day-1 Good
7 3 Day-2 Ok
8 3 Day-3 Ok
9 4 Day-1 Bad
10 4 Day-2 Bad
11 4 Day-3 Good
12 5 Day-1 Ok
13 5 Day-2 Bad
14 5 Day-3 Bad
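As a quick sanity check, here is a sketch that rebuilds the sample input by hand and compares the numpy construction with a melt that has been regrouped by user. The names via_numpy, via_melt and rows_match are only illustrative, and the comparison goes through plain Python lists so that dtype differences between the two frames do not matter.

```python
import numpy as np
import pandas as pd

# Sample input from the question, rebuilt by hand for illustration.
df = pd.DataFrame({
    "User": [1, 2, 3, 4, 5],
    "Day-1": ["Good", "Good", "Good", "Bad", "Ok"],
    "Day-2": ["Good", "Ok", "Ok", "Bad", "Bad"],
    "Day-3": ["Bad", "Ok", "Ok", "Good", "Bad"],
})

n_questions = len(df.columns) - 1  # every column except User

via_numpy = pd.DataFrame({
    "User": df["User"].values.repeat(n_questions),
    "Question": np.tile(df.columns[1:], len(df)),
    "Answer": df.values[:, 1:].ravel(),
})

via_melt = (
    pd.melt(df, id_vars="User", var_name="Question", value_name="Answer")
    .sort_values("User", kind="mergesort")  # stable: keeps day order per user
    .reset_index(drop=True)
)

# Compare row by row as plain lists.
rows_match = via_numpy.values.tolist() == via_melt.values.tolist()
print(rows_match)
```

Note that the numpy version assumes User is the first column and that every other column is a question.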