Question

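This might get flagged as a duplicate, but despite trying this answer, I can't seem to fix my problem. My data is in long format. Each subject's values need to go on their own row, and beta should be expanded into columns such as A_i_beta, A_j_beta, and so on. My actual dataset has many more values.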

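I made a small sample of the data, ran the code below on it, and it worked. However, when I scale it up to my actual dataset, I get an error. Here is my sample data and code: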

subject name    region  xyz beta
1   A   i   54 -52 8    0.149812742
1   B   i   54 -52 8    1.23882482
2   A   i   54 -52 8    -1.150757713
2   B   i   54 -52 8    -0.635049755
1   A   j   16 -66 58   2.452675111
1   B   j   16 -66 58   1.193138828
2   A   j   16 -66 58   -1.063844842
2   B   j   16 -66 58   -0.69318946
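
For reproducibility, here is a minimal sketch that builds the sample above as a DataFrame. The construction itself is not part of my original code; storing xyz as a single string column is an assumption based on how the table is laid out.

```python
import pandas as pd

# Sample data from the table above. Keeping xyz as one string per row
# is an assumption; it is dropped before reshaping anyway.
df = pd.DataFrame({
    "subject": [1, 1, 2, 2, 1, 1, 2, 2],
    "name": ["A", "B", "A", "B", "A", "B", "A", "B"],
    "region": ["i", "i", "i", "i", "j", "j", "j", "j"],
    "xyz": ["54 -52 8"] * 4 + ["16 -66 58"] * 4,
    "beta": [
        0.149812742, 1.23882482, -1.150757713, -0.635049755,
        2.452675111, 1.193138828, -1.063844842, -0.69318946,
    ],
})
print(df)
```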

df = df.drop('xyz', axis=1)
df = df.set_index(['subject', 'name', 'region'])
df = df.unstack()
df.columns = [f'{i[0]}_{i[1]}' for i in df.columns]
df = df.reset_index()
print(df)

   subject name    beta_i    beta_j
0        1    A  0.149813  2.452675
1        1    B  1.238825  1.193139
2        2    A -1.150758 -1.063845
3        2    B -0.635050 -0.693189

So that works. However, when I apply it to my actual dataset, I get the following error:

df = df.set_index(['subject', 'name', 'region'])
df = df.unstack()

ValueError: Index contains duplicate entries, cannot reshape

I'm not quite sure where to go from here.
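
As far as I understand, this error usually means that at least one (subject, name, region) combination appears more than once, so unstack cannot place a single value in each cell. Below is a rough sketch of how one might check for that. The data here is hypothetical (one repeated key added on purpose), and averaging the duplicates with pivot_table is only one possible choice, not necessarily the right one for the real dataset.

```python
import pandas as pd

# Hypothetical data: same layout as the sample, but subject 1 / name A /
# region i appears twice, which is what makes unstack() fail.
df = pd.DataFrame({
    "subject": [1, 1, 1, 2, 2],
    "name": ["A", "A", "A", "A", "A"],
    "region": ["i", "i", "j", "i", "j"],
    "beta": [0.1, 0.3, 2.4, -1.1, -1.0],
})

keys = ["subject", "name", "region"]

# Show every row whose key combination occurs more than once.
dupes = df[df.duplicated(subset=keys, keep=False)]
print(dupes)

# One option (an assumption, not the only fix): aggregate duplicates.
# pivot_table tolerates repeated keys because it applies aggfunc to them.
wide = df.pivot_table(
    index=["subject", "name"],
    columns="region",
    values="beta",
    aggfunc="mean",
)
wide.columns = [f"beta_{c}" for c in wide.columns]
wide = wide.reset_index()
print(wide)
```

If the duplicates turn out to be exact repeats, dropping them with drop_duplicates before set_index may be enough. If they are genuinely different measurements, an extra key column is probably needed to tell them apart.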

pandas: unstacking from long format to wide format

0 answers: