Pivot duplicate rows into new columns

Date: 2018-06-25 14:22:57

Tags: python pandas merge duplicates

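I have a DataFrame like the one below. I'm trying to reshape it with pandas' pivot so that I keep some values from the original row (ID, Agent, OV) while turning the duplicate rows into new, renamed columns. Sometimes an ID has as many as 5 duplicate rows.

I've been trying, but I can't figure it out.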

import pandas as pd
df = pd.read_csv("C:dummy")

df = df.pivot(index=["ID"], columns=["Zone","PTC"], values=["Zone","PTC"])

# Rename columns and reset the index.
df.columns = [["PTC{}","Zone{}"],.format(c) for c in df.columns]
df.reset_index(inplace=True)
# Drop duplicates
df.drop(["PTC","Zone"], axis=1, inplace=True)

Input:

ID  Agent   OV  Zone Value  PTC
1   10      26   M1   10    100
2   26.5    8    M2   50    95
2   26.5    8    M1   6     5
3   4.5     6    M3   4     40
3   4.5     6    M4   6     60
4   1.2    0.8   M1   8     100
5   2      0.4   M1   6     10
5   2      0.4   M2   41    86
5   2      0.4   M4   2     4

Desired output:

ID  Agent   OV  Zone1   Value1  PTC1    Zone2   Value2  PTC2    Zone3   Value3  PTC3
1   10      26  M1      10       100    0          0      0      0         0      0
2   26.5    8   M2      50        95    M1         6      5      0         0      0
3   4.5     6   M3      4         40    M4         6     60      0         0      0
4   1.2    0.8  M1      8        100    0          0      0      0         0      0
5   2      0.4  M1      6         10    M2         41    86     M4         2      4
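The read_csv call in the question points at a file whose real path is not shown. To try the answers without it, here is a minimal sketch that rebuilds the input table inline, assuming the columns are whitespace-separated exactly as pasted above (the name DATA is hypothetical):

```python
import io

import pandas as pd

# The question's input table pasted as text (hypothetical stand-in for the CSV file).
DATA = """\
ID  Agent   OV  Zone Value  PTC
1   10      26   M1   10    100
2   26.5    8    M2   50    95
2   26.5    8    M1   6     5
3   4.5     6    M3   4     40
3   4.5     6    M4   6     60
4   1.2    0.8   M1   8     100
5   2      0.4   M1   6     10
5   2      0.4   M2   41    86
5   2      0.4   M4   2     4
"""

# sep=r"\s+" splits on any run of spaces, matching the pasted layout.
df = pd.read_csv(io.StringIO(DATA), sep=r"\s+")
print(df.dtypes)
```

Agent and OV are read as floats because some of their values have decimals, which is why they print as 10.0, 26.0 and so on in the answers.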

2 answers:

Answer 0 (score: 2)

Use cumcount to number the rows within each group, build a MultiIndex with set_index and unstack, and finally flatten the column names:

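# Number the rows inside each (ID, Agent, OV) group: 1, 2, 3, ...
g = df.groupby(["ID", "Agent", "OV"]).cumcount().add(1)
# Move that counter into the index, then unstack it into the columns
df = df.set_index(["ID", "Agent", "OV", g]).unstack(fill_value=0)
# Order as Zone1, Value1, PTC1, Zone2, ... (sorted() is stable; newer pandas'
# sort_index(level=1) would also sort the names alphabetically: PTC, Value, Zone)
df = df[sorted(df.columns, key=lambda col: col[1])]
# Flatten the (name, number) MultiIndex into Zone1, Value1, ...
df.columns = ["{}{}".format(a, b) for a, b in df.columns]

df = df.reset_index()
print(df)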
   ID  Agent    OV Zone1  Value1  PTC1 Zone2  Value2  PTC2 Zone3  Value3  PTC3
0   1   10.0  26.0    M1      10   100     0       0     0     0       0     0
1   2   26.5   8.0    M2      50    95    M1       6     5     0       0     0
2   3    4.5   6.0    M3       4    40    M4       6    60     0       0     0
3   4    1.2   0.8    M1       8   100     0       0     0     0       0     0
4   5    2.0   0.4    M1       6    10    M2      41    86    M4       2     4
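The key step is the cumcount counter. A small sketch (rebuilding the question's table inline from a hypothetical DATA string) that prints it: each row gets its position inside its (ID, Agent, OV) group, so the largest value is the number of Zone/Value/PTC column groups that unstack will create.

```python
import io

import pandas as pd

DATA = """\
ID  Agent   OV  Zone Value  PTC
1   10      26   M1   10    100
2   26.5    8    M2   50    95
2   26.5    8    M1   6     5
3   4.5     6    M3   4     40
3   4.5     6    M4   6     60
4   1.2    0.8   M1   8     100
5   2      0.4   M1   6     10
5   2      0.4   M2   41    86
5   2      0.4   M4   2     4
"""
df = pd.read_csv(io.StringIO(DATA), sep=r"\s+")

# cumcount numbers rows 0, 1, 2, ... within each group; add(1) starts at 1
# so the numbers line up with the Zone1, Zone2, ... column suffixes.
g = df.groupby(["ID", "Agent", "OV"]).cumcount().add(1)
print(g.tolist())  # [1, 1, 2, 1, 2, 1, 1, 2, 3]

# The maximum is the number of column groups after unstacking.
print(g.max())  # 3
```

Because the counter is computed from the data, an ID with 5 duplicate rows would simply produce a counter up to 5 and five column groups, with no change to the code.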

If you only want to replace missing values with 0 in the numeric columns:

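g = df.groupby(["ID", "Agent", "OV"]).cumcount().add(1)
# No fill_value here: missing slots become NaN for now
df = df.set_index(["ID", "Agent", "OV", g]).unstack()
# Numeric columns only: NaN -> 0, then back to integers
for col in [c for c in df.columns if c[0] in ("Value", "PTC")]:
    df[col] = df[col].fillna(0).astype(int)
df = df[sorted(df.columns, key=lambda col: col[1])]
df.columns = ["{}{}".format(a, b) for a, b in df.columns]

# The remaining NaN (in the Zone columns) become empty strings
df = df.fillna("").reset_index()
print(df)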
print (df)
   ID  Agent    OV Zone1  Value1  PTC1 Zone2  Value2  PTC2 Zone3  Value3  PTC3
0   1   10.0  26.0    M1      10   100             0     0             0     0
1   2   26.5   8.0    M2      50    95    M1       6     5             0     0
2   3    4.5   6.0    M3       4    40    M4       6    60             0     0
3   4    1.2   0.8    M1       8   100             0     0             0     0
4   5    2.0   0.4    M1       6    10    M2      41    86    M4       2     4
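Since the question asks about pivot: once there is a helper counter column, DataFrame.pivot can do the same reshape. A sketch assuming pandas 1.1 or later (needed for a list passed as index); the helper column name n and the DATA string are hypothetical:

```python
import io

import pandas as pd

DATA = """\
ID  Agent   OV  Zone Value  PTC
1   10      26   M1   10    100
2   26.5    8    M2   50    95
2   26.5    8    M1   6     5
3   4.5     6    M3   4     40
3   4.5     6    M4   6     60
4   1.2    0.8   M1   8     100
5   2      0.4   M1   6     10
5   2      0.4   M2   41    86
5   2      0.4   M4   2     4
"""
df = pd.read_csv(io.StringIO(DATA), sep=r"\s+")

keys = ["ID", "Agent", "OV"]
# Hypothetical helper column "n": position of each row inside its group.
df["n"] = df.groupby(keys).cumcount().add(1)

# Without values=, pivot spreads every remaining column (Zone, Value, PTC).
wide = df.pivot(index=keys, columns="n")

# Numeric columns: missing slots -> 0, back to integers; Zone keeps NaN.
for col in [c for c in wide.columns if c[0] in ("Value", "PTC")]:
    wide[col] = wide[col].fillna(0).astype(int)

# Stable sort on the counter gives Zone1, Value1, PTC1, Zone2, ...
wide = wide[sorted(wide.columns, key=lambda col: col[1])]
wide.columns = [f"{name}{n}" for name, n in wide.columns]
wide = wide.reset_index()
print(wide)
```

pivot raises an error if two rows share the same (ID, Agent, OV, n), which cannot happen here because cumcount makes n unique within each group.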

Answer 1 (score: 1)

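You can create a helper key with cumcount, then unstack and flatten the MultiIndex columns. (PS: you could add fillna(0) at the end; I didn't, because I don't think 0 is a correct Zone value.)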
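The answer's own code is not included here. A minimal sketch of the approach it describes (not the answerer's code): a hypothetical helper column key built with cumcount, unstack without fill_value, then flattened column names. Missing slots stay NaN, and the columns stay grouped by field (Zone1, Zone2, Zone3, Value1, ...). Grouping by ID alone assumes Agent and OV never vary within an ID, as in the sample data.

```python
import io

import pandas as pd

DATA = """\
ID  Agent   OV  Zone Value  PTC
1   10      26   M1   10    100
2   26.5    8    M2   50    95
2   26.5    8    M1   6     5
3   4.5     6    M3   4     40
3   4.5     6    M4   6     60
4   1.2    0.8   M1   8     100
5   2      0.4   M1   6     10
5   2      0.4   M2   41    86
5   2      0.4   M4   2     4
"""
df = pd.read_csv(io.StringIO(DATA), sep=r"\s+")

# Hypothetical helper key: 1, 2, 3, ... inside each ID.
df["key"] = df.groupby("ID").cumcount().add(1)

# unstack without fill_value, so absent Zone/Value/PTC slots stay NaN.
out = df.set_index(["ID", "Agent", "OV", "key"]).unstack()

# Flatten ("Zone", 1) -> "Zone1", etc.
out.columns = [f"{name}{key}" for name, key in out.columns]
out = out.reset_index()
print(out)
```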