Is there a way to replicate the compact form of an Excel pivot table in Python?

Time: 2019-07-29 10:50:29

Tags: python pandas pandas-groupby

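I am automating Excel reports with Python. In the Excel report, my pivot table uses the "Compact Form" report layout, in which one or more columns are displayed as row headers. For example, given the input data below: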

Country         City         Employee    Salary $
Mexico          Chiapas      A           100,000
Mexico          Chihuahua    B           245,132
Mexico          Chihuahua    C           200,000
Mexico          Chihuahua    D           175,000
United States   Alabama      E           106,088
United States   Alaska       F           56,121
United States   Arizona      G           9,737
United States   Arizona      H           250,000

In the Excel report, I display it as:

Row Labels       Sum of Salary $
Mexico           720,132
 Chiapas         100,000
   A             100,000
 Chihuahua       620,132
   B             245,132
   C             200,000
   D             175,000
United States    421,946
 Alabama         106,088
   E             106,088
 Alaska          56,121
   F             56,121
 Arizona         259,737
   G             9,737
   H             250,000

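In the compact view, instead of showing Country and City in separate columns, I show them as row headers, which is a built-in Excel feature. I am trying to replicate the same view in Python. I have loaded the raw file into a pandas DataFrame.

I tried df.pivot and df.pivot_table, but they only produce the usual view shown below: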

Country          City      Employee Sum of Salary $
Mexico           Chiapas        A    100,000 
                 Chihuahua      B    245,132 
                                C    200,000 
                                D    175,000 
United States    Alabama        E    106,088 
                 Alaska         F    56,121 
                 Arizona        G    9,737 
                                H    250,000
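
For context, here is a minimal sketch of the kind of pivot_table call that produces the layout above. The exact call is not shown in the question, so the arguments are assumptions, as is the column name Salary (without the "$") and the numeric salary values.

```python
import pandas as pd

# Sample data from the question; salaries stored as integers (assumption)
df = pd.DataFrame({
    "Country": ["Mexico"] * 4 + ["United States"] * 4,
    "City": ["Chiapas", "Chihuahua", "Chihuahua", "Chihuahua",
             "Alabama", "Alaska", "Arizona", "Arizona"],
    "Employee": list("ABCDEFGH"),
    "Salary": [100000, 245132, 200000, 175000, 106088, 56121, 9737, 250000],
})

# Each grouping column becomes its own index level, so Country and City
# stay in separate columns and no subtotal rows are produced.
pivot = pd.pivot_table(
    df,
    index=["Country", "City", "Employee"],
    values="Salary",
    aggfunc="sum",
)
print(pivot)
```

The result has a three-level MultiIndex with one row per employee, which is why the country and city subtotals of the compact form are missing.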

1 Answer:

Answer 0 (score: 0)

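It would be easier to apply several groupby calls and concat the results, but you need the rows in a specific order, so this answer is tailored to your problem: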

df

    Country         City       Employee  Salary
0   Mexico          Chiapas    A         100000
1   Mexico          Chihuahua  B         245132
2   Mexico          Chihuahua  C         200000
3   Mexico          Chihuahua  D         175000
4   United States   Alabama    E         106088
5   United States   Alaska     F         56121
6   United States   Arizona    G         9737
7   United States   Arizona    H         250000

Code:

import pandas as pd

parts = []
# Select "Salary" explicitly so the text columns are not aggregated as well
country = df.groupby("Country")["Salary"].sum()
for country_name, country_total in country.items():
    # Country subtotal row
    parts.append(pd.DataFrame({"Row Labels": [country_name], "Salary": [country_total]}))

    in_country = df[df["Country"] == country_name]
    city = in_country.groupby("City")["Salary"].sum()
    for city_name, city_total in city.items():
        # City subtotal row, followed by that city's employees
        parts.append(pd.DataFrame({"Row Labels": [city_name], "Salary": [city_total]}))
        # Filter inside the current country so same-named cities elsewhere are not mixed in
        employee = in_country[in_country["City"] == city_name].groupby("Employee")["Salary"].sum()
        c3 = employee.reset_index()
        c3.columns = ["Row Labels", "Salary"]
        parts.append(c3)

# Concatenate once at the end; every row is added exactly once, so no de-duplication is needed
res = pd.concat(parts, ignore_index=True)

Result:

res


    Row Labels       Salary
0   Mexico           720132
1   Chiapas          100000
2   A                100000
3   Chihuahua        620132
4   B                245132
5   C                200000
6   D                175000
7   United States    421946
8   Alabama          106088
9   E                106088
10  Alaska           56121
11  F                56121
12  Arizona          259737
13  G                9737
14  H                250000
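
The nested loops above can also be generalized to any number of grouping levels. The following is only a sketch under assumptions: the helper name compact_rows and its indent parameter are made up for illustration, and the indentation merely imitates Excel's compact layout in plain text.

```python
import pandas as pd

def compact_rows(frame, keys, value, depth=0, indent="  "):
    """Yield (label, total) pairs in compact-layout order (hypothetical helper)."""
    key, rest = keys[0], keys[1:]
    for label, group in frame.groupby(key, sort=True):
        # Subtotal row for this group, indented according to its depth
        yield indent * depth + str(label), group[value].sum()
        if rest:
            # Recurse into the next grouping level, restricted to this group
            yield from compact_rows(group, rest, value, depth + 1, indent)

# Sample data from the question
df = pd.DataFrame({
    "Country": ["Mexico"] * 4 + ["United States"] * 4,
    "City": ["Chiapas", "Chihuahua", "Chihuahua", "Chihuahua",
             "Alabama", "Alaska", "Arizona", "Arizona"],
    "Employee": list("ABCDEFGH"),
    "Salary": [100000, 245132, 200000, 175000, 106088, 56121, 9737, 250000],
})

res = pd.DataFrame(
    list(compact_rows(df, ["Country", "City", "Employee"], "Salary")),
    columns=["Row Labels", "Salary"],
)
print(res.to_string(index=False))
```

Because each recursive call only sees the rows of its parent group, two cities with the same name in different countries are kept apart.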

If you don't mind the order of the labels, the following solution is faster (which matters if your dataset is large):

c1 = df.groupby(["Country"])["Salary"].sum().reset_index()
c1.columns = ["Row Labels", "Salary"]

c2 = df.groupby(["Country","City"])["Salary"].sum().reset_index()[["City","Salary"]]
c2.columns = ["Row Labels", "Salary"]

c3 = df.groupby(["Country","City","Employee"])["Salary"].sum().reset_index()[["Employee","Salary"]]
c3.columns = ["Row Labels", "Salary"]

res = pd.concat([c1,c2,c3])

res


    Row Labels          Salary
0   Mexico              720132
1   United States       421946
0   Chiapas             100000
1   Chihuahua           620132
2   Alabama             106088
3   Alaska              56121
4   Arizona             259737
0   A                   100000
1   B                   245132
2   C                   200000
3   D                   175000
4   E                   106088
5   F                   56121
6   G                   9737
7   H                   250000
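
If the order does matter, the groupby-and-concat idea can probably still be kept by sorting the combined rows afterwards. This is a sketch, not a tested recipe for every dataset: it assumes the grouping columns hold non-empty strings, because it pads the missing deeper keys with "" so that each subtotal row sorts ahead of its children. The names keys, levels and compact are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Country": ["Mexico"] * 4 + ["United States"] * 4,
    "City": ["Chiapas", "Chihuahua", "Chihuahua", "Chihuahua",
             "Alabama", "Alaska", "Arizona", "Arizona"],
    "Employee": list("ABCDEFGH"),
    "Salary": [100000, 245132, 200000, 175000, 106088, 56121, 9737, 250000],
})

keys = ["Country", "City", "Employee"]
levels = []
for depth in range(1, len(keys) + 1):
    # Subtotals by Country, then Country + City, then all three keys
    part = df.groupby(keys[:depth], as_index=False)["Salary"].sum()
    part["Row Labels"] = part[keys[depth - 1]]
    # Pad deeper keys with "" so a subtotal sorts before its child rows
    for missing in keys[depth:]:
        part[missing] = ""
    levels.append(part)

compact = (
    pd.concat(levels, ignore_index=True)
    .sort_values(keys, kind="stable")
    .reset_index(drop=True)[["Row Labels", "Salary"]]
)
print(compact)
```

This runs one groupby per level instead of one per group, so it should scale better than nested loops on large inputs.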

Hope it works!