How to combine two groupby.sum() results

Time: 2019-10-04 08:56:21

Tags: python pandas

I have some firewall traffic logs that I am analyzing.

I want to combine two groupby.sum() results.

Here is my code:

    import pandas as pd

    def analysis(data_location, col_name):
        rows = []
        with open(data_location, "r") as data_file:
            for line in data_file:
                data = line.split()  # split() also drops the trailing "\n"
                if not data:
                    continue  # skip blank lines
                rows.append({"Firewall": data[0], "Gatway": data[1], "DATE": data[2],
                             "Rule_name": data[3], col_name: data[4], "Count": int(data[5])})

        df = pd.DataFrame(rows, columns=["Firewall", "Gatway", "DATE", "Rule_name", col_name, "Count"])
        result = df.groupby(["Firewall", "Gatway", "DATE", "Rule_name", col_name]).sum()
        print(result)
        # Return the frame so that DST = analysis(...) is a DataFrame, not None
        return result

And these are the results:

    DST = analysis("united_temp_fw_dst_log.txt", "dst")

    """the result
                                                      Count
    Firewall   Gatway DATE    Rule_name  dst                   
    10_1_81_34 vsys1  2019104 allow_Drop 10.1.81.255         34
                                         10.255.63.18        16
                                         103.226.213.30       4
                                         129.146.178.96     282
                                         183.177.72.201       4
                                         183.177.72.202       4
                                         220.133.209.243      4
                                         8.8.8.8            597"""


    SRC = analysis("united_temp_fw_src_log.txt", "src")
    """the result
                                                          Count
    Firewall   Gatway DATE    Rule_name  src               
    10_1_81_34 vsys1  2019104 allow_Drop 10.1.81.10       8
                                         10.1.81.11      12
                                         10.1.81.115     11
                                         10.1.81.118      3
                                         10.1.81.245    911"""
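The manual reading loop in analysis() can also be handed to pandas' own parser. A minimal sketch, assuming every log line has exactly six whitespace-separated fields (the same assumption the loop makes); analysis_read_csv and the sample lines are made up for illustration:

```python
import io

import pandas as pd

# Key columns as named in the question ("Gatway" kept as spelled there).
KEY_COLS = ["Firewall", "Gatway", "DATE", "Rule_name"]


def analysis_read_csv(source, col_name):
    """Hypothetical read_csv-based variant of analysis().

    Assumes every non-blank line holds exactly six whitespace-separated
    fields. `source` may be a file path or a file-like object.
    """
    df = pd.read_csv(
        source,
        sep=r"\s+",              # any run of spaces/tabs separates fields
        header=None,             # the log files have no header line
        names=KEY_COLS + [col_name, "Count"],
        dtype={c: str for c in KEY_COLS + [col_name]},  # keep IDs/dates as text
    )
    return df.groupby(KEY_COLS + [col_name]).sum()


# Made-up sample lines in the same layout as the logs.
sample = io.StringIO(
    "10_1_81_34 vsys1 2019104 allow_Drop 8.8.8.8 500\n"
    "10_1_81_34 vsys1 2019104 allow_Drop 8.8.8.8 97\n"
    "10_1_81_34 vsys1 2019104 allow_Drop 10.1.81.255 34\n"
)
print(analysis_read_csv(sample, "dst"))
```

read_csv skips blank lines by default, and the dtype mapping keeps values such as 2019104 as strings, as the original loop does.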

I want to use ["Firewall", "Gatway", "DATE", "Rule_name"] as the index, with the src and dst counts as side-by-side columns, like this:

    Firewall   Gatway DATE    Rule_name  src          count     dst             count
    10_1_81_34 vsys1  2019104 allow_Drop 10.1.81.10       8    10.1.81.255         34
                                         10.1.81.11      12    10.255.63.18        16
                                         10.1.81.115     11    103.226.213.30       4
                                         10.1.81.118      3    129.146.178.96     282
                                         10.1.81.245    911    183.177.72.201       4
                                                               183.177.72.202       4
                                                               220.133.209.243      4 
                                                               8.8.8.8            597

What should I do? I tried reset_index() and groupby(), but neither gave the result I want.

2 Answers:

Answer 0 (score: 0)

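A simple join will do the trick:

    # Both frames have a Count column, so join needs suffixes
    # (without them it raises a ValueError about overlapping columns)
    DST.join(SRC, lsuffix="_dst", rsuffix="_src")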
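Note that DST and SRC share only their first four index levels. When DataFrame.join is given two MultiIndexed frames, it joins on the overlapping level names (supported since pandas 0.24), so every dst row is paired with every src row of the same group instead of being placed next to it. A small check with hand-made stand-ins for DST and SRC (a few rows copied from the question):

```python
import pandas as pd

KEYS = ["Firewall", "Gatway", "DATE", "Rule_name"]
GROUP = {"Firewall": "10_1_81_34", "Gatway": "vsys1",
         "DATE": "2019104", "Rule_name": "allow_Drop"}

# Small stand-ins for DST and SRC, using a few rows from the question.
DST = pd.DataFrame({**GROUP,
                    "dst": ["10.1.81.255", "8.8.8.8"],
                    "Count": [34, 597]}).set_index(KEYS + ["dst"])
SRC = pd.DataFrame({**GROUP,
                    "src": ["10.1.81.10", "10.1.81.11", "10.1.81.245"],
                    "Count": [8, 12, 911]}).set_index(KEYS + ["src"])

# The shared levels (Firewall, Gatway, DATE, Rule_name) act as the join key;
# suffixes are needed because both frames have a "Count" column.
joined = DST.join(SRC, lsuffix="_dst", rsuffix="_src")
print(joined)
print(len(joined))  # 6: each of the 2 dst rows is paired with each of the 3 src rows
```

With the full data (8 dst rows and 5 src rows in one group) this gives 40 rows rather than the 8-row side-by-side layout asked for.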

Answer 1 (score: 0)

Can you rename the columns so that no column name is duplicated (Count, in your case)? If so, I would use the pandas concat function:

    import pandas as pd

    # Generate a simpler version of your dataframes
    df = pd.DataFrame({'Firewall': ['10_1_81_34', '10_1_81_34', '10_1_81_34'],
                       'Gatway': ['vsys1', 'vsys1', 'vsys1'],
                       'dst': ['10.1.81.255', '10.255.63.18', '103.226.213.30'],
                       'count_dst': [34, 16, 4]})
    df.set_index(['Firewall', 'Gatway'], inplace=True)
    df2 = pd.DataFrame({'Firewall': ['10_1_81_34', '10_1_81_34', '10_1_81_34'],
                        'Gatway': ['vsys1', 'vsys1', 'vsys1'],
                        'src': ['10.1.81.10', '10.1.81.11', '10.1.81.115'],
                        'count_src': [8, 12, 11]})
    df2.set_index(['Firewall', 'Gatway'], inplace=True)

    # Concatenate the dataframes along the columns
    df3 = pd.concat([df, df2], axis=1)

With pd.concat I get the following output:

                                  dst  count_dst          src  count_src
    Firewall   Gatway                                                   
    10_1_81_34 vsys1      10.1.81.255         34   10.1.81.10          8
               vsys1     10.255.63.18         16   10.1.81.11         12
               vsys1   103.226.213.30          4  10.1.81.115         11
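If renaming the columns is not an option, pd.concat's keys parameter is one alternative: it adds an outer column level, so the two Count columns stay distinguishable. A minimal sketch with the same sample rows; like the example above, it relies on both frames having exactly the same index, row for row:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([("10_1_81_34", "vsys1")] * 3,
                                names=["Firewall", "Gatway"])
dst = pd.DataFrame({"dst": ["10.1.81.255", "10.255.63.18", "103.226.213.30"],
                    "Count": [34, 16, 4]}, index=idx)
src = pd.DataFrame({"src": ["10.1.81.10", "10.1.81.11", "10.1.81.115"],
                    "Count": [8, 12, 11]}, index=idx)

# keys= adds an outer column level ("dst" / "src"), so the two "Count"
# columns remain distinguishable without being renamed first.
combined = pd.concat([dst, src], axis=1, keys=["dst", "src"])
print(combined)
print(combined[("src", "Count")].tolist())  # [8, 12, 11]
```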

Edit, to handle dataframes of different lengths:

    import pandas as pd

    # Generate a simpler version of your dataframes (df is shorter than df2)
    df = pd.DataFrame({'Firewall': ['10_1_81_34', '10_1_81_34'],
                       'Gatway': ['vsys1', 'vsys1'],
                       'dst': ['10.1.81.255', '10.255.63.18'],
                       'count_dst': [34, 16]})
    df2 = pd.DataFrame({'Firewall': ['10_1_81_34', '10_1_81_34', '10_1_81_34'],
                        'Gatway': ['vsys1', 'vsys1', 'vsys1'],
                        'src': ['10.1.81.10', '10.1.81.11', '10.1.81.115'],
                        'count_src': [8, 12, 11]})

    # Concatenate along the columns. Rows are matched by position (0, 1, 2),
    # so the shorter df is padded with NaN. Taking Firewall/Gatway only from
    # the longer df2 avoids duplicated key columns (and NaN keys).
    df3 = pd.concat([df.drop(columns=['Firewall', 'Gatway']), df2], axis=1)

    # Set the index
    df3.set_index(['Firewall', 'Gatway'], inplace=True)

Here is the output:

                                dst  count_dst          src  count_src
    Firewall   Gatway                                                 
    10_1_81_34 vsys1    10.1.81.255       34.0   10.1.81.10          8
               vsys1   10.255.63.18       16.0   10.1.81.11         12
               vsys1            NaN        NaN  10.1.81.115         11
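The hand-built frames above line up only because rows are matched by position. A sketch of the same concat idea applied directly to the groupby results, assuming each is indexed by the four key columns plus one extra level (src or dst) and holds a single Count column: number the rows inside each key group with groupby().cumcount(), then concatenate on (keys, row number). The names side_by_side, grouped and row are hypothetical, introduced only for this illustration:

```python
import pandas as pd

KEYS = ["Firewall", "Gatway", "DATE", "Rule_name"]


def side_by_side(left, right):
    """Hypothetical helper: put two groupby(...).sum() results next to each other.

    Assumes each frame is indexed by KEYS plus one extra level (e.g. "src"
    or "dst") and holds a single "Count" column.
    """
    parts = []
    for frame in (left, right):
        extra = frame.index.names[-1]  # "src" or "dst"
        flat = frame.reset_index().rename(columns={"Count": "count_" + extra})
        # Number the rows inside each key group: 0, 1, 2, ...
        flat["row"] = flat.groupby(KEYS).cumcount()
        parts.append(flat.set_index(KEYS + ["row"]))
    # Align on (keys, row number); the shorter side is padded with NaN.
    return pd.concat(parts, axis=1)


def grouped(col, values, counts):
    """Build a small frame shaped like the question's analysis() result."""
    df = pd.DataFrame({"Firewall": "10_1_81_34", "Gatway": "vsys1",
                       "DATE": "2019104", "Rule_name": "allow_Drop",
                       col: values, "Count": counts})
    return df.groupby(KEYS + [col]).sum()


SRC = grouped("src", ["10.1.81.10", "10.1.81.11", "10.1.81.245"], [8, 12, 911])
DST = grouped("dst", ["10.1.81.255", "10.255.63.18", "103.226.213.30", "8.8.8.8"],
              [34, 16, 4, 597])
result = side_by_side(SRC, DST)
print(result)
```

With SRC passed first, the columns come out as src, count_src, dst, count_dst, matching the layout in the question. The helper row level exists only for alignment and can be dropped afterwards with result.reset_index(level="row", drop=True).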