I have some firewall traffic logs that I am analyzing.
I want to combine the results of two groupby.sum() calls.
Here is my code:
import pandas as pd

def analysis(data_location, col_name):
    # Read the whitespace-separated log file
    with open(data_location, "r") as data_open:
        lines = data_open.readlines()
    rows = []
    for line in lines:
        data = line.rstrip("\n").split()
        rows.append({"Firewall": data[0], "Gatway": data[1], "DATE": data[2],
                     "Rule_name": data[3], col_name: data[4], "Count": int(data[5])})
    df = pd.DataFrame(rows)
    df = df[["Firewall", "Gatway", "DATE", "Rule_name", col_name, "Count"]]
    # Sum Count for each (Firewall, Gatway, DATE, Rule_name, col_name) group
    result = df.groupby(["Firewall", "Gatway", "DATE", "Rule_name", col_name]).sum()
    print(result)
    # Return the summed frame so the caller can combine it with another one
    return result
And these are the results:
DST = analysis("united_temp_fw_dst_log.txt", "dst")
"""the result
                                                      Count
Firewall   Gatway DATE    Rule_name  dst
10_1_81_34 vsys1  2019104 allow_Drop 10.1.81.255         34
                                     10.255.63.18        16
                                     103.226.213.30       4
                                     129.146.178.96     282
                                     183.177.72.201       4
                                     183.177.72.202       4
                                     220.133.209.243      4
                                     8.8.8.8            597"""
SRC = analysis("united_temp_fw_src_log.txt", "src")
"""the result
                                                  Count
Firewall   Gatway DATE    Rule_name  src
10_1_81_34 vsys1  2019104 allow_Drop 10.1.81.10       8
                                     10.1.81.11      12
                                     10.1.81.115     11
                                     10.1.81.118      3
                                     10.1.81.245    911"""
I would like to use ["Firewall", "Gatway", "DATE", "Rule_name"] as the index, with the src and dst columns side by side, like this:
Firewall   Gatway DATE    Rule_name  src          count  dst              count
10_1_81_34 vsys1  2019104 allow_Drop 10.1.81.10       8  10.1.81.255         34
                                     10.1.81.11      12  10.255.63.18        16
                                     10.1.81.115     11  103.226.213.30       4
                                     10.1.81.118      3  129.146.178.96     282
                                     10.1.81.245    911  183.177.72.201       4
                                                         183.177.72.202       4
                                                         220.133.209.243      4
                                                         8.8.8.8            597
How can I do this? I tried reset_index() and groupby(), but neither gave me the result I want.
Answer 0: (score: 0)
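A simple join should do the trick. Both frames have a column named Count, so pass suffixes to keep the names from colliding:
DST.join(SRC, lsuffix="_dst", rsuffix="_src")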
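A join aligns rows on the index, so getting the side-by-side layout from the question probably needs one extra step: numbering the rows inside each group and joining on that number. The following is only a sketch under that assumption. The helper `with_row_number`, the `row` index level, and the `count_src`/`count_dst` column names are made up for illustration, and the two small frames stand in for the real groupby results.

```python
import pandas as pd

KEYS = ["Firewall", "Gatway", "DATE", "Rule_name"]

def with_row_number(summed, value_col):
    """Hypothetical helper: move value_col out of the index and number rows per KEYS group."""
    flat = summed.reset_index()
    # cumcount() gives 0, 1, 2, ... within each group of identical KEYS
    flat["row"] = flat.groupby(KEYS).cumcount()
    flat = flat.rename(columns={"Count": "count_" + value_col})
    return flat.set_index(KEYS + ["row"])

# Small stand-ins for the two groupby(...).sum() results in the question
dst = pd.DataFrame({
    "Firewall": ["10_1_81_34"] * 3,
    "Gatway": ["vsys1"] * 3,
    "DATE": ["2019104"] * 3,
    "Rule_name": ["allow_Drop"] * 3,
    "dst": ["10.1.81.255", "10.255.63.18", "8.8.8.8"],
    "Count": [34, 16, 597],
}).groupby(KEYS + ["dst"]).sum()

src = pd.DataFrame({
    "Firewall": ["10_1_81_34"] * 2,
    "Gatway": ["vsys1"] * 2,
    "DATE": ["2019104"] * 2,
    "Rule_name": ["allow_Drop"] * 2,
    "src": ["10.1.81.10", "10.1.81.11"],
    "Count": [8, 12],
}).groupby(KEYS + ["src"]).sum()

# An outer join keeps the extra rows of the longer frame; the shorter side gets NaN
combined = with_row_number(src, "src").join(with_row_number(dst, "dst"), how="outer")
print(combined)
```

Because the Count columns are renamed before the join, no suffixes are needed here. If the helper level is not wanted afterwards, `combined.droplevel("row")` removes it.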
Answer 1: (score: 0)
Can you rename the columns so that the names are not duplicated (as they are in your case)? If so, I would use the pandas concat function:
import pandas as pd

# Generate a simpler version of your dataframes
df = pd.DataFrame({'Firewall': ['10_1_81_34', '10_1_81_34', '10_1_81_34'],
                   'Gatway': ['vsys1', 'vsys1', 'vsys1'],
                   'dst': ['10.1.81.255', '10.255.63.18', '103.226.213.30'],
                   'count_dst': [34, 16, 4]})
df.set_index(['Firewall', 'Gatway'], inplace=True)
df2 = pd.DataFrame({'Firewall': ['10_1_81_34', '10_1_81_34', '10_1_81_34'],
                    'Gatway': ['vsys1', 'vsys1', 'vsys1'],
                    'src': ['10.1.81.10', '10.1.81.11', '10.1.81.115'],
                    'count_src': [8, 12, 11]})
df2.set_index(['Firewall', 'Gatway'], inplace=True)
# Concatenate the dataframes along columns
df3 = pd.concat([df, df2], axis=1)
With pd.concat I get the following output:
                              dst  count_dst          src  count_src
Firewall   Gatway
10_1_81_34 vsys1      10.1.81.255         34   10.1.81.10          8
           vsys1     10.255.63.18         16   10.1.81.11         12
           vsys1   103.226.213.30          4  10.1.81.115         11
Edit, to handle dataframes of different lengths:
import pandas as pd

# Generate a simpler version of your dataframes
df = pd.DataFrame({'Firewall': ['10_1_81_34', '10_1_81_34'],
                   'Gatway': ['vsys1', 'vsys1'],
                   'dst': ['10.1.81.255', '10.255.63.18'],
                   'count_dst': [34, 16]})
df2 = pd.DataFrame({'Firewall': ['10_1_81_34', '10_1_81_34', '10_1_81_34'],
                    'Gatway': ['vsys1', 'vsys1', 'vsys1'],
                    'src': ['10.1.81.10', '10.1.81.11', '10.1.81.115'],
                    'count_src': [8, 12, 11]})
# Concatenate the dataframes along columns; rows are aligned by position
df3 = pd.concat([df, df2], axis=1)
# Remove the duplicated key columns, keeping the last copy of each.
# This assumes the longer frame comes last, so the kept copy has no NaN.
df3 = df3.loc[:, ~df3.columns.duplicated(keep='last')]
# Set the index
df3.set_index(['Firewall', 'Gatway'], inplace=True)
This is the output:
                            dst  count_dst          src  count_src
Firewall   Gatway
10_1_81_34 vsys1    10.1.81.255       34.0   10.1.81.10          8
           vsys1   10.255.63.18       16.0   10.1.81.11         12
           vsys1            NaN        NaN  10.1.81.115         11
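The count_dst values print as 34.0 and 16.0 because a NaN in the column forces it to a floating-point dtype. If whole-number counts matter, one option is the nullable integer dtype. This is a minimal sketch, assuming a pandas version that provides "Int64" (0.24 or later); the `counts` frame is a made-up stand-in for the count columns of df3.

```python
import pandas as pd

# Stand-in for the count columns: the NaN makes count_dst float64
counts = pd.DataFrame({"count_dst": [34.0, 16.0, float("nan")],
                       "count_src": [8, 12, 11]})

# "Int64" (capital I) is the nullable integer dtype; missing values become <NA>
counts["count_dst"] = counts["count_dst"].astype("Int64")
print(counts)
```

The conversion only succeeds when every non-missing value is a whole number, which holds for counts.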