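I'm new to pandas and have a question I can't answer on my own. For context, this is output from a firewall. It generates millions of packets, and I'm trying to aggregate that data into a firewall rule set. The best approach I've come up with is to identify traffic by destination IP.
If the source/destination ports are ephemeral, they will change, so it's important to aggregate them onto the same row. That way I can determine the port ranges for the rule set.
Raw CSV: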
dvc,"src_interface",transport,"src_ip","src_port","dest_ip","dest_port",direction,action,cause,count
"Firewall-1",outside,tcp,"4.4.4.4",53,"1.1.1.1",1025,outbound,allowed,"",2
"Firewall-1",outside,tcp,"4.4.4.4",53,"1.1.1.1",1026,outbound,allowed,"",2
"Firewall-1",outside,tcp,"4.4.4.4",22,"1.1.1.1",1028,outbound,allowed,"",2
"Firewall-1",outside,tcp,"3.3.3.3",22,"2.2.2.2",2200,outbound,allowed,"",2
DataFrame:
dvc src_interface transport src_ip src_port dest_ip dest_port direction action cause count
0 Firewall-1 outside tcp 4.4.4.4 53 1.1.1.1 1025 outbound allowed NaN 2
1 Firewall-1 outside tcp 4.4.4.4 53 1.1.1.1 1026 outbound allowed NaN 2
2 Firewall-1 outside tcp 4.4.4.4 53 1.1.1.1 1028 outbound allowed NaN 2
3 Firewall-1 outside tcp 3.3.3.3 22 2.2.2.2 2200 outbound allowed NaN 2
How can I merge the rows that have the same dest_ip?
Code:
import glob
import pandas as pd

df = pd.concat([pd.read_csv(f) for f in glob.glob('*.csv')], ignore_index=True)
index_cols = df.columns.tolist()
index_cols.remove('dest_ip')
df = df.groupby(index_cols, as_index=False)['dest_ip'].apply(list)
print(df)
Expected output:
Firewall-1 outside tcp 4.4.4.4 53 1.1.1.1 1025-1026,1028 outbound allowed nan 2
Firewall-1 outside tcp 3.3.3.3 22 2.2.2.2 2200 outbound allowed nan 2
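Most of the examples I've found online deal with joining two DataFrames, and I only have one. Any help would be appreciated. Thanks in advance!
Answer 0 (score: 0)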
I think the following may do what you need:
Original DataFrame:
import pandas as pd
# Create a practice DataFrame; drop_duplicates below removes rows whose 'key' value repeats
df = pd.DataFrame({'key':[1,1,3,4],'color':[1,2,3,2],'house':[1,2,3,7]})
print(df.drop_duplicates(['key']))
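Printed contents of the original DataFrame (`drop_duplicates(['key'])` then keeps only the first row for each `key`, so the second `key` = 1 row is dropped):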
key color house
1 1 1
1 2 2
3 3 3
4 2 7
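Note that `drop_duplicates` throws away the non-first rows, so their values are lost rather than merged. As a rough sketch (not part of the answer above), the same practice DataFrame can be grouped on `key` so that every value is kept. The variable names `deduped` and `merged` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'key': [1, 1, 3, 4], 'color': [1, 2, 3, 2], 'house': [1, 2, 3, 7]})

# drop_duplicates keeps the first row per key and discards the others
deduped = df.drop_duplicates(['key'])

# groupby + agg(list) keeps every value, collected into one list per key
merged = df.groupby('key')[['color', 'house']].agg(list).reset_index()

print(deduped)
print(merged)
```

Both results have one row per `key`, but only `merged` still contains the values from the second `key` = 1 row.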
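Answer 1 (score: 0)
Try this. Group by all of the columns whose values should be carried over unchanged, then aggregate the different "dest_port" values into a list: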
import pandas as pd

df = pd.DataFrame([
["Firewall-1","outside","tcp","4.4.4.4",53,"1.1.1.1",1025,"outbound","allowed","",2],
["Firewall-1","outside","tcp","4.4.4.4",53,"1.1.1.1",1026,"outbound","allowed","",2],
["Firewall-1","outside","tcp","4.4.4.4",22,"1.1.1.1",1028,"outbound","allowed","",2],
["Firewall-1","outside","tcp","3.3.3.3",22,"2.2.2.2",2200,"outbound", "allowed","",2]
],
columns=["dvc","src_interface","transport","src_ip","src_port","dest_ip","dest_port","direction", "action", "cause", "count"])
index_cols = df.columns.tolist()
index_cols.remove("dest_port")
df = df.groupby(index_cols)["dest_port"].apply(list)
df = df.reset_index()
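This leaves 3 rows rather than the 2 in the desired output, because the third row has `src_port` 22 and therefore falls into its own group: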
dvc src_interface transport src_ip src_port dest_ip direction action cause count dest_port
0 Firewall-1 outside tcp 3.3.3.3 22 2.2.2.2 outbound allowed 2 [2200]
1 Firewall-1 outside tcp 4.4.4.4 22 1.1.1.1 outbound allowed 2 [1028]
2 Firewall-1 outside tcp 4.4.4.4 53 1.1.1.1 outbound allowed 2 [1025, 1026]
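To get down to one row per flow and print ports in the `1025-1026,1028` style from the question, one possible sketch is to leave both port columns out of the grouping key and collapse each of them into a range string. The helper `ports_to_ranges` and the name `key_cols` are hypothetical, and summing `count` is my own assumption (the expected output in the question leaves it at 2). `dropna=False` needs pandas 1.1 or later; it matters for data loaded with `read_csv`, where an empty `cause` becomes NaN and `groupby` would otherwise drop those rows.

```python
import pandas as pd

def ports_to_ranges(ports):
    """Collapse ports into a string such as '1025-1026,1028' (hypothetical helper)."""
    unique_ports = sorted({int(p) for p in ports})
    if not unique_ports:
        return ""
    ranges = []
    start = prev = unique_ports[0]
    for port in unique_ports[1:]:
        if port == prev + 1:
            prev = port          # still inside a consecutive run
            continue
        ranges.append((start, prev))
        start = prev = port      # begin a new run
    ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)

df = pd.DataFrame([
    ["Firewall-1", "outside", "tcp", "4.4.4.4", 53, "1.1.1.1", 1025, "outbound", "allowed", "", 2],
    ["Firewall-1", "outside", "tcp", "4.4.4.4", 53, "1.1.1.1", 1026, "outbound", "allowed", "", 2],
    ["Firewall-1", "outside", "tcp", "4.4.4.4", 22, "1.1.1.1", 1028, "outbound", "allowed", "", 2],
    ["Firewall-1", "outside", "tcp", "3.3.3.3", 22, "2.2.2.2", 2200, "outbound", "allowed", "", 2],
], columns=["dvc", "src_interface", "transport", "src_ip", "src_port", "dest_ip",
            "dest_port", "direction", "action", "cause", "count"])

# Group on everything except the two port columns and the packet count
key_cols = [c for c in df.columns if c not in ("src_port", "dest_port", "count")]

merged = df.groupby(key_cols, as_index=False, dropna=False).agg(
    src_port=("src_port", ports_to_ranges),
    dest_port=("dest_port", ports_to_ranges),
    count=("count", "sum"),  # assumption: packet counts should be added up
)
print(merged)
```

With the sample rows above this gives two rows, and the 1.1.1.1 row carries `22,53` as `src_port` and `1025-1026,1028` as `dest_port`. If the source port should stay a separate rule, add `"src_port"` back to `key_cols` and remove its aggregation.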