I'm trying to aggregate the data in a DataFrame by specific columns. It works when I build the DataFrame with the constructor:
import pandas as pd

df = pd.DataFrame([
["Firewall-1","outside","tcp","4.4.4.4",53,"1.1.1.1",1025,"outbound","allowed","",2],
["Firewall-1","outside","tcp","4.4.4.4",53,"1.1.1.1",1026,"outbound","allowed","",2],
["Firewall-1","outside","tcp","4.4.4.4",22,"1.1.1.1",1028,"outbound","allowed","",2],
["Firewall-1","outside","tcp","3.3.3.3",22,"2.2.2.2",2200,"outbound", "allowed","",2]
],
columns=["dvc","src_interface","transport","src_ip","src_port","dest_ip","dest_port","direction", "action", "cause", "count"])
# Group by every column except dest_port and collect the dest_port values into a list
index_cols = df.columns.tolist()
index_cols.remove("dest_port")
df = df.groupby(index_cols)["dest_port"].apply(list)
df = df.reset_index()
DataFrame:
   dvc         src_interface  transport  src_ip   src_port  dest_ip  dest_port  direction  action   cause  count
0  Firewall-1  outside        tcp        4.4.4.4  53        1.1.1.1  1025       outbound   allowed         2
1  Firewall-1  outside        tcp        4.4.4.4  53        1.1.1.1  1026       outbound   allowed         2
2  Firewall-1  outside        tcp        4.4.4.4  22        1.1.1.1  1028       outbound   allowed         2
3  Firewall-1  outside        tcp        4.4.4.4  22        1.1.1.1  1029       outbound   allowed         2
4  Firewall-1  outside        tcp        3.3.3.3  22        2.2.2.2  2200       outbound   allowed         2
Output:
dvc         src_interface  transport  src_ip   src_port  dest_ip  direction  action   cause  count
Firewall-1  outside        tcp        3.3.3.3  22        2.2.2.2  outbound   allowed         2      [2200]
                                      4.4.4.4  22        1.1.1.1  outbound   allowed         2      [1028, 1029]
                                               53        1.1.1.1  outbound   allowed         2      [1025, 1026]
The problem appears when I import the data from CSV files instead:
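import glob

import pandas as pd

fwdata = pd.concat([pd.read_csv(f) for f in glob.glob('*.csv')], ignore_index=True)
df = pd.DataFrame(fwdata)  # redundant: fwdata is already a DataFrame
index_cols = df.columns.tolist()
index_cols.remove("dest_port")
df = df.groupby(index_cols)["dest_port"].apply(list)
df.reset_index()  # result is not assigned, so df is still the grouped Series
print(df.head(10))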
DataFrame: same as above
Output:
Series([], Name: dest_port, dtype: float64)
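The CSV files contain exactly the same data as the constructor above, but they seem to be treated differently. Any help would be greatly appreciated. Thanks in advance!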
CSV:
dvc,"src_interface",transport,"src_ip","src_port","dest_ip","dest_port",direction,action,cause,count
"Firewall-1",outside,tcp,"4.4.4.4",53,"1.1.1.1",1025,outbound,allowed,"",2
"Firewall-1",outside,tcp,"4.4.4.4",53,"1.1.1.1",1026,outbound,allowed,"",2
"Firewall-1",outside,tcp,"4.4.4.4",22,"1.1.1.1",1028,outbound,allowed,"",2
"Firewall-1",outside,tcp,"3.3.3.3",22,"2.2.2.2",2200,outbound,allowed,"",2
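To see what read_csv does with the empty cause field, the sample CSV above can be parsed from an in-memory string. This is a minimal reproduction sketch; io.StringIO stands in for the files matched by glob only so that the example is self-contained.

```python
import io

import pandas as pd

# The sample CSV from the question (io.StringIO replaces the real files here)
csv_text = '''dvc,"src_interface",transport,"src_ip","src_port","dest_ip","dest_port",direction,action,cause,count
"Firewall-1",outside,tcp,"4.4.4.4",53,"1.1.1.1",1025,outbound,allowed,"",2
"Firewall-1",outside,tcp,"4.4.4.4",53,"1.1.1.1",1026,outbound,allowed,"",2
"Firewall-1",outside,tcp,"4.4.4.4",22,"1.1.1.1",1028,outbound,allowed,"",2
"Firewall-1",outside,tcp,"3.3.3.3",22,"2.2.2.2",2200,outbound,allowed,"",2
'''

df = pd.read_csv(io.StringIO(csv_text))

# With the default NA handling, the quoted empty field "" is read as NaN
print(df["cause"].isna().all())

index_cols = df.columns.tolist()
index_cols.remove("dest_port")
grouped = df.groupby(index_cols)["dest_port"].apply(list)

# Every row has NaN in the "cause" key, so every row is left out of the groups
print(grouped.empty)
```

Both prints should report True, matching the empty `Series([], Name: dest_port, dtype: float64)` from the question.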
Answer (score: 0)
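The problem is the empty data in the cause column. read_csv reads the empty "" values as NaN, and groupby excludes every row whose group key contains NaN. Since every row has NaN in cause, all rows are excluded and you get an empty Series. You can fix this with either of the following solutions.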
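A minimal sketch of that mechanism on a small made-up frame (the values are illustrative, not taken from the question's files). It also shows `dropna=False`, a groupby option available in pandas 1.1 and later that the answer does not mention: it keeps NaN as a group key instead of discarding those rows.

```python
import numpy as np
import pandas as pd

# Toy frame: the "cause" key is missing (NaN) in every row
df = pd.DataFrame({
    "src_ip": ["4.4.4.4", "4.4.4.4", "3.3.3.3"],
    "cause": [np.nan, np.nan, np.nan],
    "dest_port": [1025, 1026, 2200],
})

# Default behaviour (dropna=True): groups whose key contains NaN are excluded
dropped = df.groupby(["src_ip", "cause"])["dest_port"].apply(list)

# dropna=False keeps NaN as a regular group key
kept = df.groupby(["src_ip", "cause"], dropna=False)["dest_port"].apply(list)

print(len(dropped), len(kept))
```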
Drop the column:
df.drop(columns=['column_name'], inplace=True)
Fill the column's missing values with an empty string:
df['column_name'] = df['column_name'].fillna('')
(In these examples, column_name = 'cause'.)
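Putting the fill fix together with the question's grouping code gives roughly the following. This is a sketch under the same assumption as before (io.StringIO in place of the glob'd CSV files); the fill is done by assignment rather than `inplace=True`, which avoids chained-assignment warnings in recent pandas.

```python
import io

import pandas as pd

# Sample CSV from the question, read from memory instead of from files
csv_text = '''dvc,"src_interface",transport,"src_ip","src_port","dest_ip","dest_port",direction,action,cause,count
"Firewall-1",outside,tcp,"4.4.4.4",53,"1.1.1.1",1025,outbound,allowed,"",2
"Firewall-1",outside,tcp,"4.4.4.4",53,"1.1.1.1",1026,outbound,allowed,"",2
"Firewall-1",outside,tcp,"4.4.4.4",22,"1.1.1.1",1028,outbound,allowed,"",2
"Firewall-1",outside,tcp,"3.3.3.3",22,"2.2.2.2",2200,outbound,allowed,"",2
'''

df = pd.read_csv(io.StringIO(csv_text))

# Replace the NaN values in "cause" with empty strings so they are valid group keys
df["cause"] = df["cause"].fillna("")

# Group by every column except dest_port and collect dest_port into lists
index_cols = df.columns.tolist()
index_cols.remove("dest_port")
result = df.groupby(index_cols)["dest_port"].apply(list).reset_index()

print(result[["src_ip", "src_port", "dest_port"]])
```

With the fill in place the result has three groups, one per (src_ip, src_port) combination in the sample. Dropping the column instead (the first solution) works the same way, as long as the drop happens before index_cols is built.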