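This question may be a bit tricky...
I have a function that labels a DataFrame based on certain values in its columns. The function takes two arguments: a DataFrame and a dictionary. The dictionary holds key-value pairs that name the columns (keys) and the values they must match, plus the number to use as the label. For example: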
{"ip_src": "192.168.84.129", "ip_dst": "192.168.84.128", "label": 1}
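When the DataFrame's "ip_src" column has the value "192.168.84.129" and the "ip_dst" column has the value "192.168.84.128", those rows must be labeled with a "1". The problem is that these conditions can vary, so I want to generalize the code so that I can pass other criteria, such as: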
{"ip_src": "192.168.1.101", "dst_port": 19305, "label": 4}
and so on.
I started with:
def labeling(df, crit):
    for dic in crit:
        lbl = dic["label"]
        del dic["label"]
        conds = []
        pairs = len(dic)
        for key in dic:
            conds.append((df[key] == dic[key]))
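But I am stuck at the last line, because I cannot figure out how to join the conditions and then apply them as something like: df[conds] = lbl
Thanks!
Edit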
Input:
   index          ip_src          ip_dst  ip_proto  frame_time_delta  \
0      0  192.168.84.129  192.168.84.128      17.0          0.000000
1      1     31.13.94.53   192.168.1.101      17.0          0.006656
2      2   192.168.1.101     31.13.94.53      17.0          0.012948

   payload_size  src_port  dst_port  flow_dir
0         172.0   52165.0   40002.0         1
1         176.0   40002.0   52165.0         0
2         172.0   52165.0   19305.0         1
Output:
           ip_src          ip_dst  ip_proto  frame_time_delta  \
0  192.168.84.129  192.168.84.128      17.0          0.000000
1     31.13.94.53   192.168.1.101      17.0          0.006656
2   192.168.1.101     31.13.94.53      17.0          0.012948

   payload_size  src_port  dst_port  flow_dir  label
0         172.0   52165.0   35456.0         1      1
1         176.0   40002.0   52165.0         0      0
2         172.0   52165.0   19305.0         1      4
Possible criteria:
l_crit = [{"ip_src": "192.168.84.129", "ip_dst": "192.168.84.128", "label": 1},
          {"ip_src": "192.168.1.100", "ip_dst": "192.168.1.105", "dst_port": 9999, "label": 1},
          {"ip_src": "192.168.1.101", "ip_dst": "104.44.195.76", "label": 2},
          {"ip_src": "192.168.1.101", "ip_dst": "31.13.94.53", "ip_proto": 17, "label": 3},
          {"ip_src": "192.168.1.101", "dst_port": 19305, "label": 4}]
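Answer 0 (score: 1)
Try this:
crit = [{"ip_src": "192.168.84.129", "ip_dst": "192.168.84.128", "label": 1},
        {"ip_src": "192.168.1.101", "dst_port": 19305, "label": 4}]
# Map each ip_src value to its label (only ip_src is considered here)
dictionary = {}
for dic in crit:
    dictionary[dic['ip_src']] = dic['label']
# Rows whose ip_src is not in the mapping get label 0
df['label'] = df['ip_src'].map(dictionary).fillna(0)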
Input:
           ip_src          ip_dst  ip_proto  frame_time_delta  payload_size  \
0  192.168.84.129  192.168.84.128      17.0          0.000000         172.0
1     31.13.94.53   192.168.1.101      17.0          0.006656         176.0
2   192.168.1.101     31.13.94.53      17.0          0.012948          72.0

   src_port  dst_port  flow_dir
0   52165.0   35456.0         1
1   40002.0   52165.0         0
2   52165.0   19305.0         1
Output:
           ip_src          ip_dst  ip_proto  frame_time_delta  payload_size  \
0  192.168.84.129  192.168.84.128      17.0          0.000000         172.0
1     31.13.94.53   192.168.1.101      17.0          0.006656         176.0
2   192.168.1.101     31.13.94.53      17.0          0.012948          72.0

   src_port  dst_port  flow_dir  label
0   52165.0   35456.0         1    1.0
1   40002.0   52165.0         0    0.0
2   52165.0   19305.0         1    4.0
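Note that the mapping above looks only at ip_src, so a dict such as {"ip_src": "192.168.1.101", "dst_port": 19305, "label": 4} would label every row with that source address, whatever its dst_port. A minimal sketch of one way to honor every key in each dict is to AND the per-column comparisons together and assign through df.loc. The helper name label_rows, the default label 0, and the small sample frame are assumptions for illustration, not part of the original post.

```python
from functools import reduce
import operator

import pandas as pd


def label_rows(df, crit, default=0):
    """Return a copy of df with a 'label' column set from a list of criteria dicts."""
    out = df.copy()
    out["label"] = default
    for dic in crit:
        # Split the label from the column conditions without mutating the input dict
        conds = {k: v for k, v in dic.items() if k != "label"}
        if not conds:
            continue  # nothing to match on
        # One boolean Series per condition, combined with element-wise AND
        mask = reduce(operator.and_, (out[k] == v for k, v in conds.items()))
        out.loc[mask, "label"] = dic["label"]
    return out


# Hypothetical sample data, reduced to the columns the criteria use
df = pd.DataFrame({
    "ip_src": ["192.168.84.129", "31.13.94.53", "192.168.1.101"],
    "ip_dst": ["192.168.84.128", "192.168.1.101", "31.13.94.53"],
    "dst_port": [35456.0, 52165.0, 19305.0],
})
crit = [
    {"ip_src": "192.168.84.129", "ip_dst": "192.168.84.128", "label": 1},
    {"ip_src": "192.168.1.101", "dst_port": 19305, "label": 4},
]
labeled = label_rows(df, crit)
print(labeled["label"].tolist())  # [1, 0, 4]
```

With this loop, when a row satisfies more than one criterion, the criterion that comes later in the list overwrites the earlier one.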
Edit 1:
import pandas as pd

l_crit = [{"ip_src": "192.168.84.129", "ip_dst": "192.168.84.128", "label": 1},
          {"ip_src": "192.168.1.100", "ip_dst": "192.168.1.105", "dst_port": 9999, "label": 1},
          {"ip_src": "192.168.1.101", "ip_dst": "104.44.195.76", "label": 2},
          {"ip_src": "192.168.1.101", "ip_dst": "31.13.94.53", "ip_proto": 17, "label": 3},
          {"ip_src": "192.168.1.101", "dst_port": 19305, "label": 4}]
# Build a lookup table of ip_src -> label from the criteria
temp = pd.DataFrame()
l = []
v = []
for dic in l_crit:
    l.append(dic['ip_src'])
    v.append(dic['label'])
temp['ip_src'] = l
temp['label'] = v
# Left-merge on ip_src; rows without a match get label 0
df = pd.merge(df, temp, how='left', on=['ip_src'])
df['label'] = df['label'].fillna(0)
Input:
           ip_src          ip_dst  ip_proto  frame_time_delta  payload_size  \
0  192.168.84.129  192.168.84.128      17.0          0.000000         172.0
1     31.13.94.53   192.168.1.101      17.0          0.006656         176.0
2   192.168.1.101     31.13.94.53      17.0          0.012948          72.0

   src_port  dst_port  flow_dir
0   52165.0   35456.0         1
1   40002.0   52165.0         0
2   52165.0   19305.0         1
Output:
           ip_src          ip_dst  ip_proto  frame_time_delta  payload_size  \
0  192.168.84.129  192.168.84.128      17.0          0.000000         172.0
1     31.13.94.53   192.168.1.101      17.0          0.006656         176.0
2   192.168.1.101     31.13.94.53      17.0          0.012948          72.0
3   192.168.1.101     31.13.94.53      17.0          0.012948          72.0
4   192.168.1.101     31.13.94.53      17.0          0.012948          72.0

   src_port  dst_port  flow_dir  label
0   52165.0   35456.0         1    1.0
1   40002.0   52165.0         0    0.0
2   52165.0   19305.0         1    2.0
3   52165.0   19305.0         1    3.0
4   52165.0   19305.0         1    4.0
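In the output above, the row with ip_src 192.168.1.101 appears three times, because that address occurs three times in temp and the left merge returns one row per match. A sketch of an alternative that keeps exactly one row per input row evaluates every key of each dict and picks a single label with numpy.select. The function name label_select is hypothetical, and the sketch assumes that a later criterion should take precedence over an earlier one, which is what the expected label 4 in the question suggests.

```python
import numpy as np
import pandas as pd


def label_select(df, crit, default=0):
    """Label each row once; when several criteria match, the later one wins."""
    masks, labels = [], []
    for dic in crit:
        conds = {k: v for k, v in dic.items() if k != "label"}
        if not conds:
            continue  # skip dicts that carry no conditions
        # Start from all-True and AND in each column comparison
        mask = np.ones(len(df), dtype=bool)
        for col, val in conds.items():
            mask &= (df[col] == val).to_numpy()
        masks.append(mask)
        labels.append(dic["label"])
    out = df.copy()
    if masks:
        # np.select takes the first true condition, so reverse for later-wins
        out["label"] = np.select(masks[::-1], labels[::-1], default=default)
    else:
        out["label"] = default
    return out


# Hypothetical sample data, reduced to the columns the criteria use
df = pd.DataFrame({
    "ip_src": ["192.168.84.129", "31.13.94.53", "192.168.1.101"],
    "ip_dst": ["192.168.84.128", "192.168.1.101", "31.13.94.53"],
    "ip_proto": [17.0, 17.0, 17.0],
    "dst_port": [35456.0, 52165.0, 19305.0],
})
l_crit = [
    {"ip_src": "192.168.84.129", "ip_dst": "192.168.84.128", "label": 1},
    {"ip_src": "192.168.1.100", "ip_dst": "192.168.1.105", "dst_port": 9999, "label": 1},
    {"ip_src": "192.168.1.101", "ip_dst": "104.44.195.76", "label": 2},
    {"ip_src": "192.168.1.101", "ip_dst": "31.13.94.53", "ip_proto": 17, "label": 3},
    {"ip_src": "192.168.1.101", "dst_port": 19305, "label": 4},
]
result = label_select(df, l_crit)
print(result["label"].tolist())  # [1, 0, 4]
```

The third row satisfies both the label 3 and the label 4 criteria; it ends up with 4 only because of the later-wins assumption. Passing masks and labels to np.select in their original order would give 3 instead.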