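Concatenating conditions

Time: 2018-06-01 03:49:35

Tags: python pandas concatenation conditional-statements

This question may be a bit tricky...

I have a function that labels a DataFrame according to certain values in its columns. The function takes a DataFrame and a dictionary as arguments. The dictionary holds key-value pairs that name a column (the key) and the value in that column that must be labeled with a specific number. For example: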

{"ip_src": "192.168.84.129", "ip_dst": "192.168.84.128", "label": 1}

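When the DataFrame's "ip_src" column has the value "192.168.84.129" and its "ip_dst" column has the value "192.168.84.128", those rows must be labeled with 1. The problem is that these conditions can vary, so I want to generalize the code so that I can also pass other conditions, such as: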

{"ip_src": "192.168.1.101", "dst_port": 19305, "label": 4}

and so on.

I started with:

def labeling(df, crit):
    for dic in crit:
        lbl = dic["label"]
        del dic["label"]
        conds = []
        pairs = len(dic)
        for key in dic:
            conds.append((df[key] == dic[key])) 

But I am stuck at the last line, because I can't figure out how to concatenate the conditions and then apply them as: df[conds] = lbl
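One possible way to join such a list of conditions, sketched under the assumption that every key in the dictionary is a column of df: reduce the boolean Series with `&` and assign through `df.loc`. The helper name `combine_conditions` is hypothetical, and the sample frame reuses three rows from the input shown in this post.

```python
from functools import reduce
import operator

import pandas as pd


def combine_conditions(conds):
    """AND together a non-empty list of boolean Series (hypothetical helper)."""
    return reduce(operator.and_, conds)


# Three rows taken from the sample input, restricted to two columns
df = pd.DataFrame({
    "ip_src": ["192.168.84.129", "31.13.94.53", "192.168.1.101"],
    "ip_dst": ["192.168.84.128", "192.168.1.101", "31.13.94.53"],
})

# One criterion, with the "label" entry already taken out
dic = {"ip_src": "192.168.84.129", "ip_dst": "192.168.84.128"}
lbl = 1

conds = [df[key] == value for key, value in dic.items()]
mask = combine_conditions(conds)

df["label"] = 0            # default label for rows that match nothing
df.loc[mask, "label"] = lbl  # label only the rows where every condition holds
```

Note that `df[conds] = lbl` would not work as written, because `conds` is a list of Series rather than a single boolean mask; `df.loc[mask, "label"]` targets one column on the matching rows.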

Thanks!

Edit

Input:

   index          ip_src          ip_dst  ip_proto  frame_time_delta  \
0      0  192.168.84.129  192.168.84.128      17.0          0.000000   
1      1     31.13.94.53   192.168.1.101      17.0          0.006656   
2      2   192.168.1.101     31.13.94.53      17.0          0.012948   

   payload_size  src_port  dst_port  flow_dir  
0         172.0   52165.0   40002.0         1  
1         176.0   40002.0   52165.0         0  
2         172.0   52165.0   19305.0         1 

Output:

           ip_src          ip_dst  ip_proto  frame_time_delta  \
0  192.168.84.129  192.168.84.128      17.0          0.000000   
1     31.13.94.53   192.168.1.101      17.0          0.006656   
2   192.168.1.101     31.13.94.53      17.0          0.012948   

   payload_size  src_port  dst_port  flow_dir   label
0         172.0   52165.0   35456.0         1    1 
1         176.0   40002.0   52165.0         0    0
2         172.0   52165.0   19305.0         1    4

Possible cases:

l_crit = [{"ip_src": "192.168.84.129", "ip_dst": "192.168.84.128", "label": 1},
          {"ip_src": "192.168.1.100", "ip_dst": "192.168.1.105", "dst_port": 9999, "label": 1},
          {"ip_src": "192.168.1.101", "ip_dst": "104.44.195.76", "label": 2},
          {"ip_src": "192.168.1.101", "ip_dst": "31.13.94.53", "ip_proto": 17, "label": 3},
          {"ip_src": "192.168.1.101", "dst_port": 19305, "label": 4}]

1 answer:

Answer 0 (score: 1)

Try this,

crit=[{"ip_src": "192.168.84.129", "ip_dst": "192.168.84.128", "label": 1},{"ip_src": "192.168.1.101", "dst_port": 19305, "label": 4}]

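# Map each ip_src value to its label (only ip_src is considered here)
dictionary = {}
for dic in crit:
    dictionary[dic['ip_src']] = dic['label']
# Rows whose ip_src is not in the mapping get label 0
df['label'] = df['ip_src'].map(dictionary).fillna(0)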

Input:

           ip_src          ip_dst  ip_proto  frame_time_delta  payload_size  \
0  192.168.84.129  192.168.84.128      17.0          0.000000         172.0   
1     31.13.94.53   192.168.1.101      17.0          0.006656         176.0   
2   192.168.1.101     31.13.94.53      17.0          0.012948          72.0   

   src_port  dst_port  flow_dir  
0   52165.0   35456.0         1  
1   40002.0   52165.0         0  
2   52165.0   19305.0         1

Output:

           ip_src          ip_dst  ip_proto  frame_time_delta  payload_size  \
0  192.168.84.129  192.168.84.128      17.0          0.000000         172.0   
1     31.13.94.53   192.168.1.101      17.0          0.006656         176.0   
2   192.168.1.101     31.13.94.53      17.0          0.012948          72.0   

   src_port  dst_port  flow_dir  label  
0   52165.0   35456.0         1    1.0  
1   40002.0   52165.0         0    0.0  
2   52165.0   19305.0         1    4.0 
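The mapping above keys on ip_src alone, so it would give label 4 to any row whose ip_src is 192.168.1.101, whatever its dst_port. A minimal sketch of a variant that checks every key of each criterion is given below. The function name `label_rows`, its `default` parameter, and the rule that later criteria overwrite earlier ones are assumptions made for illustration, not something stated in the question.

```python
import pandas as pd


def label_rows(df, crit, default=0):
    """Return a copy of df with a 'label' column (hypothetical helper).

    A row receives a criterion's label only if it matches every
    column/value pair of that criterion. Later criteria overwrite
    earlier ones when both match the same row.
    """
    out = df.copy()
    out["label"] = default
    for dic in crit:
        # Everything except "label" is a column/value condition;
        # the input dictionaries are left unmodified.
        conditions = {k: v for k, v in dic.items() if k != "label"}
        mask = pd.Series(True, index=out.index)
        for col, val in conditions.items():
            mask &= out[col] == val
        out.loc[mask, "label"] = dic["label"]
    return out


# Three rows from the sample input, restricted to the columns used below
df = pd.DataFrame({
    "ip_src": ["192.168.84.129", "31.13.94.53", "192.168.1.101"],
    "ip_dst": ["192.168.84.128", "192.168.1.101", "31.13.94.53"],
    "dst_port": [35456.0, 52165.0, 19305.0],
})

crit = [
    {"ip_src": "192.168.84.129", "ip_dst": "192.168.84.128", "label": 1},
    {"ip_src": "192.168.1.101", "dst_port": 19305, "label": 4},
]

labeled = label_rows(df, crit)
print(labeled["label"].tolist())
```

Because the mask is built per criterion, criteria with different sets of keys can be mixed freely, and the label column keeps an integer dtype since no NaN values are introduced.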

Edit 1:

l_crit = [{"ip_src": "192.168.84.129", "ip_dst": "192.168.84.128", "label": 1},
          {"ip_src": "192.168.1.100", "ip_dst": "192.168.1.105", "dst_port": 9999, "label": 1},
          {"ip_src": "192.168.1.101", "ip_dst": "104.44.195.76", "label": 2},
          {"ip_src": "192.168.1.101", "ip_dst": "31.13.94.53", "ip_proto": 17, "label": 3},
          {"ip_src": "192.168.1.101", "dst_port": 19305, "label": 4}]


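import pandas as pd

# Build a lookup table of ip_src -> label from the criteria
temp = pd.DataFrame({
    'ip_src': [dic['ip_src'] for dic in l_crit],
    'label': [dic['label'] for dic in l_crit],
})

# Left-merge on ip_src; rows without a match get label 0
df = pd.merge(df, temp, how='left', on=['ip_src'])
df['label'] = df['label'].fillna(0)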

Input:

          ip_src          ip_dst  ip_proto  frame_time_delta  payload_size  \
0  192.168.84.129  192.168.84.128      17.0          0.000000         172.0   
1     31.13.94.53   192.168.1.101      17.0          0.006656         176.0   
2   192.168.1.101     31.13.94.53      17.0          0.012948          72.0   

   src_port  dst_port  flow_dir  
0   52165.0   35456.0         1  
1   40002.0   52165.0         0  
2   52165.0   19305.0         1

Output:

           ip_src          ip_dst  ip_proto  frame_time_delta  payload_size  \
0  192.168.84.129  192.168.84.128      17.0          0.000000         172.0   
1     31.13.94.53   192.168.1.101      17.0          0.006656         176.0   
2   192.168.1.101     31.13.94.53      17.0          0.012948          72.0   
3   192.168.1.101     31.13.94.53      17.0          0.012948          72.0   
4   192.168.1.101     31.13.94.53      17.0          0.012948          72.0   

   src_port  dst_port  flow_dir  label  
0   52165.0   35456.0         1    1.0  
1   40002.0   52165.0         0    0.0  
2   52165.0   19305.0         1    2.0  
3   52165.0   19305.0         1    3.0  
4   52165.0   19305.0         1    4.0
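The output above has five rows for three input rows: 192.168.1.101 appears as ip_src in three entries of l_crit, so the left merge emits one output row per matching entry. The sketch below reproduces that effect on the ip_src column only and shows how the `validate` argument of `pd.merge` can be used to detect repeated keys; the frame names `left` and `right` are illustrative.

```python
import pandas as pd

# ip_src values of the three sample rows
left = pd.DataFrame({
    "ip_src": ["192.168.84.129", "31.13.94.53", "192.168.1.101"],
})

# Lookup table built from l_crit: 192.168.1.101 occurs three times
right = pd.DataFrame({
    "ip_src": ["192.168.84.129", "192.168.1.100", "192.168.1.101",
               "192.168.1.101", "192.168.1.101"],
    "label": [1, 1, 2, 3, 4],
})

# Each left row is repeated once per matching row on the right
merged = pd.merge(left, right, how="left", on=["ip_src"])
print(len(left), len(merged))

# validate="many_to_one" makes pandas raise when the right-hand keys
# are not unique, instead of silently duplicating rows
try:
    pd.merge(left, right, how="left", on=["ip_src"], validate="many_to_one")
    duplicated_keys = False
except pd.errors.MergeError:
    duplicated_keys = True
```

A merge on ip_src alone therefore cannot express criteria that differ only in ip_dst, ip_proto, or dst_port; those need a per-criterion mask that tests every listed column.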