Create new columns based on the values of another column

Time: 2020-02-04 10:01:43

Tags: python pandas

I have a column in a DataFrame that contains a variety of strings:

Additional Information  |  
IP=192.168.1.1, MAC ADDR=00:0a:95:9d:68:16, USER=kwfinn  
IP=192.168.0.1, MAC ADDR=00:0a:95:9d:68:17, USER=wattray  
Undefined System Error  
Specific groupname=CUSTGR1
IP=192.168.1.2, MAC ADDR=00:1B:44:11:3A:B7, USER=stwnck  

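What I want to do is create new columns from the corresponding values in the column above, namely the IP address and the MAC address.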

The desired output would look like this:

Additional Information                                  |IP Address  | MAC Address     |    
IP=192.168.1.1, MAC ADDR=00:0a:95:9d:68:16, USER=kwfinn |192.168.1.1 |00:0a:95:9d:68:16|  
IP=192.168.0.1, MAC ADDR=00:0a:95:9d:68:17, USER=wattray|192.168.0.1 |00:0a:95:9d:68:17|   
Undefined System Error                                  |            |                 |
Specific groupname=CUSTGR1                              |            |                 |  
IP=192.168.1.2, MAC ADDR=00:1B:44:11:3A:B7, USER=stwnck |192.168.1.2 |00:1B:44:11:3A:B7|  

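The problem is that I can't handle the rows that don't contain an IP and a MAC. I tried splitting with np.where and looking for partial matches, but without success.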
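A different technique from the one in the answer below, offered as a sketch rather than a tested solution: Series.str.extract with a regular expression returns NaN wherever the pattern does not match, so rows without an IP or MAC need no special handling. The patterns are an assumption: they expect each field to be written as "IP=..." or "MAC ADDR=..." and to end at the next comma, as in the sample data.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"Additional Information": [
    "IP=192.168.1.1, MAC ADDR=00:0a:95:9d:68:16, USER=kwfinn",
    "IP=192.168.0.1, MAC ADDR=00:0a:95:9d:68:17, USER=wattray",
    "Undefined System Error",
    "Specific groupname=CUSTGR1",
    "IP=192.168.1.2, MAC ADDR=00:1B:44:11:3A:B7, USER=stwnck",
]})

col = df["Additional Information"]
# Capture everything after the key up to the next comma (assumed field format).
# Rows where the pattern is absent get NaN instead of raising an error.
df["IP Address"] = col.str.extract(r"\bIP=([^,]+)", expand=False)
df["MAC Address"] = col.str.extract(r"\bMAC ADDR=([^,]+)", expand=False)
print(df[["IP Address", "MAC Address"]])
```

If blank cells are preferred over NaN, as in the desired output, chaining .fillna("") onto each extract call should give that.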

1 Answer:

Answer 0 (score: 3)

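The idea is to use a list comprehension that builds a dict of KEY=value pairs only for values that are not missing (NaN or None) and that contain both "=" and ", "; every other row gets an empty dict. Pass the resulting list to the DataFrame constructor, and finally use DataFrame.join to attach the new columns to the original DataFrame: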

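import pandas as pd

# Split "KEY=value, KEY=value" into a dict; missing or non-matching values become {}.
# split("=", 1) keeps any "=" inside a value, and pieces without "=" are skipped.
L = [dict(y.split("=", 1) for y in v.split(", ") if "=" in y)
         if pd.notna(v) and ('=' in v) and (', ' in v)
         else {}
         for v in df['Additional Information']]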

df1 = pd.DataFrame(L, index=df.index)
print(df1)
            IP           MAC ADDR     USER
0  192.168.1.1  00:0a:95:9d:68:16   kwfinn
1  192.168.0.1  00:0a:95:9d:68:17  wattray
2          NaN                NaN      NaN
3          NaN                NaN      NaN
4  192.168.1.2  00:1B:44:11:3A:B7   stwnck

df = df.join(df1[['IP','MAC ADDR']])
print(df)
                              Additional Information           IP  \
0  IP=192.168.1.1, MAC ADDR=00:0a:95:9d:68:16, US...  192.168.1.1   
1  IP=192.168.0.1, MAC ADDR=00:0a:95:9d:68:17, US...  192.168.0.1   
2                           Undefined System Error            NaN   
3                         Specific groupname=CUSTGR1          NaN   
4  IP=192.168.1.2, MAC ADDR=00:1B:44:11:3A:B7, US...  192.168.1.2   

            MAC ADDR  
0  00:0a:95:9d:68:16  
1  00:0a:95:9d:68:17  
2                NaN  
3                NaN  
4  00:1B:44:11:3A:B7
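Putting the answer's steps together, a self-contained sketch with the sample data from the question might look like the following. The helper name parse_pairs is hypothetical, the rename to "IP Address" and "MAC Address" is added to match the desired output, and using reindex instead of plain column selection is an added assumption so the join still works even if no row contains those keys.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"Additional Information": [
    "IP=192.168.1.1, MAC ADDR=00:0a:95:9d:68:16, USER=kwfinn",
    "IP=192.168.0.1, MAC ADDR=00:0a:95:9d:68:17, USER=wattray",
    "Undefined System Error",
    "Specific groupname=CUSTGR1",
    "IP=192.168.1.2, MAC ADDR=00:1B:44:11:3A:B7, USER=stwnck",
]})


def parse_pairs(v):
    """Hypothetical helper: turn 'KEY=value, KEY=value' into a dict, else {}."""
    if pd.notna(v) and "=" in v and ", " in v:
        # Split on the first "=" only; skip pieces that have no "=" at all
        return dict(p.split("=", 1) for p in v.split(", ") if "=" in p)
    return {}


# Empty dicts become all-NaN rows; keep the original index so join aligns
df1 = pd.DataFrame([parse_pairs(v) for v in df["Additional Information"]],
                   index=df.index)

# reindex guarantees both columns exist, then rename to the desired headers
new_cols = (df1.reindex(columns=["IP", "MAC ADDR"])
               .rename(columns={"IP": "IP Address", "MAC ADDR": "MAC Address"}))
df = df.join(new_cols)
print(df)
```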