I have a column in a DataFrame that holds strings in several different formats.
Additional Information |
IP=192.168.1.1, MAC ADDR=00:0a:95:9d:68:16, USER=kwfinn
IP=192.168.0.1, MAC ADDR=00:0a:95:9d:68:17, USER=wattray
Undefined System Error
Specific groupname=CUSTGR1
IP=192.168.1.2, MAC ADDR=00:1B:44:11:3A:B7, USER=stwnck
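What I want to do is create new columns from the corresponding values in the column above, namely the IP address and the MAC address.
The desired output would look like this: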
Additional Information                                   | IP Address  | MAC Address       |
IP=192.168.1.1, MAC ADDR=00:0a:95:9d:68:16, USER=kwfinn  | 192.168.1.1 | 00:0a:95:9d:68:16 |
IP=192.168.0.1, MAC ADDR=00:0a:95:9d:68:17, USER=wattray | 192.168.0.1 | 00:0a:95:9d:68:17 |
Undefined System Error                                   |             |                   |
Specific groupname=CUSTGR1                               |             |                   |
IP=192.168.1.2, MAC ADDR=00:1B:44:11:3A:B7, USER=stwnck  | 192.168.1.2 | 00:1B:44:11:3A:B7 |
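The problem is that I can't handle the rows that don't contain an IP and a MAC. I tried splitting with np.where and looking for partial matches, but without success.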
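For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample column shown above. It assumes pandas is imported as pd; the variable name df is an assumption chosen to match the name used in the answer.

```python
import pandas as pd

# Sample data copied from the "Additional Information" column shown above.
df = pd.DataFrame({
    "Additional Information": [
        "IP=192.168.1.1, MAC ADDR=00:0a:95:9d:68:16, USER=kwfinn",
        "IP=192.168.0.1, MAC ADDR=00:0a:95:9d:68:17, USER=wattray",
        "Undefined System Error",
        "Specific groupname=CUSTGR1",
        "IP=192.168.1.2, MAC ADDR=00:1B:44:11:3A:B7, USER=stwnck",
    ]
})

print(df)
```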
Answer 0 (score: 3)
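The idea is to use a list comprehension that splits each value on ", " and then on "=", but only when the value is not missing (NaN or None) and contains both ", " and "=". Rows that fail the filter become empty dictionaries. Pass the resulting list to the DataFrame constructor, and finally add the new columns to the original with DataFrame.join: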
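import pandas as pd

# Build one dict of key=value pairs per row; rows that are missing or
# lack either separator become {} and therefore all-NaN rows in df1.
L = [dict(y.split("=") for y in v.split(", "))
     if pd.notna(v) and ('=' in v) and (', ' in v)
     else {}
     for v in df['Additional Information']]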
df1 = pd.DataFrame(L, index=df.index)
print (df1)
            IP           MAC ADDR     USER
0  192.168.1.1  00:0a:95:9d:68:16   kwfinn
1  192.168.0.1  00:0a:95:9d:68:17  wattray
2          NaN                NaN      NaN
3          NaN                NaN      NaN
4  192.168.1.2  00:1B:44:11:3A:B7   stwnck
df = df.join(df1[['IP','MAC ADDR']])
print (df)
                              Additional Information           IP  \
0  IP=192.168.1.1, MAC ADDR=00:0a:95:9d:68:16, US...  192.168.1.1
1  IP=192.168.0.1, MAC ADDR=00:0a:95:9d:68:17, US...  192.168.0.1
2                             Undefined System Error          NaN
3                         Specific groupname=CUSTGR1          NaN
4  IP=192.168.1.2, MAC ADDR=00:1B:44:11:3A:B7, US...  192.168.1.2

            MAC ADDR
0  00:0a:95:9d:68:16
1  00:0a:95:9d:68:17
2                NaN
3                NaN
4  00:1B:44:11:3A:B7
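As an alternative to the dict-based approach above (not part of the original answer), the two fields could also be pulled out with Series.str.extract, which leaves NaN in every row where the pattern does not match. This is only a sketch: it assumes each value ends at the next comma, and the output column names "IP Address" and "MAC Address" are taken from the desired output in the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Additional Information": [
        "IP=192.168.1.1, MAC ADDR=00:0a:95:9d:68:16, USER=kwfinn",
        "IP=192.168.0.1, MAC ADDR=00:0a:95:9d:68:17, USER=wattray",
        "Undefined System Error",
        "Specific groupname=CUSTGR1",
        "IP=192.168.1.2, MAC ADDR=00:1B:44:11:3A:B7, USER=stwnck",
    ]
})

col = df["Additional Information"]

# With a single capture group and expand=False, str.extract returns a Series.
# Rows without a match (including missing values) come back as NaN.
# Assumption: a value runs up to the next comma or the end of the string.
df["IP Address"] = col.str.extract(r"\bIP=([^,]+)", expand=False)
df["MAC Address"] = col.str.extract(r"\bMAC ADDR=([^,]+)", expand=False)

print(df[["IP Address", "MAC Address"]])
```

Compared with splitting on "=", this variant does not depend on every comma-separated part containing exactly one "=", so a row such as "Specific groupname=CUSTGR1" is skipped simply because neither pattern matches it.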