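I have IP packet data in a CSV file, and I am trying to extract the sequence number from the Info field into a separate column that holds only the sequence number. The sequence number is a substring in the middle of the Info string. Here is my original code: first I create a new column Seq, then I check whether the Info field contains a Seq number, then I split the Info field so that I get only the sequence number. If I print right after 'Seq = j.split ...', I get the correct values. How do I write them into the Seq column of the CSV file?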
import pandas as pd

file = pd.read_csv('file.csv')
file['Seq'] = None
for i in file['Info']:
    if 'Seq' in i:
        split = i.split(' ')
        for j in split:
            if 'Seq=' in j:
                Seq = j.split('Seq=', 1)[1]
                file.loc[i, 'Seq'] = int(Seq)
Example CSV:
No. Time Source Destination Protocol Length Info
1 0.000000 sourceip 192.168.0.1 TCP 54 35165 > 80 [SYN] Seq=0 Win=16384 Len=0
2 0.000001 sourceip 192.168.0.1 TCP 54 14378 > 80 [SYN] Seq=0 Win=16384 Len=0
3 0.000003 sourceip 192.168.0.1 TCP 54 31944 > 80 [SYN] Seq=0 Win=16384 Len=0
Desired result:
No. Time Source Destination Protocol Length Info Seq
1 0.000000 sourceip 192.168.0.1 TCP 54 35165 > 80 [SYN] Seq=0 Win=16384 Len=0 0
2 0.000001 sourceip 192.168.0.1 TCP 54 14378 > 80 [SYN] Seq=0 Win=16384 Len=0 0
3 0.000003 sourceip 192.168.0.1 TCP 54 31944 > 80 [SYN] Seq=0 Win=16384 Len=0 0
Answer 0 (score: 2)
Use str.extract:
file['Seq'] = file.Info.str.extract(r'Seq=(\d+)', expand=False).astype(float)
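The loop in the question passes the Info string itself to file.loc as a row label, so the value never lands in the intended row; str.extract avoids the loop entirely. The following is a minimal sketch of the full round trip under a few assumptions: the CSV is comma-separated, the data is built in memory instead of being read from file.csv, the third row (which has no Seq= field) is made up purely to show how missing values behave, and the nullable Int64 dtype is my own choice for keeping integers instead of floats.

```python
import io
import pandas as pd

# In-memory stand-in for file.csv (assumed comma-separated).
# The first two Info values come from the question; the third row is
# a made-up example with no "Seq=" field.
csv_text = """No.,Info
1,35165 > 80 [SYN] Seq=0 Win=16384 Len=0
2,14378 > 80 [SYN] Seq=0 Win=16384 Len=0
3,hypothetical row without a sequence number
"""

df = pd.read_csv(io.StringIO(csv_text))

# Capture the digits that follow "Seq="; rows without a match become NaN.
seq = df["Info"].str.extract(r"Seq=(\d+)", expand=False)

# Convert to float first (as in the answer), then to the nullable
# integer dtype so matched rows are stored as integers.
df["Seq"] = seq.astype(float).astype("Int64")

# Write the result back out as CSV; index=False omits the row index.
out = io.StringIO()
df.to_csv(out, index=False)
```

With a real file, the same steps would read with pd.read_csv('file.csv') and write with df.to_csv and a path of your choosing.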