I have a DataFrame with the following 4 columns:
IP Time URL Staus
0 10.128.2.1 [29/Nov/2017:06:58:55 GET /login.php HTTP/1.1 200
1 10.128.2.1 [29/Nov/2017:06:59:02 POST /process.php HTTP/1.1 302
2 10.128.2.1 [29/Nov/2017:06:59:03 GET /home.php HTTP/1.1 200
3 10.131.2.1 [29/Nov/2017:06:59:04 GET /js/vendor/moment.min.js HTTP/1.1 200
4 10.130.2.1 [29/Nov/2017:06:59:06 GET /bootstrap-3.3.7/js/bootstrap.js HTTP/1.1 200
5 10.130.2.1 [29/Nov/2017:06:59:19 GET /profile.php?user=bala HTTP/1.1 200
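I need to split the "Time" column into two new columns, "date" and "time". Each current value in the "Time" column should be split at the first occurrence of ":".
I have tried the split function limited to the first ":", as shown below: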
df['date','time']=df.Time.str.split(":", 1)
But this is what I end up with:
IP Time URL Staus (date, time)
0 10.128.2.1 [29/Nov/2017:06:58:55 GET /login.php HTTP/1.1 200 [[29/Nov/2017, 06:58:55]
1 10.128.2.1 [29/Nov/2017:06:59:02 POST /process.php HTTP/1.1 302 [[29/Nov/2017, 06:59:02]
2 10.128.2.1 [29/Nov/2017:06:59:03 GET /home.php HTTP/1.1 200 [[29/Nov/2017, 06:59:03]
3 10.131.2.1 [29/Nov/2017:06:59:04 GET /js/vendor/moment.min.js HTTP/1.1 200 [[29/Nov/2017, 06:59:04]
How do I split this correctly into two columns? What exactly am I doing wrong? Help :(
Answer 0 (score: 1)
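Add the parameter expand=True so that split returns a DataFrame instead of a Series of lists, then wrap the new column names in [] so they are passed as a list: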
# n is passed by keyword because newer pandas versions no longer accept it positionally
df[['date','time']] = df.Time.str.split(":", n=1, expand=True)
print (df)
IP Time URL Staus \
0 10.128.2.1 [29/Nov/2017:06:58:55 GET/login.php HTTP/1.1 200
1 10.128.2.1 [29/Nov/2017:06:59:02 POST/process.php HTTP/1.1 302
date time
0 [29/Nov/2017 06:58:55
1 [29/Nov/2017 06:59:02
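To see why the original assignment produced a single column of lists, it may help to compare the two return types side by side. This is only a minimal sketch: the two-row frame is built by hand from the sample values in the question, and the names as_lists and as_columns are made up for illustration.

```python
import pandas as pd

# Hand-built sample; values copied from the question.
df = pd.DataFrame({
    "IP": ["10.128.2.1", "10.128.2.1"],
    "Time": ["[29/Nov/2017:06:58:55", "[29/Nov/2017:06:59:02"],
})

# Without expand, every cell holds a Python list, so the result is one Series.
as_lists = df["Time"].str.split(":", n=1)

# With expand=True, the pieces are spread across the columns of a DataFrame.
as_columns = df["Time"].str.split(":", n=1, expand=True)

# A list of column names on the left assigns both new columns at once.
df[["date", "time"]] = as_columns
```

Note that df['date','time'] (without the inner list) is treated as a single column whose label is the tuple ('date', 'time'), which matches the "(date, time)" header in the question's output.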
Alternatively, also apply Series.str.strip first to remove the surrounding square brackets (here, the leading [):
# n is passed by keyword because newer pandas versions no longer accept it positionally
df[['date','time']] = df.Time.str.strip('[]').str.split(":", n=1, expand=True)
print (df)
IP Time URL Staus \
0 10.128.2.1 [29/Nov/2017:06:58:55 GET/login.php HTTP/1.1 200
1 10.128.2.1 [29/Nov/2017:06:59:02 POST/process.php HTTP/1.1 302
date time
0 29/Nov/2017 06:58:55
1 29/Nov/2017 06:59:02
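If the same split is needed in more than one place, the strip-then-split step could be wrapped in a small helper. The function name add_date_time_columns and its source parameter below are hypothetical, not part of pandas, and the sample frame is again built by hand from the question's values; treat it as a sketch rather than a definitive implementation.

```python
import pandas as pd


def add_date_time_columns(frame, source="Time"):
    """Hypothetical helper: return a copy of frame with 'date' and 'time' added."""
    out = frame.copy()
    # strip('[]') drops any '[' or ']' characters at either end of each value.
    cleaned = out[source].str.strip("[]")
    # n=1 splits on the first ':' only, so the colons inside the time survive.
    out[["date", "time"]] = cleaned.str.split(":", n=1, expand=True)
    return out


df = pd.DataFrame({
    "Time": ["[29/Nov/2017:06:58:55", "[29/Nov/2017:06:59:02"],
})
result = add_date_time_columns(df)
print(result[["date", "time"]])
```

Working on a copy leaves the original frame untouched, which can be convenient while experimenting; assigning directly to df as in the answer works just as well.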