Splitting a column of strings into separate columns in a DataFrame

Asked: 2018-03-18 18:00:28

Tags: python regex pandas delimiter

I imported the following DataFrame from a CSV file:

ts  employee_id gps_lat gps_lng event_id    event_params    speed   status  serial_number
9/22/2016 13:53 1   34.97   -81.98  Down    {"type":"Down","maximumangle":0,"duration":0}   0   1100110 211
9/22/2016 13:53 1   34.97   -81.98  Left    {"type":"Left","maximumangle":-38.57,"duration":203}    0   1102110 212
9/22/2016 13:53 1   34.97   -81.98  Right   {"type":"Right","maximumangle":52.975,"duration":17}    0   1102130 250
9/22/2016 13:53 1   34.97   -81.98  Down    {"type":"Down","maximumangle":0,"duration":0}   0   1102130 249
9/22/2016 13:54 1   34.97   -81.98  Down    {"type":"Down","maximumangle":0,"duration":0}   0   1102140 280
9/22/2016 13:54 1   34.97   -81.98  Left    {"type":"Left","maximumangle":-10.866,"duration":40}    0   1102140 279

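I need to split the event_params column into separate columns with the headers type, maximumangle, and duration, and I need to get rid of the curly braces. In short, I need the following output.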

ts  employee_id gps_lat gps_lng event_id    Type    maximumangle    duration    speed   status  serial_number
9/22/2016 13:53 1   34.97   -81.98  Down    Down    0   0   0   1100110 211
9/22/2016 13:53 1   34.97   -81.98  Left    Left    -38.57  203 0   1102110 212
9/22/2016 13:53 1   34.97   -81.98  Right   Right   52.975  17  0   1102130 250
9/22/2016 13:53 1   34.97   -81.98  Down    Down    0   0   0   1102130 249
9/22/2016 13:54 1   34.97   -81.98  Down    Down    0   0   0   1102140 280

#Code I am trying to use:

import re
parts = re.split('\df3|(?<!\d)[:.](?!\d)', df3)
parts

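I tried to solve this by first splitting on the ":" delimiter, then splitting the last column on "}", and then removing the maximumangle and duration labels from the resulting columns.

I have been trying to use the re.split function as shown above, but it returns an error: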

--expected string or bytes-like object
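
A note on this error: re.split operates on a single string, so passing a whole DataFrame as its second argument raises the TypeError quoted above. The following is a minimal sketch, assuming a small stand-in frame named df3 that holds only the event_params column. It reproduces the failure and shows that the same call is accepted once it receives the string from a single cell.

```python
import re
import pandas as pd

# Hypothetical stand-in for the asker's df3 (one column, two rows)
df3 = pd.DataFrame({'event_params': [
    '{"type":"Down","maximumangle":0,"duration":0}',
    '{"type":"Left","maximumangle":-38.57,"duration":203}',
]})

# Passing the DataFrame itself fails: re.split needs a str or bytes object
try:
    re.split(r'[:,]', df3)
    failed = False
    message = ''
except TypeError as exc:
    failed = True
    message = str(exc)

# Passing a single cell (a plain string) is accepted
pieces = re.split(r'[:,]', df3.loc[0, 'event_params'])
print(failed, message)
print(pieces)
```

Splitting on punctuation still leaves braces and quotes attached to the pieces, which is one reason parsing the string as a dictionary tends to be simpler than a regex here.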

1 Answer:

Answer 0 (score: 1)

Since it is hard to reproduce the exact data you are working with, this solution should give you enough of a hint:

import pandas as pd

# create minimal sample data
df1 = pd.DataFrame({'employee_id':[1,2,3,4,5,6], 'gps':[1,1,1,1,1,1], 'event_params' :
['{"type":"Down","maximumangle":0,"duration":0}',
'{"type":"Left","maximumangle":-38.57,"duration":203}',   
'{"type":"Right","maximumangle":52.975,"duration":17}', 
'{"type":"Down","maximumangle":0,"duration":0}',
'{"type":"Down","maximumangle":0,"duration":0}',
'{"type":"Left","maximumangle":-10.866,"duration":40}']})


# save event_params column to a new value while removing from df1
df2 = df1.pop('event_params')

# convert values to dictionary format using ast library
import ast
df2 = df2.apply(ast.literal_eval)

# convert dictionary to column format and add back to df1
df2 = pd.DataFrame(list(df2))
df1 = pd.concat([df1, df2], axis=1)

print(df1)

  employee_id   gps     duration    maximumangle    type
0           1     1            0           0.000   Down
1           2     1          203         -38.570    Left
2           3     1           17          52.975    Right
3           4     1            0           0.000    Down
4           5     1            0           0.000    Down
5           6     1           40         -10.866    Left
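
Because the event_params strings are valid JSON, the standard-library json.loads can be used in place of ast.literal_eval. The sketch below is a variant, not part of the approach above: it also places the new columns where event_params used to be, so the column order is closer to the output requested in the question. The sample frame and the helper name expand_json_column are illustrative assumptions.

```python
import json
import pandas as pd

def expand_json_column(df, column):
    """Replace a column of JSON object strings with one column per key."""
    # Parse each JSON string into a dict and build a frame aligned on df's index
    parsed = pd.DataFrame([json.loads(s) for s in df[column]], index=df.index)
    # Split the original columns around the one being expanded
    position = df.columns.get_loc(column)
    left = df.iloc[:, :position]
    right = df.iloc[:, position + 1:]
    return pd.concat([left, parsed, right], axis=1)

# Hypothetical sample with a column on each side of event_params
sample = pd.DataFrame({
    'event_id': ['Down', 'Left'],
    'event_params': [
        '{"type":"Down","maximumangle":0,"duration":0}',
        '{"type":"Left","maximumangle":-38.57,"duration":203}',
    ],
    'speed': [0, 0],
})

result = expand_json_column(sample, 'event_params')
print(result)
```

Building the parsed frame with the original index keeps the rows aligned in pd.concat even when the DataFrame does not have a default 0..n-1 index.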

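Edit 1: To convert all event_params values to dictionary format, parsing only the values that are still strings and leaving anything else unchanged:

df2 = df2.apply(lambda x: ast.literal_eval(x) if isinstance(x, str) else x)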
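
The type check in a guard like this matters, because ast.literal_eval parses strings and should only be applied to str values. A minimal sketch, assuming a hypothetical column that mixes unparsed strings, an already-parsed dictionary, and a missing value:

```python
import ast
import pandas as pd

# Hypothetical column: an unparsed string, an existing dict, and a missing value
mixed = pd.Series([
    '{"type":"Down","maximumangle":0,"duration":0}',
    {"type": "Left", "maximumangle": -38.57, "duration": 203},
    None,
])

def to_dict(value):
    # Only strings are parsed; dicts and missing values pass through unchanged
    return ast.literal_eval(value) if isinstance(value, str) else value

converted = mixed.apply(to_dict)
print(converted)
```

Rows that are missing stay missing after this step, so they may need to be dropped or filled before the dictionaries are expanded into columns.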