I imported the following DataFrame from a CSV file:
ts employee_id gps_lat gps_lng event_id event_params speed status serial_number
9/22/2016 13:53 1 34.97 -81.98 Down {"type":"Down","maximumangle":0,"duration":0} 0 1100110 211
9/22/2016 13:53 1 34.97 -81.98 Left {"type":"Left","maximumangle":-38.57,"duration":203} 0 1102110 212
9/22/2016 13:53 1 34.97 -81.98 Right {"type":"Right","maximumangle":52.975,"duration":17} 0 1102130 250
9/22/2016 13:53 1 34.97 -81.98 Down {"type":"Down","maximumangle":0,"duration":0} 0 1102130 249
9/22/2016 13:54 1 34.97 -81.98 Down {"type":"Down","maximumangle":0,"duration":0} 0 1102140 280
9/22/2016 13:54 1 34.97 -81.98 Left {"type":"Left","maximumangle":-10.866,"duration":40} 0 1102140 279
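I need to split the event_params column into separate columns with the headers type, maximumangle and duration, and I need to get rid of the curly braces. In short, I need the following output: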
ts employee_id gps_lat gps_lng event_id Type maximumangle duration speed status serial_number
9/22/2016 13:53 1 34.97 -81.98 Down Down 0 0 0 1100110 211
9/22/2016 13:53 1 34.97 -81.98 Left Left -38.57 203 0 1102110 212
9/22/2016 13:53 1 34.97 -81.98 Right Right 52.975 17 0 1102130 250
9/22/2016 13:53 1 34.97 -81.98 Down Down 0 0 0 1102130 249
9/22/2016 13:54 1 34.97 -81.98 Down Down 0 0 0 1102140 280
Code I am trying to use:
import re
parts = re.split('\df3|(?<!\d)[:.](?!\d)', df3)
parts
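I tried to approach this by first splitting on the : delimiter, then splitting the last column on }, and then removing the maximumangle and duration labels from the resulting columns.
I have been trying to use re.split as shown above, but it raises the error:
TypeError: expected string or bytes-like object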
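re.split works on a single string, so passing it the whole df3 object (presumably the DataFrame) raises that TypeError. If you want to keep a regex-based approach, one option is to apply the pattern to every row with pandas' Series.str.extract. This is a sketch, not the asker's code: it uses a two-row subset of the sample data and assumes the keys always appear in the order type, maximumangle, duration.

```python
import pandas as pd

# Two sample rows copied from the question's event_params column
df = pd.DataFrame({
    "event_id": ["Down", "Left"],
    "event_params": [
        '{"type":"Down","maximumangle":0,"duration":0}',
        '{"type":"Left","maximumangle":-38.57,"duration":203}',
    ],
})

# re.split expects one string; Series.str.extract applies a regex to every row.
# Assumption: keys always appear in the order type, maximumangle, duration.
pattern = (r'"type":"(?P<type>\w+)",'
           r'"maximumangle":(?P<maximumangle>-?[\d.]+),'
           r'"duration":(?P<duration>-?\d+)')
parts = df["event_params"].str.extract(pattern)

# str.extract returns strings, so convert the numeric columns
parts["maximumangle"] = pd.to_numeric(parts["maximumangle"])
parts["duration"] = pd.to_numeric(parts["duration"])

# Replace the raw column with the extracted ones
df = pd.concat([df.drop(columns="event_params"), parts], axis=1)
print(df)
```

The named groups become the column names, so no braces or labels need to be stripped afterwards. The pattern breaks if the key order or formatting changes, which is why parsing the values as dicts (as in the answer) is usually more robust.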
Answer (score: 1)
Since it is hard to reproduce the exact data you are working with, this solution should give you enough hints:
import ast
import pandas as pd

# create minimal sample data
df1 = pd.DataFrame({'employee_id': [1, 2, 3, 4, 5, 6],
                    'gps': [1, 1, 1, 1, 1, 1],
                    'event_params': [
                        '{"type":"Down","maximumangle":0,"duration":0}',
                        '{"type":"Left","maximumangle":-38.57,"duration":203}',
                        '{"type":"Right","maximumangle":52.975,"duration":17}',
                        '{"type":"Down","maximumangle":0,"duration":0}',
                        '{"type":"Down","maximumangle":0,"duration":0}',
                        '{"type":"Left","maximumangle":-10.866,"duration":40}']})
# remove the event_params column from df1 and keep it as a separate Series
df2 = df1.pop('event_params')
# parse each string into a dictionary using the ast library
df2 = df2.apply(ast.literal_eval)
# expand the dictionaries into columns and add them back to df1
df2 = pd.DataFrame(list(df2))
df1 = pd.concat([df1, df2], axis=1)
print(df1)
employee_id gps duration maximumangle type
0 1 1 0 0.000 Down
1 2 1 203 -38.570 Left
2 3 1 17 52.975 Right
3 4 1 0 0.000 Down
4 5 1 0 0.000 Down
5 6 1 40 -10.866 Left
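Because every event_params value in the question is valid JSON, json.loads from the standard library can stand in for ast.literal_eval (a swapped-in technique, not the one used above). The sketch below also places the new columns where event_params was, to match the layout of the requested output. It runs on a hypothetical subset of the question's columns and assumes every row has the keys type, maximumangle and duration.

```python
import json
import pandas as pd

# Hypothetical two-row subset of the question's data (column names from the question)
df = pd.DataFrame({
    "employee_id": [1, 1],
    "event_id": ["Down", "Left"],
    "event_params": [
        '{"type":"Down","maximumangle":0,"duration":0}',
        '{"type":"Left","maximumangle":-38.57,"duration":203}',
    ],
    "speed": [0, 0],
})

# Parse each JSON string into a dict, then expand the dicts into columns
params = pd.DataFrame(df["event_params"].map(json.loads).tolist(), index=df.index)
# Fix the column order explicitly (assumed key names)
params = params[["type", "maximumangle", "duration"]]

# Splice the new columns in where event_params used to be
cols = list(df.columns)
pos = cols.index("event_params")
df = pd.concat([df.drop(columns="event_params"), params], axis=1)
df = df[cols[:pos] + list(params.columns) + cols[pos + 1:]]
print(df)
```

Selecting the columns explicitly keeps the output order stable, since the order pandas gives columns built from a list of dicts has differed between versions.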
Edit 1: if some event_params values are already dictionaries, parse only the ones that are still strings:
df2 = df2.apply(lambda x: ast.literal_eval(x) if isinstance(x, str) else x)
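A quick check of the string-only parsing on a hypothetical mixed Series, where one value is still a string and the other is already a dict:

```python
import ast
import pandas as pd

# Hypothetical mixed column: one raw string, one value that is already a dict
s = pd.Series([
    '{"type":"Down","maximumangle":0,"duration":0}',
    {"type": "Left", "maximumangle": -38.57, "duration": 203},
])

# Parse only the strings; dicts pass through unchanged
parsed = s.apply(lambda x: ast.literal_eval(x) if isinstance(x, str) else x)

# Every element is now a dict, so the expansion step works as before
result = pd.DataFrame(parsed.tolist())
print(result)
```

Testing for str rather than dict matters: ast.literal_eval accepts a string (or an AST node), so calling it on a dict raises an error instead of returning the dict.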