我从数据库中导出了原始数据集,这是一个嵌套的json结构。因此,基于此,我想将这种数据集转换为pandas数据框,但是,我不知道如何将这种结构转换为所需的数据框。
我已经练习了一些有关使用json_normalize从kaggle处理嵌套json数据结构的教程,但就我而言,我无法将json结构实现到所需的数据帧中。
我想要实现的所需数据框就是这样
下面是我导出的嵌套json结构
[
{
"tripId" : "19062521016",
"driveDisstance" : 0,
"devicetime" : "2019-06-26 06:02:30",
"speeds" : "0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2,1,2,4,11,13,13,14,14,14,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0",
"locations" : "[{\"latitude\":0,\"longitude\":0},{\"latitude\":0,\"longitude\":0},{\"latitude\":0,\"longitude\":0},{\"latitude\":0,\"longitude\":0},{\"latitude\":0,\"longitude\":0},{\"latitude\":0,\"longitude\":0}]",
"dangerousValues" : "[{\"quickTick\":0},{\"quicRatio\":1023},{\"quicCount\":0},{\"brakeTick\":0},{\"brakeRatio\":1023},{\"brakeCount\":0},{\"overSpeedRatio\":1023},{\"overSpeedCount\":0}]"
},
{
"tripId" : "19062521016",
"driveDisstance" : 0,
"devicetime" : "2019-06-26 06:03:30",
"speeds" : "0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0",
"locations" : "[{\"latitude\":0,\"longitude\":0},{\"latitude\":0,\"longitude\":0},{\"latitude\":0,\"longitude\":0},{\"latitude\":0,\"longitude\":0},{\"latitude\":0,\"longitude\":0},{\"latitude\":0,\"longitude\":0}]",
...,
]
我希望输出如下。
tripId | driveDisstance | devicetime | speeds | qucikTick | quickRatio | quickCount | brakeTick | brakeRatio | brakeCount | overSpeedRatio | overSpeedCount
作为列
然后一个tripId有60行,其中包括60个速度值,6个位置值和危险值。
答案 0 :(得分:0)
原始帖子中存在一些拼写错误以及随之而来的不一致之处,因此在此回复中,为了便于说明,我使用了数据定义的拼写。
您可以使用jq将编码后的字符串“解包”为JSON,例如使用此过滤器:
map( (.locations |= fromjson)
| if has("dangerousValues") then .dangerousValues |= fromjson else . end)
.[]
| .locations |= fromjson
| .dangerousValues |= if . == null then [] else fromjson end
| [.tripId,.driveDisstance,.devicetime,.speeds]
+ (.dangerousValues
| add
| [.quickTick, .quicRatio, .quicCount, .brakeTick, .brakeRatio, .brakeCount, .overSpeedRatio, .overSpeedCount] )
| @tsv
使用-r命令行选项和已发布的数据,上方的jq过滤器会产生:
19062521016 0 2019-06-26 06:02:30 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2,1,2,4,11,13,13,14,14,14,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 0 1023 0 0 1023 0 1023 0
19062521016 0 2019-06-26 06:03:30 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
.[]
| .locations |= fromjson
| [.tripId,.driveDisstance,.devicetime,.speeds]
+ [.dangerousValues|fromjson[][]]?
| @tsv
答案 1 :(得分:0)
可以从一个数组数组构建一个熊猫DataFrame,每个内部数组都是一行。您的数据超出了json_normalize
的容量,因此,我将手动处理:
演示:
data = [
{
"tripId" : "19062521016",
"driveDisstance" : 0,
"devicetime" : "2019-06-26 06:02:30",
"speeds" : "0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2,1,2,4,11,13,13,14,14,14,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0",
"locations" : "[{\"latitude\":0,\"longitude\":0},{\"latitude\":0,\"longitude\":0},{\"latitude\":0,\"longitude\":0},{\"latitude\":0,\"longitude\":0},{\"latitude\":0,\"longitude\":0},{\"latitude\":0,\"longitude\":0}]",
"dangerousValues" : "[{\"quickTick\":0},{\"quicRatio\":1023},{\"quicCount\":0},{\"brakeTick\":0},{\"brakeRatio\":1023},{\"brakeCount\":0},{\"overSpeedRatio\":1023},{\"overSpeedCount\":0}]"
},
{
"tripId" : "19062521016",
"driveDisstance" : 0,
"devicetime" : "2019-06-26 06:03:30",
"speeds" : "0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0",
"locations" : "[{\"latitude\":0,\"longitude\":0},{\"latitude\":0,\"longitude\":0},{\"latitude\":0,\"longitude\":0},{\"latitude\":0,\"longitude\":0},{\"latitude\":0,\"longitude\":0},{\"latitude\":0,\"longitude\":0}]",
"dangerousValues" : "[{\"quickTick\":0},{\"quicRatio\":1023},{\"quicCount\":0},{\"brakeTick\":0},{\"brakeRatio\":1023},{\"brakeCount\":0},{\"overSpeedRatio\":1023},{\"overSpeedCount\":0}]"
}
]
# pre processing
for elt in data:
elt['dangerousValues'] = dict(((k,v) for item in
json.loads(elt['dangerousValues'])
for k,v in item.items()))
elt['speeds'] = elt['speeds'].split(',')
# build dataframe data
dfdata = [[elt["tripId"], elt["driveDisstance"],elt["devicetime"], speed,
elt["dangerousValues"]["quickTick"],
elt["dangerousValues"]["quicRatio"],
elt["dangerousValues"]["quicCount"],
elt["dangerousValues"]["brakeTick"],
elt["dangerousValues"]["brakeRatio"],
elt["dangerousValues"]["brakeCount"],
elt["dangerousValues"]["overSpeedRatio"],
elt["dangerousValues"]["overSpeedCount"]]
for elt in data for speed in elt["speeds"]]
# build dataframe
df = pd.DataFrame(dfdata, columns=["tripId","driveDisstance",
"devicetime","speed",
"quickTick","quicRatio","quicCount",
"brakeTick","brakeRatio","brakeCount",
"overSpeedRatio","overSpeedCount"])
我给:
tripId driveDisstance devicetime speed quickTick quicRatio quicCount brakeTick brakeRatio brakeCount overSpeedRatio overSpeedCount
0 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
1 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
2 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
3 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
4 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
5 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
6 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
7 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
8 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
9 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
10 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
11 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
12 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
13 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
14 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
15 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
16 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
17 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
18 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
19 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
20 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
21 19062521016 0 2019-06-26 06:02:30 1 0 1023 0 0 1023 0 1023 0
22 19062521016 0 2019-06-26 06:02:30 2 0 1023 0 0 1023 0 1023 0
23 19062521016 0 2019-06-26 06:02:30 1 0 1023 0 0 1023 0 1023 0
24 19062521016 0 2019-06-26 06:02:30 2 0 1023 0 0 1023 0 1023 0
25 19062521016 0 2019-06-26 06:02:30 4 0 1023 0 0 1023 0 1023 0
26 19062521016 0 2019-06-26 06:02:30 11 0 1023 0 0 1023 0 1023 0
27 19062521016 0 2019-06-26 06:02:30 13 0 1023 0 0 1023 0 1023 0
28 19062521016 0 2019-06-26 06:02:30 13 0 1023 0 0 1023 0 1023 0
29 19062521016 0 2019-06-26 06:02:30 14 0 1023 0 0 1023 0 1023 0
30 19062521016 0 2019-06-26 06:02:30 14 0 1023 0 0 1023 0 1023 0
31 19062521016 0 2019-06-26 06:02:30 14 0 1023 0 0 1023 0 1023 0
32 19062521016 0 2019-06-26 06:02:30 8 0 1023 0 0 1023 0 1023 0
33 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
34 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
35 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
36 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
37 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
38 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
39 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
40 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
41 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
42 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
43 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
44 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
45 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
46 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
47 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
48 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
49 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
50 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
51 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
52 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
53 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
54 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
55 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
56 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
57 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
58 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
59 19062521016 0 2019-06-26 06:02:30 0 0 1023 0 0 1023 0 1023 0
60 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
61 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
62 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
63 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
64 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
65 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
66 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
67 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
68 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
69 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
70 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
71 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
72 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
73 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
74 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
75 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
76 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
77 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
78 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
79 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
80 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
81 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
82 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
83 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
84 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
85 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
86 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
87 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
88 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
89 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
90 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
91 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
92 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
93 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
94 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
95 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
96 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
97 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
98 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
99 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
100 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
101 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
102 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
103 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
104 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
105 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
106 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
107 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
108 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
109 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
110 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
111 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
112 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
113 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
114 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
115 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
116 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
117 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
118 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0
119 19062521016 0 2019-06-26 06:03:30 0 0 1023 0 0 1023 0 1023 0