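Merge multiple data frames and put them into a JSON structure.
I have three data frames, each with ~50k rows and ~7-10 columns.
Sample data frames are shown below: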
df1 = data.frame('memberID'=c('001','002','002','003','003','003'),
'tripID'=c('111','122','123','314','315','316'),
'distance'=c(4.2,3.1,2.6,3.3,4.4,5.1),
'duration'=c(1.1,2.3,4.6,3.2,1.1,9.7))
DF1:
memberID tripID distance duration
001 111 4.2 1.1
002 122 3.1 2.3
002 123 2.6 4.6
003 314 3.3 3.2
003 315 4.4 1.1
003 316 5.1 9.7
(In df1, each memberID may have more than one tripID.)
df2 = data.frame('tripID'=c('111','111','111','123','314','315','316'),
'eventID'=c(2,3,1,3,2,2,1),
'eventLat'=c(-10,-20,-30,-40,-50,-60,-70),
'eventLon'=c(10,20,30,40,50,60,70),
'speed'=c(15,25,35,45,55,65,75))
DF2:
tripID eventID eventLat eventLon speed
111 2 -10 10 15
111 3 -20 20 25
111 1 -30 30 35
123 3 -40 40 45
314 2 -50 50 55
315 2 -60 60 65
316 1 -70 70 75
(In df2, a tripID may have no eventID at all, or it may have several.
For example, tripID 122 has no events, so it does not appear in df2.)
df3 = data.frame('tripID'=c('111','122','122','123','123','123','314','315','316'),
'accuracy'=c(1,1,2,2,2,2,3,3,1),
'gpsLat'=c(-100,-200,-300,-400,-500,-400,-300,-200,-100),
'gpsLon'=c(100,200,300,400,500,400,300,200,100))
DF3:
tripID accuracy gpsLat gpsLon
111 1 -100 100
122 1 -200 200
122 2 -300 300
123 2 -400 400
123 2 -500 500
123 2 -400 400
314 3 -300 300
315 3 -200 200
316 1 -100 100
(In df3, each tripID may have multiple rows.)
The JSON structure I want to end up with looks like this:
[{"memberID":"001",
"tripID":"111",
"distance":4.2,
"duration":1.1,
"eventdetails":[
{"eventID":2,
"location":"-10,10",
"speed":15},
{"eventID":3,
"location":"-20,20",
"speed":25},
{"eventID":1,
"location":"-30,30",
"speed":35}
],
"gpspoint":[
{"accuracy":1,
"gpsposition":"-100,100"}
]
},
{"memberID":"002",
"tripID":"122",
"distance":3.1,
"duration":2.3,
"eventdetails":[
{"eventID":NA,
"location":NA,
"speed":NA}
],
"gpspoint":[
{"accuracy":1,
"gpsposition":"-200,200"},
{"accuracy":2,
"gpsposition":"-300,300"}
]
},
{"memberID":"002",
"tripID":"123",
"distance":2.6,
"duration":4.6,
"eventdetails":[
{"eventID":3,
"location":"-40,40",
"speed":45}
],
"gpspoint":[
{"accuracy":2,
"gpsposition":"-400,400"},
{"accuracy":2,
"gpsposition":"-500,500"},
{"accuracy":2,
"gpsposition":"-400,400"}
]
},
...]
Edit 1: tripID is the key that joins all three tables.
Answer 0 (score: 2)
How about this? I'm using tripID as the key:
dats <- list(df1,df2,df3)
dfall <- Reduce(function(...) merge(..., by="tripID", all=TRUE), dats)
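To see what this full outer merge produces on the sample data, here is a small self-contained check. It only repeats the question's data frames and the merge above; `stringsAsFactors = FALSE` is added here as an assumption so the keys stay plain character vectors. Because df2 and df3 can both hold several rows per tripID, the merged table has one row per event/GPS combination rather than one row per trip.

```r
# Sample data from the question (stringsAsFactors = FALSE is an assumption
# added here so that IDs stay character vectors on older R versions).
df1 <- data.frame(memberID = c("001", "002", "002", "003", "003", "003"),
                  tripID   = c("111", "122", "123", "314", "315", "316"),
                  distance = c(4.2, 3.1, 2.6, 3.3, 4.4, 5.1),
                  duration = c(1.1, 2.3, 4.6, 3.2, 1.1, 9.7),
                  stringsAsFactors = FALSE)
df2 <- data.frame(tripID   = c("111", "111", "111", "123", "314", "315", "316"),
                  eventID  = c(2, 3, 1, 3, 2, 2, 1),
                  eventLat = c(-10, -20, -30, -40, -50, -60, -70),
                  eventLon = c(10, 20, 30, 40, 50, 60, 70),
                  speed    = c(15, 25, 35, 45, 55, 65, 75),
                  stringsAsFactors = FALSE)
df3 <- data.frame(tripID   = c("111", "122", "122", "123", "123", "123",
                               "314", "315", "316"),
                  accuracy = c(1, 1, 2, 2, 2, 2, 3, 3, 1),
                  gpsLat   = c(-100, -200, -300, -400, -500, -400, -300, -200, -100),
                  gpsLon   = c(100, 200, 300, 400, 500, 400, 300, 200, 100),
                  stringsAsFactors = FALSE)

# Full outer join of all three tables on tripID
dats  <- list(df1, df2, df3)
dfall <- Reduce(function(...) merge(..., by = "tripID", all = TRUE), dats)

dim(dfall)                      # 11 rows, 11 columns
table(dfall$tripID)             # trip 111 appears 3 times (3 events x 1 GPS row)
subset(dfall, tripID == "122")  # event columns are NA because df2 has no trip 122
```

The result is a flat table: trip-level fields such as distance and duration are repeated on every row of the same trip.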
To JSON:
library(rjson)
x <- toJSON(unname(split(dfall, 1:nrow(dfall))))
cat(x)
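Note that the code above emits one flat JSON object per merged row, not the nested eventdetails/gpspoint layout shown in the question. As a rough sketch of one way to get the nested shape, the following builds one record per trip and serializes it with the jsonlite package. Using jsonlite instead of rjson is an assumption on my part, as are the helper names `trips`, `ev` and `gp`. One difference from the requested output: a trip without events gets an empty `eventdetails` array here, not an object filled with NA.

```r
library(jsonlite)  # assumption: jsonlite is installed; the answer above uses rjson

# Sample data from the question
df1 <- data.frame(memberID = c("001", "002", "002", "003", "003", "003"),
                  tripID   = c("111", "122", "123", "314", "315", "316"),
                  distance = c(4.2, 3.1, 2.6, 3.3, 4.4, 5.1),
                  duration = c(1.1, 2.3, 4.6, 3.2, 1.1, 9.7),
                  stringsAsFactors = FALSE)
df2 <- data.frame(tripID   = c("111", "111", "111", "123", "314", "315", "316"),
                  eventID  = c(2, 3, 1, 3, 2, 2, 1),
                  eventLat = c(-10, -20, -30, -40, -50, -60, -70),
                  eventLon = c(10, 20, 30, 40, 50, 60, 70),
                  speed    = c(15, 25, 35, 45, 55, 65, 75),
                  stringsAsFactors = FALSE)
df3 <- data.frame(tripID   = c("111", "122", "122", "123", "123", "123",
                               "314", "315", "316"),
                  accuracy = c(1, 1, 2, 2, 2, 2, 3, 3, 1),
                  gpsLat   = c(-100, -200, -300, -400, -500, -400, -300, -200, -100),
                  gpsLon   = c(100, 200, 300, 400, 500, 400, 300, 200, 100),
                  stringsAsFactors = FALSE)

# Build one nested record per row of df1 (one row per trip)
trips <- lapply(seq_len(nrow(df1)), function(i) {
  id <- df1$tripID[i]
  ev <- df2[df2$tripID == id, ]  # zero or more events for this trip
  gp <- df3[df3$tripID == id, ]  # GPS rows for this trip
  list(
    memberID = df1$memberID[i],
    tripID   = id,
    distance = df1$distance[i],
    duration = df1$duration[i],
    # Data frames nested in a list are written as arrays of row objects
    eventdetails = data.frame(
      eventID  = ev$eventID,
      location = paste(ev$eventLat, ev$eventLon, sep = ","),
      speed    = ev$speed,
      stringsAsFactors = FALSE),
    gpspoint = data.frame(
      accuracy    = gp$accuracy,
      gpsposition = paste(gp$gpsLat, gp$gpsLon, sep = ","),
      stringsAsFactors = FALSE)
  )
})

# auto_unbox = TRUE writes length-1 vectors as scalars instead of arrays
json <- toJSON(trips, auto_unbox = TRUE, pretty = TRUE)
cat(json)
```

This loops over trips and subsets df2 and df3 each time, which is easy to read but may be slow at ~50k rows; pre-splitting df2 and df3 with `split(df2, df2$tripID)` and looking trips up by name would probably be a faster variant.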