How to merge multiple data frames and convert the result to JSON

Time: 2018-04-18 05:54:14

Tags: r json

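Goal

Merge multiple data frames and put the result into a JSON structure.

Simplified sample data

I have three data frames, each with ~50k rows and ~7-10 columns.

The sample data frames look like this: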

df1 = data.frame('memberID'=c('001','002','002','003','003','003'),
                 'tripID'=c('111','122','123','314','315','316'),
                 'distance'=c(4.2,3.1,2.6,3.3,4.4,5.1), 
                 'duration'=c(1.1,2.3,4.6,3.2,1.1,9.7))

DF1:

memberID   tripID  distance  duration
   001      111      4.2       1.1
   002      122      3.1       2.3
   002      123      2.6       4.6
   003      314      3.3       3.2
   003      315      4.4       1.1
   003      316      5.1       9.7

(In df1, each memberID may have more than one tripID.)

df2 = data.frame('tripID'=c('111','111','111','123','314','315','316'),
                 'eventID'=c(2,3,1,3,2,2,1),
                 'eventLat'=c(-10,-20,-30,-40,-50,-60,-70),
                 'eventLon'=c(10,20,30,40,50,60,70),
                 'speed'=c(15,25,35,45,55,65,75))

DF2:

tripID eventID eventLat eventLon speed
 111       2      -10       10    15
 111       3      -20       20    25
 111       1      -30       30    35
 123       3      -40       40    45
 314       2      -50       50    55
 315       2      -60       60    65
 316       1      -70       70    75

(In df2, a tripID may have no eventID at all, or it may have several. For example, tripID 122 has no events, so it does not appear in df2.)

df3 = data.frame('tripID'=c('111','122','122','123','123','123','314','315','316'),
                 'accuracy'=c(1,1,2,2,2,2,3,3,1),
                 'gpsLat'=c(-100,-200,-300,-400,-500,-400,-300,-200,-100),
                 'gpsLon'=c(100,200,300,400,500,400,300,200,100))

DF3:

tripID accuracy gpsLat gpsLon
 111        1   -100    100
 122        1   -200    200
 122        2   -300    300
 123        2   -400    400
 123        2   -500    500
 123        2   -400    400
 314        3   -300    300
 315        3   -200    200
 316        1   -100    100

(In df3, each tripID may have multiple rows of data.)

Desired output in JSON (for demonstration, only the first three trips are shown):

[{"memberID":"001",
  "tripID":"111",
  "distance":4.2,
  "duration":1.1,
  "eventdetails":[
     {"eventID":2,
      "location":"-10,10",
      "speed":15},
     {"eventID":3,
      "location":"-20,20",
      "speed":25},
     {"eventID":1,
      "location":"-30,30",
      "speed":35}
  ],
  "gpspoint":[
     {"accuracy":1,
      "gpsposition":"-100,100"}
  ]
},
{"memberID":"002",
 "tripID":"122",
 "distance":3.1,
 "duration":2.3,
 "eventdetails":[
     {"eventID":NA,
      "location":NA,
      "speed":NA}
  ],
  "gpspoint":[
     {"accuracy":1,
      "gpsposition":"-200,200"},
     {"accuracy":2,
      "gpsposition":"-300,300"}
  ]
},
{"memberID":"002",
 "tripID":"123",
 "distance":2.6,
 "duration":4.6,
 "eventdetails":[
     {"eventID":3,
      "location":"-40,40",
      "speed":45}
   ],
 "gpspoint":[
     {"accuracy":2,
      "gpsposition":"-400,400"},
     {"accuracy":2,
      "gpsposition":"-500,500"},
     {"accuracy":2,
      "gpsposition":"-400,400"}
   ]
},
...]

Edit1: tripID is the key that links all three tables.

1 Answer:

Answer 0: (score: 2)

How about this? I'm using tripID as the key.

Merge the three data frames
dats <- list(df1,df2,df3)
dfall <- Reduce(function(...) merge(..., by="tripID", all=TRUE), dats)
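One thing to be aware of: a full outer merge on tripID pairs every event row with every GPS row of the same trip, so the merged table is flat and a trip can appear several times. The quick check below is only a sketch on the sample data from the question (the name `rows_per_trip` is made up for illustration); it counts how many rows each trip ends up with.

```r
# Sample data from the question
df1 <- data.frame(memberID = c('001','002','002','003','003','003'),
                  tripID   = c('111','122','123','314','315','316'),
                  distance = c(4.2,3.1,2.6,3.3,4.4,5.1),
                  duration = c(1.1,2.3,4.6,3.2,1.1,9.7),
                  stringsAsFactors = FALSE)
df2 <- data.frame(tripID   = c('111','111','111','123','314','315','316'),
                  eventID  = c(2,3,1,3,2,2,1),
                  eventLat = c(-10,-20,-30,-40,-50,-60,-70),
                  eventLon = c(10,20,30,40,50,60,70),
                  speed    = c(15,25,35,45,55,65,75),
                  stringsAsFactors = FALSE)
df3 <- data.frame(tripID   = c('111','122','122','123','123','123','314','315','316'),
                  accuracy = c(1,1,2,2,2,2,3,3,1),
                  gpsLat   = c(-100,-200,-300,-400,-500,-400,-300,-200,-100),
                  gpsLon   = c(100,200,300,400,500,400,300,200,100),
                  stringsAsFactors = FALSE)

# Same full outer merge as above, keyed on tripID
dfall <- Reduce(function(x, y) merge(x, y, by = "tripID", all = TRUE),
                list(df1, df2, df3))

# Rows per trip = (events, at least 1) x (GPS points, at least 1)
rows_per_trip <- table(dfall$tripID)
print(rows_per_trip)
nrow(dfall)  # 11 rows for the 6 trips in the sample
```

Trip 122 has no events, so its event columns come back as NA in both of its rows. If the nested layout from the question is required, the flat table still has to be regrouped per trip afterwards.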

To JSON

library(rjson)
# Split the merged data frame into one-row data frames and serialize them as a JSON array
x <- toJSON(unname(split(dfall, seq_len(nrow(dfall)))))
cat(x)
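
If you need the nested shape from the question (one object per trip, with eventdetails and gpspoint as arrays), one possible approach is to skip the flat merge and attach the child rows to each trip directly. This is only a sketch under a few assumptions: it uses the jsonlite package instead of rjson, the names `events_by_trip`, `gps_by_trip`, and `trips` are made up for illustration, and a trip without events gets an empty array rather than an entry filled with NA.

```r
library(jsonlite)  # assumption: jsonlite is installed; this swaps out rjson

# Sample data from the question
df1 <- data.frame(memberID = c('001','002','002','003','003','003'),
                  tripID   = c('111','122','123','314','315','316'),
                  distance = c(4.2,3.1,2.6,3.3,4.4,5.1),
                  duration = c(1.1,2.3,4.6,3.2,1.1,9.7),
                  stringsAsFactors = FALSE)
df2 <- data.frame(tripID   = c('111','111','111','123','314','315','316'),
                  eventID  = c(2,3,1,3,2,2,1),
                  eventLat = c(-10,-20,-30,-40,-50,-60,-70),
                  eventLon = c(10,20,30,40,50,60,70),
                  speed    = c(15,25,35,45,55,65,75),
                  stringsAsFactors = FALSE)
df3 <- data.frame(tripID   = c('111','122','122','123','123','123','314','315','316'),
                  accuracy = c(1,1,2,2,2,2,3,3,1),
                  gpsLat   = c(-100,-200,-300,-400,-500,-400,-300,-200,-100),
                  gpsLon   = c(100,200,300,400,500,400,300,200,100),
                  stringsAsFactors = FALSE)

# Combine latitude and longitude into the "lat,lon" strings shown in the question
df2$location    <- paste(df2$eventLat, df2$eventLon, sep = ",")
df3$gpsposition <- paste(df3$gpsLat, df3$gpsLon, sep = ",")

# Group the child rows by tripID, keeping only the columns wanted in the output
events_by_trip <- split(df2[c("eventID", "location", "speed")], df2$tripID)
gps_by_trip    <- split(df3[c("accuracy", "gpsposition")], df3$tripID)

# Return the rows for one trip, or an empty list if the trip has none
rows_or_empty <- function(groups, id) {
  rows <- groups[[id]]
  if (is.null(rows)) return(list())
  rownames(rows) <- NULL  # drop leftover row names so they cannot leak into the JSON
  rows
}

# One list per trip: the df1 fields plus the two nested tables
trips <- lapply(seq_len(nrow(df1)), function(i) {
  trip <- as.list(df1[i, ])
  c(trip, list(eventdetails = rows_or_empty(events_by_trip, trip$tripID),
               gpspoint     = rows_or_empty(gps_by_trip, trip$tripID)))
})

# auto_unbox turns length-1 vectors into scalars instead of one-element arrays
json <- toJSON(trips, auto_unbox = TRUE, pretty = TRUE)
cat(json)
```

With ~50k rows per table the `lapply` over trips should still be manageable, since the expensive grouping is done once by `split` before the loop.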