Formatting a JSON object into an R data frame

Asked: 2016-08-27 12:57:52

Tags: arrays json r object

I want to create a data frame from an object in a JSON file that I have read into R. One object contains an array of numbers.

{
"data": [{
    "csv": [

        [2840807, 1458, 2841695, 1453, 2843810, 1448, 2848148, 1451, 2849744, 1433, 2851124, 1429, 2852570, 1427, 2855094, 1438, 2856712, 1423, 2858076, 1517, 2861072, 1462, 2862928, 1436, 2867020, 1431, 2869478, 1427, 2875622, 1447, 2877576, 1477, 2879148, 1479, 2882996, 1376, 2885182, 1377, 2886852, 1353, 2890056, 1439, 2894000, 1337, 2896792, 850, 2903790, 1304, 2906928, 1194, 2908392, 1199, 2918376, 1331, 2921652, 1294, 2926084, 1289, 2929324, 1287, 2930040, 1261, 2934936, 1297, 2936552, 1277, 2942992, 1322, 2946452, 1317, 2949464, 1307, 2952680, 1305, 2956264, 1301, 2959132, 1299, 2961710, 1315, 2962590, 1323, 2964382, 1517, 2968378, 1983, 2971068, 1981, 2971654, 1979, 2971848, 1978, 2971996, 1977, 2972812, 1976, 2973374, 1975, 2974244, 1973],

        [2840807, 109824, 2841695, 126839, 2843810, 79656, 2845320, 109065, 2846148, 125106, 2848148, 154145, 2849744, 172562, 2851124, 188048, 2852570, 200180, 2855094, 75794, 2856712, 34674, 2858076, 45188, 2859206, 76179, 2861072, 69414, 2862928, 111601, 2865064, 133287, 2867020, 76194, 2869478, 120438, 2871360, 150805, 2875622, 176987, 2877576, 188887, 2879148, 71912, 2879976, 98267, 2882996, 150507, 2885182, 64488, 2886852, 80228, 2890056, 115601, 2892148, 67960, 2894000, 48487, 2896792, 48307, 2900768, 43033, 2901416, 31736, 2903790, 66720, 2906928, 55314, 2908392, 94788, 2911038, 31537, 2911532, 51875, 2911976, 67444, 2912556, 71559, 2918376, 102409, 2921652, 43392, 2924090, 82413, 2926084, 117744, 2927292, 68109, 2929324, 63155, 2930040, 78436, 2934936, 35685, 2936552, 38304, 2938900, 45302, 2942992, 36433, 2946452, 22651, 2949464, 48199, 2952680, 69146, 2956264, 77338, 2959132, 97334, 2961710, 32421, 2962590, 41347, 2964382, 76217, 2968378, 50101, 2971068, 107824, 2971466, 111102, 2971654, 110620, 2971848, 113277, 2971996, 44743, 2972812, 79831, 2972880, 79939, 2972958, 80009, 2973048, 80724, 2973126, 81660, 2973308, 86847, 2973374, 88831, 2973440, 90919, 2973508, 94734, 2974244, 25806, 2974314, 26648], null, null, null, null, null, null, null, [2840807, 5, 2846148, 6, 2858076, 5, 2867020, 4, 2871360, 6, 2875622, 8, 2879148, 9, 2890056, 10, 2894000, 11, 2900768, 12, 2903790, 10, 2908392, 11, 2911038, 10, 2921652, 11, 2924090, 12, 2927292, 11, 2934936, 12, 2942992, 11, 2949464, 10, 2961710, 9, 2962590, 8, 2964382, 6, 2971068, 7, 2971848, 8, 2973508, 9], null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null
    ]

}]

}

Creating a data frame puts all of the values under a single variable:

 2840807  
 1458 

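I would like them as two variables, because the first number in each pair is an encoded date for the second number. So I want to split them into a table like this: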

 2840807 1458  

I am fairly new to R and not sure of the best way to solve this.

# Replace NULL entries with NA, then flatten each element into one vector
statisticsObject <- lapply(response$data$csv, function(x) {
  x[sapply(x, is.null)] <- NA
  unlist(x)
})

1 Answer:

Answer 0 (score: 0)

I think this approach should work for your data.

I created a simpler JSON snippet that should be easier to read while still capturing the main features of the input data.

test.json

{
    "data": [{
        "csv": [
            [2840807, 1458, 2841695, 1453],
            [2840807, 109824, 2841695, 126839],
            null, null, null, null, null, null, null,
            [2840807, 5, 2846148, 6],
            null, null, null, null
        ]
    }]
}

R code

# load JSON and convert to a single vector
library(jsonlite)
json <- fromJSON('/tmp/test.json')
dat <- unlist(json$data$csv)

# take the odd positions (date codes) and the even positions (values)
# and combine them into a data frame
indices <- seq(from = 1, to = length(dat), by = 2)
df <- data.frame(
  col1 = dat[indices],
  col2 = dat[indices + 1]
)
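
The same odd/even pairing can also be sketched by reshaping the flat vector into a two-column matrix. This is only an alternative illustration, not part of the original answer: the vector below is a hand-typed stand-in for `unlist(json$data$csv)`, and the names `pairs` and `df_alt` are made up for this example. The length check is an added assumption that every date code has a matching value.

```r
# Hypothetical stand-in for unlist(json$data$csv):
# date codes and values alternate in one flat vector.
dat <- c(2840807, 1458, 2841695, 1453,
         2840807, 109824, 2841695, 126839,
         2840807, 5, 2846148, 6)

# Guard against a dangling date code without a partner value.
stopifnot(length(dat) %% 2 == 0)

# Fill a two-column matrix row by row, so each row holds one (date, value) pair.
pairs <- matrix(dat, ncol = 2, byrow = TRUE)

# df_alt should hold the same values as df built from the index vectors.
df_alt <- data.frame(col1 = pairs[, 1], col2 = pairs[, 2])
```

With `byrow = TRUE` the matrix is filled along rows, which is what keeps each date code next to its own value.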

Result

> df
     col1   col2
1 2840807   1458
2 2841695   1453
3 2840807 109824
4 2841695 126839
5 2840807      5
6 2846148      6
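
Flattening everything with `unlist` loses track of which inner array a pair came from (rows 1-2, 3-4 and 5-6 above belong to three different arrays). If that matters, one possible sketch is to pair up each array separately and tag its rows with the array's position. The list `csv` below is a hand-built stand-in for the parsed `"csv"` array, assumed to be a list of numeric vectors and `NULL`s; the helper `to_pairs` and the column names `series`, `date_code` and `value` are hypothetical.

```r
# Hypothetical stand-in for the parsed "csv" array:
# a list of numeric vectors with NULL in the empty slots.
csv <- list(
  c(2840807, 1458, 2841695, 1453),
  c(2840807, 109824, 2841695, 126839),
  NULL, NULL,
  c(2840807, 5, 2846148, 6),
  NULL
)

# Turn one flat vector into a two-column data frame,
# tagged with its position i inside "csv".
to_pairs <- function(x, i) {
  if (is.null(x)) return(NULL)          # skip empty slots
  odd <- seq(1, length(x), by = 2)      # positions of the date codes
  data.frame(series = i, date_code = x[odd], value = x[odd + 1])
}

# Build one data frame per array, drop the NULL results, and stack the rest.
parts <- Map(to_pairs, csv, seq_along(csv))
parts <- Filter(Negate(is.null), parts)
by_series <- do.call(rbind, parts)
```

In this sketch the `series` column keeps the original position in the list, so the third non-null array is labelled 5 rather than 3. A single series can then be pulled out with, for example, `by_series[by_series$series == 1, ]`.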