Sparklyr: split a column into separate rows

Date: 2019-05-08 11:05:18

Tags: r apache-spark apache-spark-sql sparklyr

I have a problem with a Spark table.

I need to split the "Data" column of the following table into separate rows, one row per comma-separated value:

# Source: spark<?> [?? x 4]
   AssetConnectDeviceKey    CreateDate          FaultStatus Data  
 * <chr>                    <dttm>                    <int> <chr> 
 1 0037005B4834500C20323250 2019-03-19 11:02:52           1 F@BBZL,CSSAA
 2 0037005B4834500C20323250 2019-03-19 11:02:54           1 F@BBZL
 3 0037005B4834500C20323250 2019-03-19 11:02:54           1 F@BBZL
 4 0037005B4834500C20323250 2019-03-19 11:03:24           1 F@BBZL,QBBBC
# ... with more rows

With a regular data frame I can do this and get the following result:

# Source: spark<?> [?? x 4]
   AssetConnectDeviceKey    CreateDate          FaultStatus Data  
 * <chr>                    <dttm>                    <int> <chr> 
 1 0037005B4834500C20323250 2019-03-19 11:02:52           1 F@BBZL
 2 0037005B4834500C20323250 2019-03-19 11:02:52           1 CSSAA
 3 0037005B4834500C20323250 2019-03-19 11:02:54           1 F@BBZL
 4 0037005B4834500C20323250 2019-03-19 11:02:54           1 F@BBZL
 5 0037005B4834500C20323250 2019-03-19 11:03:24           1 F@BBZL
 6 0037005B4834500C20323250 2019-03-19 11:03:24           1 QBBBC
# ... with more rows
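
The code used for the local data frame is not shown in the post. As a minimal sketch, assuming the split was done with `tidyr::separate_rows()`, it might look like the following. The `local_df` object is a hypothetical local copy of the first four rows shown above, and the `UTC` time zone is an assumption.

```r
library(tibble)
library(tidyr)

# Hypothetical local copy of the first four rows of the table (assumed values)
local_df <- tibble(
  AssetConnectDeviceKey = "0037005B4834500C20323250",
  CreateDate = as.POSIXct(
    c("2019-03-19 11:02:52", "2019-03-19 11:02:54",
      "2019-03-19 11:02:54", "2019-03-19 11:03:24"),
    tz = "UTC"
  ),
  FaultStatus = 1L,
  Data = c("F@BBZL,CSSAA", "F@BBZL", "F@BBZL", "F@BBZL,QBBBC")
)

# Split the comma-separated Data column into one row per value;
# the other columns are repeated for each new row
split_df <- separate_rows(local_df, Data, sep = ",")
```

This works on an in-memory data frame only; it is not expected to work unchanged on a Spark table.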

But I can't do this on a Spark tbl. What should I do?
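
One possible approach, offered as a sketch rather than a confirmed answer: sparklyr passes functions that dplyr does not recognise through to Spark SQL, so the Spark SQL functions `split()` and `explode()` can be called inside `mutate()`. The helper name `split_data_rows` and its argument `spark_tbl` are hypothetical; `spark_tbl` is assumed to be a `tbl_spark` with a comma-separated string column `Data`.

```r
library(dplyr)

# Hypothetical helper: split a comma-separated Data column of a Spark table
# into one row per value. Nothing runs on the cluster until the result
# is printed or collected.
split_data_rows <- function(spark_tbl) {
  spark_tbl %>%
    # Spark SQL split(): string -> array<string> (second argument is a regex)
    mutate(Data = split(Data, ",")) %>%
    # Spark SQL explode(): one output row per array element
    mutate(Data = explode(Data))
}
```

Usage would look like `split_data_rows(my_tbl)`, where `my_tbl` stands for the Spark table shown in the question. Keeping `split()` and `explode()` in separate `mutate()` calls makes the intermediate array column easy to inspect if the result is not as expected.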

0 answers:

No answers