Question

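I imported a .json file with library(jsonlite), using stream_in(file(".json")).

However, one of the columns is still displayed in .json format. I am not sure how to proceed to extract the email and computer fields from the json column.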

  My example:

  date <- as.Date(as.character( c("2015-02-13",
                                    "2015-02-14",
                                    "2015-02-14")))
  ID <- c(1,2,3)
  name <- c("John","Michael","Thomas")
  drinks <- c("Beer","Coffee","Tee")
  consumed <- c(2,5,3)
  john<- "{\"employeID\":\"1\",\"other_details\":{\"email\":\"john@gmx.com\"},\"computer\":\"yes\"}"
  michael<- "{\"employeID\":\"2\",\"other_details\":{\"email\":\"michael@yahoo.com\"},\"computer\":\"yes\"}"
  thomas<- "{\"employeID\":\"3\",\"other_details\":{\"email\":\"thomas@gmail.com\"},\"computer\":\"yes\"}"
  json <- c(john,michael,thomas)
  df <- data.frame(date,ID,name,drinks,consumed,json)

The data.frame keeps the raw JSON string in its json column. I would like to get the following format:

         date ID    name drinks consumed             email computer
#1 2015-02-13  1    John   Beer        2      john@gmx.com      yes
#2 2015-02-14  2 Michael Coffee        5 michael@yahoo.com      yes
#3 2015-02-14  3  Thomas    Tee        3  thomas@gmail.com      yes

What I tried first was to use library(jsonlite) again in different variations, but it always results in:

fromJSON(df$json[1])  

Error: Argument 'txt' must be a JSON string, URL or file.

How can I extract these fields correctly?

Answer 1

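df$json is a factor vector (in R versions before 4.0, data.frame() converts strings to factors by default), whereas fromJSON only accepts a JSON string, URL, or file. You can try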

fromJSON(as.character(df$json[1]))

or add stringsAsFactors = FALSE when you create df.
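To make the factor issue concrete, here is a minimal sketch. The variable names json_factor and parsed are only illustrative, and the JSON string is copied from the question's example.

```r
library(jsonlite)

# Illustrative value: the question's first JSON string, stored as a factor
json_factor <- factor("{\"employeID\":\"1\",\"other_details\":{\"email\":\"john@gmx.com\"},\"computer\":\"yes\"}")

# A factor is integer codes plus levels, not a character string,
# which is why fromJSON(json_factor) is rejected
is.character(json_factor)

# Converting to character first gives fromJSON a plain JSON string
parsed <- fromJSON(as.character(json_factor))

# The nested object becomes a nested list
parsed$other_details$email
parsed$computer
```

Once the column is character, the nested email field is reachable through the other_details element of the parsed list.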

To complete the task, you can try:

library(tidyverse)

df %>% 
  filter(json != "{}") %>%   # Drop rows with json == "{}"
  rowwise() %>%
  do(data.frame(ID = .$ID, jsonlite::fromJSON(.$json), stringsAsFactors=FALSE)) %>% 
  merge(df %>% select(-json), by="ID", all.y=TRUE)

Output:

  ID employeID             email computer       date    name drinks consumed
1  1         1      john@gmx.com      yes 2015-02-13    John   Beer        2
2  2         2 michael@yahoo.com      yes 2015-02-14 Michael Coffee        5
3  3         3  thomas@gmail.com      yes 2015-02-14  Thomas    Tee        3

It can also handle the case where the json column contains "{}".

df2 <- df %>% 
  rbind(data.frame(date="2015-02-14", ID=4, name="Kitman", 
                   drinks="Chocolate", consumed=1, json="{}"))

df2 %>% 
  filter(json != "{}") %>% 
  rowwise() %>%
  do(data.frame(ID = .$ID, jsonlite::fromJSON(.$json), stringsAsFactors=FALSE)) %>% 
  merge(df2 %>% select(-json), by="ID", all.y=TRUE)

Output:

  ID employeID             email computer       date    name    drinks consumed
1  1         1      john@gmx.com      yes 2015-02-13    John      Beer        2
2  2         2 michael@yahoo.com      yes 2015-02-14 Michael    Coffee        5
3  3         3  thomas@gmail.com      yes 2015-02-14  Thomas       Tee        3
4  4      <NA>              <NA>     <NA> 2015-02-14  Kitman Chocolate        1
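As an alternative to filtering out the "{}" rows and merging them back, the fields could be pulled out one at a time with a small helper that returns NA when a field is missing. This is only a sketch: extract_field is a hypothetical helper, not part of jsonlite, and it relies on unlist() naming nested elements with a dot (for example other_details.email).

```r
library(jsonlite)

# Hypothetical helper: return one field from a JSON string, or NA if absent.
# `path` uses the dotted names that unlist() builds for nested objects.
extract_field <- function(txt, path) {
  flat <- unlist(fromJSON(txt))
  if (is.null(flat) || !(path %in% names(flat))) {
    return(NA_character_)
  }
  as.character(flat[[path]])
}

# Two illustrative rows: one from the question, one empty object
json <- c(
  "{\"employeID\":\"1\",\"other_details\":{\"email\":\"john@gmx.com\"},\"computer\":\"yes\"}",
  "{}"
)

email <- vapply(json, extract_field, character(1),
                path = "other_details.email", USE.NAMES = FALSE)
computer <- vapply(json, extract_field, character(1),
                   path = "computer", USE.NAMES = FALSE)
```

The resulting vectors have one entry per row, so they can be assigned directly as new columns (for example df$email <- email) without a merge step.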

Obsolete:

cbind(
  df %>% select(-json), 
  df$json %>% 
    map(~as.data.frame(jsonlite::fromJSON(.))) %>% 
    do.call("rbind", .)
)

Output:

        date ID    name drinks consumed employeID             email computer
1 2015-02-13  1    John   Beer        2         1      john@gmx.com      yes
2 2015-02-14  2 Michael Coffee        5         2 michael@yahoo.com      yes
3 2015-02-14  3  Thomas    Tee        3         3  thomas@gmail.com      yes

Answer 2

First, try:

ndjson::stream_in("filename.json")

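The ndjson package is faster than jsonlite and was designed for flattening (it is very task-specific, unlike the very useful, Swiss-army-knife-like jsonlite pkg).

Alternatively, we can stick to tidyverse idioms all the way through.

And you get your character columns.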
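One possible tidyverse-style version of the extraction is sketched below. It is an assumption about what such a pipeline could look like, not necessarily the code this answer had in mind. The two-row df is a cut-down copy of the question's data, and parsed and result are illustrative names.

```r
library(dplyr)
library(purrr)
library(jsonlite)

# Cut-down copy of the question's data, with json kept as character
df <- data.frame(
  ID = c(1, 2),
  json = c(
    "{\"employeID\":\"1\",\"other_details\":{\"email\":\"john@gmx.com\"},\"computer\":\"yes\"}",
    "{\"employeID\":\"2\",\"other_details\":{\"email\":\"michael@yahoo.com\"},\"computer\":\"yes\"}"
  ),
  stringsAsFactors = FALSE
)

# Parse every JSON string once into a list of nested lists
parsed <- map(df$json, fromJSON)

# Pull the wanted fields out as character columns and drop the raw JSON
result <- df %>%
  mutate(
    email = map_chr(parsed, ~ .x$other_details$email),
    computer = map_chr(parsed, ~ .x$computer)
  ) %>%
  select(-json)
```

map_chr() stops with an error if a field is missing, so rows containing "{}" would need to be filtered out or handled separately first.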

How to extract strings from rows with .json format?
