Question

I am trying to convert the following multi-document JSON file into a data.frame.

x = '[
  {"name": "Bob","groupIds": ["kwt6x61", "yiahf43"]},
  {"name": "Sally","groupIds": "yiahf43"}
]'

I am almost there using

 y = x %>% gather_array() %>% 
  spread_values(
    name = jstring("name"),
    groupIds = jstring("groupIds")
  )
print(y)

which returns:

document.id array.index  name                   groupIds
1           1           1   Bob list("kwt6x61", "yiahf43")
2           1           2 Sally                    yiahf43

Can anyone help spread groupIds out into additional rows?

Answer 1

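This is an interesting question. The problem stems from the fact that an array of length 1 is stored as a plain string. Otherwise, enter_object('groupIds') %>% gather_array %>% append_values_string would work nicely. tidyjson does not seem to handle this case well. I wonder whether this would even be considered valid JSON, since groupIds is a string in one record and an array in the other.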
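For comparison, here is a minimal sketch of the uniform case, assuming every groupIds value is an array. The x_uniform input is a hypothetical variant of the question's data in which Sally's single ID has been wrapped in an array; it is not part of the original question.

```r
library(tidyjson)
library(dplyr)

# Hypothetical input: same records, but groupIds is always an array
x_uniform <- '[
  {"name": "Bob", "groupIds": ["kwt6x61", "yiahf43"]},
  {"name": "Sally", "groupIds": ["yiahf43"]}
]'

res <- x_uniform %>%
  gather_array() %>%                       # one row per top-level record
  spread_values(name = jstring("name")) %>%
  enter_object("groupIds") %>%             # descend into the groupIds array
  gather_array("id") %>%                   # one row per array element
  append_values_string("groupId")          # pull the element out as a string

print(res)
```

With uniform input no type check is needed; the difficulty in the question comes only from the record whose groupIds is a bare string.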

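In any case, although this is not an ideal solution, you can use json_types() to tell the two cases apart and then treat each one conditionally. Once parsing is done, I convert to tbl_df (i.e. drop the JSON component) for further processing.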

library(tidyjson)
library(dplyr)
library(tidyr)

x = '[
  {"name": "Bob","groupIds": ["kwt6x61", "yiahf43"]},
  {"name": "Sally","groupIds": "yiahf43"}
 ]'

## Show the different types
z <- x %>% gather_array() %>% spread_values(
  name=jstring('name')
) %>% enter_object('groupIds') %>% json_types()

## Conditionally treat each
final <- bind_rows(
  z[z$type=='array',] %>% gather_array('id') %>% append_values_string('groupId')
  , z[z$type=='string',] %>% append_values_string('groupId') %>% mutate(id=1)
) %>% tbl_df

## Spread them out, maybe?  Depends on what you're looking for
final %>% spread('id','groupId')
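If the long format (one row per name and group ID) is all that is needed, the type check can also be avoided outside tidyjson. The following is only a sketch, assuming the jsonlite package is installed; it is a swapped-in approach and not part of the answer above. It relies on unlist() returning a character vector whether groupIds was parsed as a single string or as a list of strings.

```r
library(jsonlite)

x <- '[
  {"name": "Bob","groupIds": ["kwt6x61", "yiahf43"]},
  {"name": "Sally","groupIds": "yiahf43"}
]'

# Parse without simplification so each record stays a plain R list
records <- fromJSON(x, simplifyVector = FALSE)

# Build one small data.frame per record; name is recycled across its group IDs
long <- do.call(rbind, lapply(records, function(r) {
  data.frame(
    name = r$name,
    groupId = unlist(r$groupIds),  # works for both a string and a list of strings
    stringsAsFactors = FALSE
  )
}))

print(long)
```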
