Using `purrr`

Time: 2017-10-24 18:42:13

Tags: r dataframe purrr

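This is my third question on a similar theme (extracting subsets of a list of lists into a `data.frame`). I keep learning more, but I still hit a wall whenever the problem changes slightly.

The two previous related questions:
- Extracting data from hierarchical lists of different lengths into `data.frame` using `purr`
- Extracting data from a list of lists into its own `data.frame` with `purrr`

Here is a third one in a similar flavour.

Sample data (a representative list of lists):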

q <- list(structure(list(a = -1.54676469632688, b = "s", c = "T", 
d = structure(list(id = 5L, label = "Utah", link = "Asia/Anadyr", 
    score = -0.21104594634643), .Names = c("id", "label", "link", "score")), sentiment = list(structure(list(text = structure(list(content = "the normal flow of supply chain activities is interrupted,", beginOffset = -1), .Names = c("content", "beginOffset")), sentiment = structure(list(magnitude = 0.300000011920929, score = -0.300000011920929), .Names = c("magnitude", "score"))), .Names = c("text", "sentiment")), structure(list(text = structure(list(content = "companies may experience financial loss, cost increases,", beginOffset = -1), .Names = c("content", "beginOffset")), sentiment = structure(list(magnitude = 0, score = 0), .Names = c("magnitude", "score"))), .Names = c("text", "sentiment")), structure(list(text = structure(list(content = "market share declines, customer defection and damage to", beginOffset = -1), .Names = c("content", "beginOffset")), sentiment = structure(list(magnitude = 0.300000011920929, score = -0.300000011920929), .Names = c("magnitude", "score"))), .Names = c("text", "sentiment")))), .Names = c("a", "b", "c", "d", "sentiment")), structure(list(a = 7.74576236632992, b = "z", c = "F", d = structure(list(id = 3L, label = "South Carolina", link = "Pacific/Wallis", score = 2.44729194863711), .Names = c("id", "label", "link", "score")), sentiment = list(structure(list(text = structure(list(content = "impacted companies by seven percent, on average.", beginOffset = -1), .Names = c("content", "beginOffset")), sentiment = structure(list(magnitude = 0.300000011920929, score = -0.300000011920929), .Names = c("magnitude", "score"))), .Names = c("text", "sentiment")), structure(list(text = structure(list(content = "today’s shortened product lifecycles, more demanding", beginOffset = -1), .Names = c("content", "beginOffset")), sentiment = structure(list(magnitude = 0, score = 0), .Names = c("magnitude", "score"))), .Names = c("text", "sentiment")), structure(list(text = structure(list(content = "into global markets, mean this approach is no longer", beginOffset = -1), .Names = c("content", "beginOffset")), sentiment = structure(list(magnitude = 0.300000011920929, score = -0.300000011920929), .Names = c("magnitude", "score"))), .Names = c("text", "sentiment")), structure(list(
    text = structure(list(content = "and down rapidly as market conditions change.", beginOffset = -1), .Names = c("content", "beginOffset")), sentiment = structure(list(magnitude = 0, score = 0), .Names = c("magnitude", "score"))), .Names = c("text", "sentiment")), structure(list(text = structure(list(content = "flexible supply chain allows them to both reduce risk and", beginOffset = -1), .Names = c("content", "beginOffset")), sentiment = structure(list(magnitude = 0.5, score = 0.5), .Names = c("magnitude", "score"))), .Names = c("text", "sentiment")))), .Names = c("a", "b", "c", "d", "sentiment")))

I have a large list of lists produced by a JSON extract. I am trying to pull various sub-lists of interest out into their own tables (`data.frame` or `data.table`).

> q %>% map(names)
[[1]]
[1] "a"         "b"         "c"         "d"         "sentiment"
[[2]]
[1] "a"         "b"         "c"         "d"         "sentiment"

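In this case I want:
- the 5th element (`"sentiment"`) of each record, i.e. `q[[1]][[5]]`, `q[[2]][[5]]`, etc.
- plus some identifying variables (`"a"`, `"b"`) from the first elements of each record (`q[[1]][[1]]`, `q[[1]][[2]]`, etc.)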

The length of the 5th element varies but is always > 1, while the ID variables (i.e. `a` and `b`) always have length 1.
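The length claim above can be checked directly with `map_int`. The sketch below is only an illustration: `q_small` and `make_sentence` are hypothetical stand-ins with the same shape as `q` (fewer fields, shorter strings), not part of the original data.

```r
library(purrr)

# Hypothetical helper: builds one "sentiment" entry shaped like those in `q`.
make_sentence <- function(content, magnitude, score) {
  list(text = list(content = content, beginOffset = -1),
       sentiment = list(magnitude = magnitude, score = score))
}

# Hypothetical miniature of `q`: two records with 2 and 3 sentiment entries.
q_small <- list(
  list(a = -1.5, b = "s",
       sentiment = list(make_sentence("first", 0.3, -0.3),
                        make_sentence("second", 0, 0))),
  list(a = 7.7, b = "z",
       sentiment = list(make_sentence("third", 0.3, -0.3),
                        make_sentence("fourth", 0, 0),
                        make_sentence("fifth", 0.5, 0.5)))
)

# Number of sentiment entries per record (varies between records).
n_sentiment <- map_int(q_small, ~ length(.x$sentiment))

# Length of each ID variable per record (expected to be 1 everywhere).
n_a <- map_int(q_small, ~ length(.x$a))
n_b <- map_int(q_small, ~ length(.x$b))
```

Running the same three `map_int` calls on the real `q` shows how many rows each record should contribute to the final table.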

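I learned from the two previous questions that tasks like this are best tackled by starting at the most deeply nested element and then "working outwards", recycling elements where necessary (e.g. with `data.frame`). The problem I am having is organising the contents of the 5th element into the desired format. This is what I am doing: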

> DF <- q %>% 
        map(`[`, c("a", "b", "sentiment")) %>% 
        map(modify_at, "sentiment", bind_rows) %>% 
        map_df(data.frame, stringsAsFactors = F)

When I `bind_rows` the `"sentiment"` sub-list, I get two rows of two variables for each element, rather than one row of four variables:

head(DF, 2)
   a        b                                             sentiment.text sentiment.sentiment
1 -1.546765 s the normal flow of supply chain activities is interrupted,                 0.3 
2 -1.546765 s                                                         -1                -0.3

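I understand that this is due to the structure of `"sentiment"`, but I am not sure how to go one level deeper into the `"text"` and `"sentiment"` objects, each of which has two elements (`"content", "beginOffset"` and `"magnitude", "score"` respectively).

The desired output, instead of the output shown above, would be: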

head(DF, 2)

1 answer:

Answer 0 (score: 0)

Something like this?

DF <- q %>% 
  map(`[`, c("a", "b", "sentiment")) %>% 
  map(. %>% modify_at("sentiment", . %>% map(as.data.frame, stringsAsFactors = FALSE) %>% bind_rows)) %>% 
  map_df(data.frame, stringsAsFactors = F)

#           a b                                     sentiment.text.content sentiment.text.beginOffset sentiment.sentiment.magnitude sentiment.sentiment.score
# 1 -1.546765 s the normal flow of supply chain activities is interrupted,                         -1                           0.3                      -0.3
# 2 -1.546765 s   companies may experience financial loss, cost increases,                         -1                           0.0                       0.0
# 3 -1.546765 s    market share declines, customer defection and damage to                         -1                           0.3                      -0.3
# 4  7.745762 z           impacted companies by seven percent, on average.                         -1                           0.3                      -0.3
# 5  7.745762 z       today’s shortened product lifecycles, more demanding                         -1                           0.0                       0.0
# 6  7.745762 z       into global markets, mean this approach is no longer                         -1                           0.3                      -0.3
# 7  7.745762 z              and down rapidly as market conditions change.                         -1                           0.0                       0.0
# 8  7.745762 z  flexible supply chain allows them to both reduce risk and                         -1                           0.5                       0.5

str(DF)
# 'data.frame': 8 obs. of  6 variables:
# $ a                            : num  -1.55 -1.55 -1.55 7.75 7.75 ...
# $ b                            : chr  "s" "s" "s" "z" ...
# $ sentiment.text.content       : chr  "the normal flow of supply chain activities is interrupted," "companies may experience financial loss, cost increases," "market share declines, customer defection and damage to" "impacted companies by seven percent, on average." ...
# $ sentiment.text.beginOffset   : num  -1 -1 -1 -1 -1 -1 -1 -1
# $ sentiment.sentiment.magnitude: num  0.3 0 0.3 0.3 0 ...
# $ sentiment.sentiment.score    : num  -0.3 0 -0.3 -0.3 0 ...
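To see why the extra `map(as.data.frame, ...)` step matters, it may help to run the pipeline by hand on a tiny case. The following is a minimal sketch, assuming `purrr` and `dplyr` are installed; `one`, `two`, `entries`, `row`, `tbl` and `out` are hypothetical names and values, not the real data. `as.data.frame` on a single entry flattens its two nested lists into one row of four columns, `bind_rows` stacks those rows, and `data.frame` then recycles the length-1 ID variables.

```r
library(purrr)
library(dplyr)

# Two hypothetical "sentiment" entries with the same shape as those in `q`.
one <- list(text = list(content = "first", beginOffset = -1),
            sentiment = list(magnitude = 0.3, score = -0.3))
two <- list(text = list(content = "second", beginOffset = -1),
            sentiment = list(magnitude = 0, score = 0))
entries <- list(one, two)

# Step 1: a single entry becomes one row; nested names are joined with ".".
row <- as.data.frame(one, stringsAsFactors = FALSE)

# Step 2: convert every entry, then stack the one-row data frames.
tbl <- entries %>%
  map(as.data.frame, stringsAsFactors = FALSE) %>%
  bind_rows()

# Step 3: data.frame() recycles the length-1 ID values to nrow(tbl) and
# prefixes the columns of `tbl` with the argument name "sentiment".
out <- data.frame(a = -1.5, b = "s", sentiment = tbl,
                  stringsAsFactors = FALSE)
```

In this sketch `names(row)` comes out as `text.content`, `text.beginOffset`, `sentiment.magnitude` and `sentiment.score`, and the outer `data.frame` call adds the further `sentiment.` prefix seen in the answer's output.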