Question

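I'm having trouble manipulating the rows of a data frame I'm working with.

My data frame has a column called officialIndices, and I want to separate the rows by that column. The column stores lists of numbers that serve as indices indicating which rows share the same data. For example, an index of 2:3 means that rows 2 through 3 have the same data.

Here is the code I'm using.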

offices_list <- data_google$offices
offices_JSON <- toJSON(offices_list)
offices_from_JSON <-
  separate_rows(fromJSON(offices_JSON), officialIndices, convert = TRUE)

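This is what my offices_list data frame looks like

This is what it looks like after I try to separate the rows

My code works when the index is 2:3, because the two values differ by 1. But for an index such as 7:10, it separates the rows into 7 and 10 instead of 7, 8, 9, 10, which is what I want. How can I get my code to separate rows like this?

Output of dput(head(offices_list))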

structure(list(position = c("President of the United States", 
"Vice-President of the United States", "United States Senate", 
"Governor", "Mayor", "Auditor"), divisionId = c("ocd-division/country:us", 
"ocd-division/country:us", "ocd-division/country:us/state:or", 
"ocd-division/country:us/state:or", "ocd-division/country:us/state:or/place:portland", 
"ocd-division/country:us/state:or/place:portland"), levels = list(
    "country", "country", "country", "administrativeArea1", NULL, 
    NULL), roles = list(c("headOfState", "headOfGovernment"), 
    "deputyHeadOfGovernment", "legislatorUpperBody", "headOfGovernment", 
    NULL, NULL), officialIndices = list(0L, 1L, 2:3, 4L, 5L, 
    6L)), row.names = c(NA, 6L), class = "data.frame")

Answer 1

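This should work. I expect it to handle the other rows as well, since I tested it with ranges in officialIndices that span more than two values.

First, I extract the start and end of each range and use the difference between them to work out how many rows are needed. Then tidyr::uncount() adds that many copies.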

library(dplyr)
library(tidyr)

data_sep <- data %>%
  # Split "2:3" into start = "2" and end = "3"; a single index leaves end as NA
  separate(officialIndices, into = c("start", "end"), sep = ":", fill = "right") %>%
  # Use 1 row, and more if "end" is defined and larger than "start"
  mutate(rows = 1 + if_else(is.na(end), 0, as.numeric(end) - as.numeric(start))) %>%
  uncount(rows)
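
The approach above duplicates each row the right number of times, but every copy still carries the same start and end. If you also want each copy to hold its own index (7, 8, 9, 10), one possible variation is to build the full sequence per row and then unnest it. This is only a sketch: the `offices` data frame and its values are made up for illustration, and it assumes officialIndices arrives as text such as "7:10".

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the real data; names and values are illustrative only
offices <- data.frame(
  position = c("President", "Senate", "Council"),
  officialIndices = c("0", "2:3", "7:10"),
  stringsAsFactors = FALSE
)

expanded <- offices %>%
  # Split "7:10" into integer columns start = 7 and end = 10
  separate(officialIndices, into = c("start", "end"), sep = ":",
           fill = "right", convert = TRUE) %>%
  # A single index has no end, so treat it as a one-element range
  mutate(end = coalesce(end, start),
         officialIndices = Map(seq, start, end)) %>%
  select(-start, -end) %>%
  # One row per index in each range
  unnest(officialIndices)
```

With the toy data this yields one row for "President", two for "Senate", and four for "Council". If officialIndices is already a list column of integer vectors, as the dput output suggests, calling tidyr::unnest(offices_list, officialIndices) directly may be enough, though I have not checked that against the full data.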
