R: convert a tidy hierarchical data frame to a hierarchical list

Asked: 2018-03-17 21:11:01

Tags: r list hierarchical-data

Convert this:

g1    g2    desc    val
A     a     1       v1
A     a     2       v2
A     b     3       v3

to:

desc    val
A
a
1       v1
2       v2
b
3       v3

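I have used a for loop to convert a hierarchical data frame with two grouping levels into a structured list. The result shows each description with its associated value, with the group-level headings interspersed in order.

The aim is to present the hierarchical data as a list so that it can be printed with openxlsx, using formatting that distinguishes the different grouping levels.

Is there a more efficient base R, tidyverse or other approach to achieve this?

For loop code: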

library(tidyverse)

tib <- tibble(g1 = c("A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "C"),
              g2 = c("a", "a", "b", "b", "b", "c", "d", "d", "b", "b", "e", "e"),
              desc = 1:12,
              val = paste0("v", 1:12))

# Number of rows in final table
n_rows <- length(unique(tib$g1)) + length(unique(paste0(tib$g1, tib$g2))) + nrow(tib)

# create empty output tibble; both columns are character because
# group headings, descriptions and tib$val are all stored as text
output <- tibble(desc = rep(NA_character_, n_rows),
                 val  = rep(NA_character_, n_rows))

# loop counters
level_1 <- 0
level_2 <- 0
output_row <- 1

for(i in seq_len(nrow(tib))){

  # level 1 headings
  if(tib$g1[[i]] != level_1) {
    output$desc[[output_row]] <- tib$g1[[i]]
    output_row <- output_row + 1
    }

  # level 2 headings
  if(paste0(tib$g1[[i]], tib$g2[[i]]) != paste0(level_1, level_2)) {
    output$desc[[output_row]] <- tib$g2[[i]]
    output_row <- output_row + 1
  }

  level_1 <- tib$g1[[i]]
  level_2 <- tib$g2[[i]]

  # Description and data
  output$desc[[output_row]] <- tib$desc[[i]]
  output$val[[output_row]] <- tib$val[[i]]
  output_row <- output_row + 1

}
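
The same interleaving can also be written without an explicit loop. The following is only a sketch in base R, assuming the rows are already sorted by g1 and g2 as in the example; the names new_g1, new_g2, keep and the extra level column are illustrative and not part of the original code. It stacks heading candidates and data rows into a 3 x n matrix and keeps the cells that should appear, relying on column-major indexing to preserve the order "g1 heading, g2 heading, data row".

```r
tib <- data.frame(
  g1   = c("A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "C"),
  g2   = c("a", "a", "b", "b", "b", "c", "d", "d", "b", "b", "e", "e"),
  desc = 1:12,
  val  = paste0("v", 1:12),
  stringsAsFactors = FALSE
)
n <- nrow(tib)

# TRUE where a new level 1 (or level 2) group starts
new_g1 <- c(TRUE, tib$g1[-1] != tib$g1[-n])
new_g2 <- new_g1 | c(TRUE, tib$g2[-1] != tib$g2[-n])

# One column per input row: g1 heading, g2 heading, data row
keep <- rbind(new_g1, new_g2, rep(TRUE, n))
desc <- rbind(tib$g1, tib$g2, as.character(tib$desc))
val  <- rbind(rep(NA_character_, n), rep(NA_character_, n), tib$val)

# Logical matrix indexing walks column by column, so headings
# land directly above the first data row of their group
output <- data.frame(
  desc  = desc[keep],
  val   = val[keep],
  level = row(keep)[keep],  # 1 = g1 heading, 2 = g2 heading, 3 = data
  stringsAsFactors = FALSE
)
```

The level column is kept because it makes it easy to pick out heading rows later, for example when applying different cell styles per grouping level.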

2 answers:

Answer 0: (score: 0)

I believe you can simplify and slightly optimize your code:

[code block not preserved in the source]

This gave me the following output:

[output not preserved in the source]

Answer 1: (score: 0)

Using some packages from the tidyverse, we can:

library(tidyverse)

# or explicitly load what you need
library(purrr)
library(dplyr)
library(tidyr)
library(stringr)

# df is presumably the three-row example data frame from the question
transpose(df) %>% 
  unlist() %>% 
  stack() %>% 
  distinct(values, ind) %>% 
  mutate(detect_var = str_detect(values, "^v"),
         ind = lead(case_when(detect_var == TRUE ~ values)),
         values = case_when(detect_var == TRUE ~ NA_character_,
                            TRUE ~ values)) %>% 
  drop_na(values) %>% 
  select(values, ind) %>% 
  replace_na(list(ind = ""))

which returns:

  values ind
1      A
2      a
3      1  v1
5      2  v2
7      b
8      3  v3

Using the tib dataset, my solution seems to be a little slower than Plamen's:

[benchmark not preserved in the source]
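
For the formatting step that the question mentions, a rough sketch of how the list could be written with openxlsx follows. It assumes the list carries a helper column named level (1 = g1 heading, 2 = g2 heading, 3 = data row); that column, the sheet name and the choice of bold and italic styles are all assumptions made for illustration, not something taken from the question or the answers. The openxlsx calls are guarded so the row calculation still runs when the package is not installed.

```r
# Small hierarchical list in the question's target layout
output <- data.frame(
  desc  = c("A", "a", "1", "2", "b", "3"),
  val   = c(NA, NA, "v1", "v2", NA, "v3"),
  level = c(1, 2, 3, 3, 2, 3),  # hypothetical helper column
  stringsAsFactors = FALSE
)

# Worksheet rows are offset by 1 because row 1 holds the column names
rows_l1 <- which(output$level == 1) + 1
rows_l2 <- which(output$level == 2) + 1

if (requireNamespace("openxlsx", quietly = TRUE)) {
  wb <- openxlsx::createWorkbook()
  openxlsx::addWorksheet(wb, "hierarchy")
  openxlsx::writeData(wb, "hierarchy", output[c("desc", "val")])

  # One style per grouping level
  bold   <- openxlsx::createStyle(textDecoration = "bold")
  italic <- openxlsx::createStyle(textDecoration = "italic")
  openxlsx::addStyle(wb, "hierarchy", bold,
                     rows = rows_l1, cols = 1:2, gridExpand = TRUE)
  openxlsx::addStyle(wb, "hierarchy", italic,
                     rows = rows_l2, cols = 1:2, gridExpand = TRUE)

  out_file <- file.path(tempdir(), "hierarchy.xlsx")
  openxlsx::saveWorkbook(wb, out_file, overwrite = TRUE)
}
```

Keeping the level information alongside the list avoids having to re-detect heading rows from the text when styling the sheet.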