Question

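I could do this by hand by creating arrays, transposing them and combining them, or with base::reshape. But I wanted to move toward the ultimate truth by jumping into the Tidyverse, and now I am drowning.

I have data like this:

I would like to have this: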

id A1 A2 A3 B1 B2 B3 B4 B5
1  2  3  5  3  4  5  6  7
2 ...

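The values of A and B shown above are arbitrary. My real data has more than 10 A and B pairs and more than 500 ids. Clearly I am pushing the data out of the "long" format, but this makes a lot of sense for my data. And once the data are arranged that way, shaping them back into long format should not be hard, right?

Is there an idiomatic tidyverse way to do this? It would be great if we could flatten the whole thing (with multiple similar A-ish and B-ish columns) in a single function call.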

Answer 1

Here is a solution using functions from dplyr and tidyr. dt2 is the final output.

# Load package
library(dplyr)
library(tidyr)

# Create example data frame
dt <- read.table(text = "id A B
1  2 3
1  3 4
1  5 5
1  NA 6
1  NA 7",
                 header = TRUE, stringsAsFactors = FALSE)

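# Process the data
dt2 <- dt %>%
  gather(Label, Value, -id) %>%             # stack A and B into Label/Value pairs
  drop_na(Value) %>%                        # discard the rows where Value is NA
  group_by(id, Label) %>%
  mutate(Label_Id = row_number()) %>%       # number the values within each id/Label
  ungroup() %>%                             # drop the grouping so the result is a plain tibble
  unite(Col, Label, Label_Id, sep = "") %>% # build column names such as A1, B2
  spread(Col, Value)                        # one column per name, one row per id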
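
For readers on a recent tidyr (1.0.0 or later, where gather() and spread() are superseded), the same pipeline can be sketched with pivot_longer() and pivot_wider(). This is only an illustrative variant, not part of the original answer. The name dt2_pivot is made up for this example, and the example data is rebuilt with tibble() so that the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Same example data as above, rebuilt so this block is self-contained
dt <- tibble(id = rep(1, 5),
             A  = c(2, 3, 5, NA, NA),
             B  = c(3, 4, 5, 6, 7))

dt2_pivot <- dt %>%
  # Stack A and B into Label/Value pairs and drop the NA rows in one step
  pivot_longer(-id, names_to = "Label", values_to = "Value",
               values_drop_na = TRUE) %>%
  # Number the values within each id/Label combination
  group_by(id, Label) %>%
  mutate(Label_Id = row_number()) %>%
  ungroup() %>%
  # pivot_wider() creates columns in order of first appearance,
  # so sort the rows to get A1, A2, ..., B1, B2, ...
  arrange(Label, Label_Id, id) %>%
  # Paste Label and Label_Id together to form the new column names
  pivot_wider(names_from = c(Label, Label_Id), values_from = Value,
              names_sep = "")
```

One difference worth knowing: spread() orders the new columns by sorting their names as text, so A10 would come before A2, whereas this sketch keeps the numeric order because of the arrange() step.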

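Update: creating a function to generalize the process

Based on the comments, the OP asked for a more "general" approach. I may not fully understand the request, but here I show how to turn the code above into a function, along with three test cases. The function flatten takes one argument, the input tbl or data frame. The columns of the input should be id, A, B, C, D ...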

# Load package
library(dplyr)
library(tidyr)

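# Flatten a tbl or data frame with columns id, A, B, C, D ...
# into one row per id with columns A1, A2, ..., B1, B2, ...
flatten <- function(dt){
  dt %>%
    gather(Label, Value, -id) %>%             # stack all value columns
    drop_na(Value) %>%                        # discard the rows where Value is NA
    group_by(id, Label) %>%
    mutate(Label_Id = row_number()) %>%       # number the values within each id/Label
    ungroup() %>%                             # drop the grouping so the result is a plain tibble
    unite(Col, Label, Label_Id, sep = "") %>% # build column names such as A1, B2
    spread(Col, Value)                        # one column per name, one row per id
}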


### Test Case 1: one id, two value columns
test1 <- tibble(id = rep(1, 5),
                A = c(2, 3, 5, NA, NA),
                B = 3:7)
test1_result <- flatten(test1)

### Test Case 2: two ids with different numbers of rows
test2 <- tibble(id = c(rep(1, 5), rep(2, 8)),
                A = c(2, 3, 5, NA, NA, 3, 4, 6, 8, 9, NA, 10, 12),
                B = 3:15)
test2_result <- flatten(test2)

### Test Case 3: two ids, four value columns
test3 <- tibble(id = c(rep(1, 5), rep(2, 8)),
                A = c(2, 3, 5, NA, NA, 3, 4, 6, 8, 9, NA, 10, 12),
                B = 3:15,
                C = c(rep(c(1, 2, 3, 4, 5), each = 2), NA, NA, NA),
                D = seq(2, 26, 2))

test3_result <- flatten(test3)
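
The question also asks whether the flattened data can be shaped back into long format later. A minimal sketch of that return trip, assuming tidyr 1.0.0 or later and column names made of letters followed by digits (A1, B2, ...), could look like this. The names wide and long are made up for this example, and wide is typed in by hand to mirror the desired output in the question.

```r
library(dplyr)
library(tidyr)

# A flattened data frame shaped like the desired output in the question
wide <- tibble(id = 1,
               A1 = 2, A2 = 3, A3 = 5,
               B1 = 3, B2 = 4, B3 = 5, B4 = 6, B5 = 7)

long <- wide %>%
  # Split each column name into its letter part and its number part
  pivot_longer(-id,
               names_to = c("Label", "Label_Id"),
               names_pattern = "([A-Za-z]+)([0-9]+)",
               values_to = "Value",
               values_drop_na = TRUE) %>%
  # The number part comes back as text, so convert it to integer
  mutate(Label_Id = as.integer(Label_Id))
```

The result has one row per id, Label and Label_Id. With several ids, values_drop_na = TRUE removes the NA cells that padding to a common width introduces.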
