Question

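So here is my challenge. I am trying to get rid of rows of data that are better organized as columns. The original data set looks like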

1|1|a
2|3|b
2|5|c
1|4|d
1|2|e
10|10|f

The end result would be

1 |1,2,4 |a| e d
2 |3,5   |b| c
10|10    |f| NA

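The reshaping of the table is based on the minimum value of Col 2 within each Col 1 group: the new column 3 is taken from the row with the minimum value in the group, and the new column 4 is collapsed from the non-minimum rows. Some of the approaches I tried include: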

newTable[min(newTable[,(1%o%2)]),] ## returns the minimum of both COL 1 and 2 only

ddply(newTable,"V1", summarize, newCol = paste(V7,collapse = " ")) ## collapses all values by Col 1 and creates a new column nicely.

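Variations that combine these lines of code into a single line have not worked, in part because of my limited knowledge. Those modifications are not included here.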

Answer 1

Try:

 library(dplyr)
 library(tidyr)

 dat %>% 
     group_by(V1) %>% 
     summarise_each(funs(paste(sort(.), collapse=","))) %>%
     extract(V3, c("V3", "V4"), "(.),?(.*)")

which gives the output

  #  V1    V2 V3  V4
  #1  1 1,2,4  a d,e
  #2  2   3,5  b   c
  #3 10    10  f

Or using aggregate and str_split_fixed

 res1 <- aggregate(.~ V1, data=dat, FUN=function(x) paste(sort(x), collapse=","))
 library(stringr)
 res1[, paste0("V", 3:4)] <- as.data.frame(str_split_fixed(res1$V3, ",", 2), 
                                              stringsAsFactors=FALSE)

If you need NA for the missing values

  res1[res1==''] <- NA
  res1
  #  V1    V2 V3   V4
  #1  1 1,2,4  a  d,e
  #2  2   3,5  b    c
  #3 10    10  f <NA>
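
Note that both approaches above sort each column independently, so V3 and V4 come out in alphabetical order (d,e) rather than in the order of V2 (e d) shown in the question. A minimal base R sketch that orders the rows by V2 first, assuming the same dat as in the data section below (repeated here so the snippet runs on its own; the names ord, rest and res are only illustrative):

```r
dat <- data.frame(
  V1 = c(1L, 2L, 2L, 1L, 1L, 10L),
  V2 = c(1L, 3L, 5L, 4L, 2L, 10L),
  V3 = c("a", "b", "c", "d", "e", "f"),
  stringsAsFactors = FALSE
)

# Sort by group, then by V2, so the first row of each group holds the minimum V2
ord <- dat[order(dat$V1, dat$V2), ]

# Build one output row per V1 group
res <- do.call(rbind, lapply(split(ord, ord$V1), function(g) {
  rest <- g$V3[-1L]  # V3 values of the non-minimum rows
  data.frame(
    V1 = g$V1[1L],
    V2 = paste(g$V2, collapse = ","),
    V3 = g$V3[1L],   # V3 value at the minimum V2
    V4 = if (length(rest) > 0L) paste(rest, collapse = " ") else NA_character_,
    stringsAsFactors = FALSE
  )
}))
rownames(res) <- NULL
res
```

Here V4 is NA for groups with a single row, which matches the layout requested in the question.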

data

dat <- structure(list(V1 = c(1L, 2L, 2L, 1L, 1L, 10L), V2 = c(1L, 3L, 
5L, 4L, 2L, 10L), V3 = c("a", "b", "c", "d", "e", "f")), .Names = c("V1", 
"V2", "V3"), class = "data.frame", row.names = c(NA, -6L))

Answer 2

Here is an approach using data.table, with the data posted by @akrun:

It can be useful to store the columns as a list instead of pasting them together.

require(data.table) ## 1.9.2+
setDT(dat)[order(V1, V2), list(V2=list(V2), V3=V3[1L], V4=list(V3[-1L])), by=V1]
#    V1    V2 V3  V4
# 1:  1 1,2,4  a e,d
# 2:  2   3,5  b   c
# 3: 10    10  f

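setDT(dat) converts the data.frame to a data.table by reference (without copying it). Then we sort it by columns V1 and V2, group the sorted data by column V1, and for each group create the columns V2, V3 and V4 as shown.

V2 and V4 are of type list here. If you would rather have character columns with all entries pasted together, just replace list(.) with paste(., collapse=",").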
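
For reference, a sketch of that pasted variant, in case character columns are preferred over list columns. It assumes the same dat as in the first answer and uses as.data.table(), which makes a copy, instead of setDT(), purely so that the example leaves dat untouched:

```r
library(data.table)

dat <- data.frame(
  V1 = c(1L, 2L, 2L, 1L, 1L, 10L),
  V2 = c(1L, 3L, 5L, 4L, 2L, 10L),
  V3 = c("a", "b", "c", "d", "e", "f"),
  stringsAsFactors = FALSE
)

# Same grouping as above, but each list(.) is swapped for paste(., collapse = ",")
res <- as.data.table(dat)[order(V1, V2),
  list(V2 = paste(V2, collapse = ","),
       V3 = V3[1L],                          # value at the minimum V2
       V4 = paste(V3[-1L], collapse = ",")), # remaining values, in V2 order
  by = V1]
res
```

With this variant, V4 is an empty string for groups with a single row; it can be set to NA afterwards if needed.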

HTH

R: collapse rows, then convert rows to new columns
