Question

I couldn't find an exact answer to this problem, so I hope I'm not duplicating a question.

I have a data frame like the following:

groupid  col1  col2  col3  col4
   1      0     n     NA     2    
   1      NA    NA    2      2

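What I'm trying to convey is that there are duplicate IDs whose full information is spread across two rows, and I want to combine these rows so that all the information ends up in a single row. How do I do this?

I tried using group_by and paste, but that made the data messier (for example, I got 22 instead of 2 in col4), and sum() does not work because some columns are strings, and those that are not are categorical variables, so summing them would change the information.

Is there something I can do to collapse the rows, filling in the NAs while leaving consistent data unchanged?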

Edit:

Sorry, the desired output is as follows:

groupid  col1  col2  col3  col4
   1      0     n     2     2

Answer 1

Is this what you want? It uses zoo + dplyr. Please also check the link here.

library(dplyr)
library(zoo)
df %>%
    group_by(groupid) %>%
    mutate_all(funs(na.locf(., na.rm = FALSE, fromLast = FALSE))) %>%
    filter(row_number() == n())


# A tibble: 1 x 5
# Groups:   groupid [1]
  groupid  col1  col2  col3  col4
    <int> <int> <chr> <int> <int>
1       1     0     n     2     2

EDIT1

Without the filter, the whole data frame is returned.

    df %>%
        group_by(groupid) %>%
        mutate_all(funs(na.locf(., na.rm = FALSE, fromLast = FALSE)))

# A tibble: 2 x 5
# Groups:   groupid [1]
  groupid  col1  col2  col3  col4
    <int> <int> <chr> <int> <int>
1       1     0     n    NA     2
2       1     0     n     2     2

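The filter here simply slices off the last row of each group. na.locf carries the previous non-NA value forward, which means the last row in each group is the one you want.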
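To make the carry-forward behaviour concrete, here is a minimal sketch on a single vector. The vector `x` and the name `filled` are made up for illustration; only `na.locf` and its `na.rm` argument come from the answer above.

```r
library(zoo)

# Illustrative vector (not from the question)
x <- c(NA, 1, NA, 2, NA)

# Each NA is replaced by the most recent non-NA value before it.
# na.rm = FALSE keeps the leading NA instead of dropping it.
filled <- na.locf(x, na.rm = FALSE)
filled
# [1] NA  1  1  2  2
```

Because the last element always ends up holding the last non-NA value seen, keeping only the final row of each group collapses the group into one row.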

Also, based on a suggestion from @thelatemail, you can do the following, which gives the same answer.

df %>% group_by(groupid) %>% summarise_all(funs(.[!is.na(.)][1]))
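In recent dplyr versions `funs()` is deprecated, so the same idea might be written with `across()` instead. This is only a sketch, assuming dplyr 1.0 or later; the helper name `first_non_na` is made up here and the data frame is rebuilt from the question.

```r
library(dplyr)

# Input rebuilt from the question
df <- data.frame(
  groupid = c(1, 1),
  col1 = c(0, NA),
  col2 = c("n", NA),
  col3 = c(NA, 2),
  col4 = c(2, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first non-NA element, or NA if every value is missing
first_non_na <- function(x) x[!is.na(x)][1]

# across(everything()) skips the grouping column inside a grouped summarise
res <- df %>%
  group_by(groupid) %>%
  summarise(across(everything(), first_non_na))
res
```

Note that this silently keeps the first value when a group holds two different non-NA values in the same column.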

EDIT2

Suppose you have conflicting values and want to show all of them.

df <- read.table(text="groupid  col1  col2  col3  col4
   1      0     n     NA     2    
                 1      1    NA    2      2",
                 header=TRUE,stringsAsFactors=FALSE)
 df
  groupid col1 col2 col3 col4
1       1    0    n   NA    2
2       1    1 <NA>    2    2
df %>%
    group_by(groupid) %>%
    summarise_all(funs(toString(unique(na.omit(.)))))#unique for duplicated like col4
  groupid  col1  col2  col3  col4
    <int> <chr> <chr> <chr> <chr>
1       1  0, 1     n     2   2
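Before collapsing, it may help to see which columns actually contain conflicts. The following is a sketch, assuming dplyr 1.0 or later, that counts the distinct non-NA values per group and column; the name `counts` is made up for illustration.

```r
library(dplyr)

# Same conflicting input as above: col1 holds both 0 and 1 for groupid 1
df <- data.frame(
  groupid = c(1, 1),
  col1 = c(0, 1),
  col2 = c("n", NA),
  col3 = c(NA, 2),
  col4 = c(2, 2),
  stringsAsFactors = FALSE
)

# Any count above 1 means the group has more than one non-NA value there
counts <- df %>%
  group_by(groupid) %>%
  summarise(across(everything(), ~ n_distinct(.x, na.rm = TRUE)))
counts
```

Here only col1 has a count of 2, which matches the "0, 1" entry produced by toString above.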

Answer 2

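In this case, could you sketch the desired output? Converting the data.frame to another type with as.vector() or as.matrix() and then grouping/splitting might help.

Update: find the unique element of each column and omit the NAs.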

df<-data.frame(groupid=c(1,1), col1=c(0,NA), col2=c('n', NA), col3=c(NA,2),  col4=c(2,2)) # your input
out<-data.frame(df[1,]) # where the output is stored, duplicate retaining 1 row
for(i in 1:ncol(df)) out[,i]<-na.omit(unique(df[,i]))
print(out)
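The loop above handles a single group. A possible base R extension to several groups is sketched below; the second group (groupid 2) and the helper name `first_non_na` are invented for illustration, and the sketch takes the first non-NA value rather than requiring a unique one.

```r
# Input from the question plus one made-up row for a second group
df <- data.frame(
  groupid = c(1, 1, 2),
  col1 = c(0, NA, 5),
  col2 = c("n", NA, "m"),
  col3 = c(NA, 2, NA),
  col4 = c(2, 2, 7),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first non-NA value, or NA if every value is missing
first_non_na <- function(x) x[!is.na(x)][1]

# Split into one data frame per groupid, collapse each to a single row,
# then stack the rows back together
pieces <- lapply(split(df, df$groupid), function(g) {
  as.data.frame(lapply(g, first_non_na), stringsAsFactors = FALSE)
})
out <- do.call(rbind, pieces)
rownames(out) <- NULL
print(out)
```

A column that is entirely NA within a group (col3 for groupid 2 here) stays NA in the result.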

Answer 3

Another option, using only dplyr, is to take the first non-NA value when one is available. You can do

dd <- read.table(text="groupid  col1  col2  col3  col4
1      0     n     NA     2    
1      NA    NA    2      2", header=T)

dd %>% 
  group_by(groupid) %>% 
  summarise_all(~first(na.omit(.)))

Combine rows by group, where each row has different NAs
