Question

So I have 3 columns in total:

col a      col b       col c
 500         NA         hello
 500         8          NA

Is there a way to combine these rows so that the output looks like this?

col_a      col_b       col_c
 500         8         hello

What I have tried:

dt%>%
group_by(col_a) %>%
summarise_each(funs(first(na.omit(.))))

But it doesn't work! The result stays exactly the same :(

Any help is much appreciated, thanks!

Edit: at a user's request

Here is the internal structure of my data frame :)

'data.frame':   11599 obs. of  3 variables:
 $ col_a   : chr  "1" "1000" "10000" "10001" ...
 $ col_b   : chr  NA NA NA NA ...
 $ col_c   : chr  "tcpmux" "cadlock2" "ndmp" "scp-config" ...

And in case you're wondering, col_b does have values, not just NA :P, and col_c does have NA values, even though only strings are shown here.

Edit no. 2: at a user's request, here is the structure of 20 rows of the data.

structure(list(col_a = c("1", "1000", "10000", "10001", 
"10002", "10003", "10003", "10004", "10005", "10006", "10007", 
"10008", "10009", "10009 ", "10010", "10022 ", "10023", "10047 ", 
"10050", "10051"), 

col_b = c(NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, "3", NA, "3", NA, "3", NA, NA),

col_c = c("tcpmux", "cadlock2", "ndmp", "scp-config", "documentum", "documentum_s", 
"documentum-s", "emcrmirccd", "emcrmird", "netapp-sync", "mvs-capacity", "octopus", 
"swdtp-sv", NA, "rxapi", NA, "cefd-vmp", NA, "zabbix-agent", "zabbix-trapper")),
.Names = c("col_a", "col_b", "col_c"), row.names = c(NA, 20L), class = "data.frame")

Answer 1

If you want to stay within dplyr, you can use:

library(dplyr)
res <- dt %>% group_by(col_a=as.numeric(col_a)) %>%
              summarise_all(function(x) {first(na.omit(x), default=NA_character_)})

Using the data you posted, we get:

print(res)
### A tibble: 19 x 3
##    col_a col_b          col_c
##    <dbl> <chr>          <chr>
##1       1               tcpmux
##2    1000             cadlock2
##3   10000                 ndmp
##4   10001           scp-config
##5   10002           documentum
##6   10003         documentum_s
##7   10004           emcrmirccd
##8   10005             emcrmird
##9   10006          netapp-sync
##10  10007         mvs-capacity
##11  10008              octopus
##12  10009     3       swdtp-sv
##13  10010  <NA>          rxapi
##14  10022     3           <NA>
##15  10023  <NA>       cefd-vmp
##16  10047     3           <NA>
##17  10050  <NA>   zabbix-agent
##18  10051  <NA> zabbix-trapper

We pass summarise_all a function that composes first and na.omit. Since all the columns are character, we specify default=NA_character_ for first.
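As a rough illustration of what that per-column function computes, the same "first non-missing value" idea can be sketched in base R. `first_non_na` is a hypothetical helper name, not a dplyr function; it relies on the fact that indexing an empty vector with `[1]` yields an `NA` of the vector's own type, so no `default` argument is needed.

```r
# Hypothetical helper (not part of dplyr): first non-NA element of a vector,
# or an NA of the same type when every element is missing.
first_non_na <- function(x) {
  x[!is.na(x)][1]
}

# A group with one real value among NAs, as for col_b in the sample data.
b <- first_non_na(c(NA, "3", NA))

# A group where every value is missing.
all_missing <- first_non_na(c(NA_character_, NA_character_))

print(b)
print(is.na(all_missing))
```

Under these assumptions the helper could stand in for the anonymous function inside summarise_all, though that substitution is a suggestion and not what the answer itself uses.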

Now, I don't know why, when all elements of a group are NA, the result shows "" (empty string) for the initial groups and <NA> only after some group has had non-NA data. To fix this, you can additionally apply a mutate_all:

library(dplyr)
res <- dt %>% group_by(col_a=as.numeric(col_a)) %>%
              summarise_all(function(x) {first(na.omit(x), default=NA_character_)}) %>%
              mutate_all(function(x) {ifelse(x=="",NA_character_,x)})
### A tibble: 19 x 3
##    col_a col_b          col_c
##    <dbl> <chr>          <chr>
##1       1  <NA>         tcpmux
##2    1000  <NA>       cadlock2
##3   10000  <NA>           ndmp
##4   10001  <NA>     scp-config
##5   10002  <NA>     documentum
##6   10003  <NA>   documentum_s
##7   10004  <NA>     emcrmirccd
##8   10005  <NA>       emcrmird
##9   10006  <NA>    netapp-sync
##10  10007  <NA>   mvs-capacity
##11  10008  <NA>        octopus
##12  10009     3       swdtp-sv
##13  10010  <NA>          rxapi
##14  10022     3           <NA>
##15  10023  <NA>       cefd-vmp
##16  10047     3           <NA>
##17  10050  <NA>   zabbix-agent
##18  10051  <NA> zabbix-trapper
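One detail of this answer's code is the grouping key `as.numeric(col_a)`. In the posted sample some keys carry a trailing space (for example "10009 " next to "10009"), so grouping on the raw strings would keep those rows apart, which may be part of why the attempt in the question appeared to change nothing. That reading is an interpretation, not something the answer states. A small sketch using three keys taken from the sample:

```r
# Three col_a values from the posted sample; two of them end in a space.
keys <- c("10009", "10009 ", "10022 ")

# As plain strings, "10009" and "10009 " are different group keys.
n_raw <- length(unique(keys))

# Converting to numeric (as the answer does) or trimming whitespace
# makes the two spellings of 10009 collapse into one key.
n_numeric <- length(unique(as.numeric(keys)))
n_trimmed <- length(unique(trimws(keys)))

print(c(raw = n_raw, numeric = n_numeric, trimmed = n_trimmed))
```

If the keys were not all numeric, `trimws` would be the safer of the two normalizations, since `as.numeric` turns non-numeric strings into NA.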


Answer 2

With dplyr, I would just use the max function.

library(dplyr)

df <- data.frame(cola=c(500,500), colb=c(NA,8), colc=c("hello",NA),stringsAsFactors=F)

df %>% group_by(cola) %>% summarise_all(max, na.rm=T)

which gives

# A tibble: 1 × 3
   cola  colb colc
  <dbl> <dbl> <chr>
1   500     8 hello
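One caveat worth checking before relying on `max`: when a group has no non-missing value at all, `max(x, na.rm = TRUE)` on a numeric column returns `-Inf` with a warning instead of `NA`. Below is a minimal sketch of a guard; `max_or_na` is a hypothetical helper name and not something from the answer.

```r
# Hypothetical guard: return NA (of x's own type) when every value is missing,
# otherwise the maximum of the non-missing values.
max_or_na <- function(x) {
  if (all(is.na(x))) x[1] else max(x, na.rm = TRUE)
}

# Plain max on an all-NA numeric group gives -Inf (warning suppressed here).
empty_group <- suppressWarnings(max(c(NA_real_, NA_real_), na.rm = TRUE))

guarded <- max_or_na(c(NA_real_, NA_real_))  # stays NA
num     <- max_or_na(c(NA, 8))               # numeric column from the example
chr     <- max_or_na(c("hello", NA))         # character column from the example

print(list(empty_group, guarded, num, chr))
```

Assuming the same data frame as above, the helper could be passed to summarise_all in place of `max`, for example `summarise_all(max_or_na)`.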

Answer 3

Try this (without using the plyr package):

df <- data.frame(cola=c(500,500), colb=c(NA,8), colc=c("hello",NA),stringsAsFactors=F)
aggregate(df[,c(2,3)], by=list(cola=df$cola), function(xx) xx[!is.na(xx)])
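Note that `xx[!is.na(xx)]` returns every non-missing value, so this works cleanly when each group has exactly one non-missing value per column, as in the example. A hedged variant that keeps only the first non-missing value (and yields NA for an all-missing group) might look like the sketch below; the extra rows with `cola = 600` are made up for illustration.

```r
# Example data from the answer, plus two invented rows (cola = 600) where
# colb is entirely missing and colc has two non-missing values.
df <- data.frame(cola = c(500, 500, 600, 600),
                 colb = c(NA, 8, NA, NA),
                 colc = c("hello", NA, "a", "b"),
                 stringsAsFactors = FALSE)

# Taking [1] of the non-missing values guarantees one value per group:
# the first non-missing one, or NA when there is none.
res <- aggregate(df[, c("colb", "colc")], by = list(cola = df$cola),
                 FUN = function(xx) xx[!is.na(xx)][1])
print(res)
```

Taking the first value is a design choice; if a group can legitimately hold several different non-missing values, silently keeping one of them may hide a data problem.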

Merge values from 2 rows based on another common column in R
