Question

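I scraped some data from a website, but the site is really janky and, for some reason, the scraped data contains a few errors. So I scraped the same data 3 times and generated 3 tables that look like this: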

library(data.table)
df1 <- data.table(name = c('adam', 'bob', 'carl', 'dan'),
                  id = c(1, 2, 3, 4),
                  thing=c(2, 1, 3, 4),
                  otherthing = c(2,1, 3, 4)
                  )

df2 <- data.table(name = c('adam', 'bob', 'carl', 'dan'),
                  id = c(1, 2, 3, 4),
                  thing=c(1, 1, 1, 4),
                  otherthing = c(2,2, 3, 4)
)

df3 <- data.table(name = c('adam', 'bob', 'carl', 'dan'),
                  id = c(1, 2, 3, 4),
                  thing=c(1, 1, 3, 4),
                  otherthing = c(2,1, 3, 3)
)

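Except that I have many more columns. I want to combine the 3 tables, and when the values of "thing", "otherthing", etc. conflict, I want to pick the value that appears in at least 2 of the 3 tables, and return NA if no value reaches 2/3. I'm confident the "name" and "id" fields are good, and those are what I want to merge on.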

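I was thinking of renaming the columns to "thing1", "thing2" and "thing3" in the 3 tables, merging them together, and then writing some loops over those names. Is there a more elegant solution? It needs to work for 300+ value columns, though I'm not worried about speed.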

For this example, I think the result should be:

final_result <- data.table(name = c('adam', 'bob', 'carl', 'dan'),
                  id = c(1, 2, 3, 4),
                  thing=c(1, 1, 3, 4),
                  otherthing = c(2,1, 3, 4)
)

Answer 1

To generalize the approach from @IceCreamToucan, we can use:

library(dplyr)

n_mode <- function(...) {
  x <- table(c(...))
  if(any(x > 1)) as.numeric(names(x)[which.max(x)])
  else NA
}

bind_rows(df1, df2, df3) %>%
  group_by(name, id) %>%
  # funs() is deprecated in current dplyr; across() is the supported form
  summarise(across(everything(), n_mode), .groups = "drop")

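N.B. Be careful with your namespace and with how you name this function: prefer something like n_mode() to avoid a clash with base::mode. Finally, if you extend this to more data.frames, you will probably want to put them in a list. If that is not possible or feasible, you can replace bind_rows with purrr::map_df(ls(pattern = "^df[[:digit:]]+"), get)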
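As a rough sketch of the "put them in a list" suggestion, the tables can be collected by name with base R's mget() and then stacked in one call. The two-row toy tables, the `source` column name, and the trailing `$` in the pattern are assumptions made for illustration, not part of the answer above.

```r
library(dplyr)

# Toy stand-ins for the scraped tables (hypothetical, smaller than the question's)
df1 <- data.frame(name = c("adam", "bob"), id = c(1, 2), thing = c(2, 1))
df2 <- data.frame(name = c("adam", "bob"), id = c(1, 2), thing = c(1, 1))
df3 <- data.frame(name = c("adam", "bob"), id = c(1, 2), thing = c(1, 1))

# mget() returns a named list of the objects whose names match the pattern
tables <- mget(ls(pattern = "^df[[:digit:]]+$"))

# bind_rows() accepts a list; .id records which table each row came from
stacked <- bind_rows(tables, .id = "source")
```

The extra `source` column is only there for inspection; it would need to be dropped before grouping by name and id, otherwise it is summarised like any other value column.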

Answer 2

A data.table version of Jason's solution (you should leave his as the accepted answer)

library(data.table)
n_mode <- function(x) {
  x <- table(x)
  if(any(x > 1)) as.numeric(names(x)[which.max(x)])
  else NA
}

my_list <- list(df1, df2, df3)

rbindlist(my_list)[, lapply(.SD, n_mode), .(name, id)]

#    name id thing otherthing
# 1: adam  1     1          2
# 2:  bob  2     1          1
# 3: carl  3     3          3
# 4:  dan  4     4          4
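With exactly 3 tables, "some value appears more than once" is the same as "some value appears in at least 2/3 of the tables". With more tables the two conditions differ, so a share-based check may be closer to the rule stated in the question. The following is only a sketch under that assumption; `majority_value` and `min_share` are hypothetical names, not something from the answers.

```r
# Hypothetical variant: keep the most common value only if it reaches
# a minimum share of all observations, otherwise return NA.
majority_value <- function(x, min_share = 2/3) {
  counts <- table(x)                     # note: table() drops NA values
  if (length(counts) == 0) return(NA_real_)
  top <- which.max(counts)
  share <- counts[[top]] / length(x)     # length(x) still counts NA entries
  # small tolerance guards against floating-point rounding in the share
  if (share >= min_share - 1e-9) as.numeric(names(counts)[top]) else NA_real_
}

majority_value(c(1, 1, 3))           # 2 of 3 agree
majority_value(c(1, 1, 2, 2, 3, 3))  # no value reaches 2/3
```

It can be dropped into the same call in place of n_mode, e.g. `lapply(.SD, majority_value)`.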

Here is the output of rbindlist. Hopefully this makes it clear why simply taking n_mode of every column, grouped by name and id, gives the desired output.

rbindlist(my_list)[order(name, id)]

#     name id thing otherthing
#  1: adam  1     2          2
#  2: adam  1     1          2
#  3: adam  1     1          2
#  4:  bob  2     1          1
#  5:  bob  2     1          2
#  6:  bob  2     1          1
#  7: carl  3     3          3
#  8: carl  3     1          3
#  9: carl  3     3          3
# 10:  dan  4     4          4
# 11:  dan  4     4          4
# 12:  dan  4     4          3
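To make the grouping step concrete, here is n_mode applied by hand to the three values that a single group contributes for one column. The first two vectors are taken from the rows above; the third is a made-up three-way disagreement to show the NA case.

```r
# Same logic as n_mode above, repeated so the snippet runs on its own
n_mode <- function(x) {
  counts <- table(x)
  if (any(counts > 1)) as.numeric(names(counts)[which.max(counts)]) else NA
}

n_mode(c(2, 1, 1))  # adam's "thing": 1 appears twice, so the result is 1
n_mode(c(4, 4, 3))  # dan's "otherthing": 4 appears twice, so the result is 4
n_mode(c(1, 2, 3))  # hypothetical: no value repeats, so the result is NA
```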

Combining 3 versions of the same table in R
