Aggregating factors from data frames of different lengths

Time: 2016-07-24 18:01:51

Tags: r dataframe aggregate find-occurrences

I have several data frames, for example:

Var1 "Bananas" "Apples" "Oranges"
Freq "2"       "2"      "1"

Var2 "Bananas" "Carrots" "Strawberries" "Apples"
Freq "3"       "2"       "3"            "4"

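As output, I would like a data frame, table, or something similar that gives the occurrences for each input data frame, including 0 occurrences, in one clean overview. Something like this: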

Var     "Bananas" "Apples" "Oranges" "Carrots" "Strawberries"
Sample1 "2"       "2"      "1"       "0"       "0"
Sample2 "3"       "4"      "0"       "2"       "3"

I could not find any solution, especially because a data.frame does not allow columns of different lengths.

2 answers:

Answer 0 (score: 1)

Note that NA and 0 mean completely different things. See the help file for ?dplyr::join.

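library(dplyr)

# Two frequency tables that share the key column Var1
df1 <- data.frame(Var1 = c("Bananas", "Apples", "Oranges"),
                  Freq = c(2, 2, 1))
df2 <- data.frame(Var1 = c("Bananas", "Carrots", "Strawberries", "Apples"),
                  Freq = c(3, 2, 3, 4))

# Keep every Var1 value from either table; counts missing on one side come back as NA
full_join(df1, df2, by = "Var1")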
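
The join above leaves NA wherever an item is absent from one table. If, for these counts, absent really does mean zero, one possible follow-up is to fill the NAs and transpose the result into the layout shown in the question. This is only a sketch and not part of the original answer: the labels Sample1 and Sample2 and the object names joined and wide are assumptions made for illustration, and stringsAsFactors = FALSE is set so the key column is plain character on every R version.

```r
library(dplyr)

df1 <- data.frame(Var1 = c("Bananas", "Apples", "Oranges"),
                  Freq = c(2, 2, 1), stringsAsFactors = FALSE)
df2 <- data.frame(Var1 = c("Bananas", "Carrots", "Strawberries", "Apples"),
                  Freq = c(3, 2, 3, 4), stringsAsFactors = FALSE)

# Full join; the suffix argument names the two clashing Freq columns
# Freq.Sample1 and Freq.Sample2 (labels chosen for this example)
joined <- full_join(df1, df2, by = "Var1",
                    suffix = c(".Sample1", ".Sample2"))

# Assumption: an item that is missing from a table was counted zero times
joined[is.na(joined)] <- 0

# Transpose so that samples are rows and items are columns
wide <- t(as.matrix(joined[, -1]))
colnames(wide) <- joined$Var1
rownames(wide) <- sub("^Freq\\.", "", rownames(wide))
wide
```

Whether replacing NA with 0 is appropriate depends on the data: it is only safe when every table is a complete count for its sample.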

Answer 1 (score: 0)

You should take a look at ?merge.

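# Reproducible example data with partially overlapping keys (C, D, E)
set.seed(1234)
dat1 <- data.frame(var1 = LETTERS[1:5], freq = sample(1:100, 5))
dat2 <- data.frame(var2 = LETTERS[3:7], freq = sample(1:100, 5))

# all = TRUE keeps keys found in either data frame (a full outer join)
res <- merge(dat1, dat2, by.x = "var1", by.y = "var2", all = TRUE)
# Treat keys missing from one data frame as a count of 0
res[is.na(res)] <- 0
res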
#   var1 freq.x freq.y
# 1    A     12      0
# 2    B     62      0
# 3    C     60     65
# 4    D     61      1
# 5    E     83     23
# 6    F      0    100
# 7    G      0     50
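
Pairwise merging becomes awkward once there are more than two data frames. As a rough base R sketch (not taken from either answer), the frequency tables can instead be stacked into one long data frame with a sample label and cross-tabulated with xtabs, which fills combinations that never occur with 0. The list samples, its element names, and the column names Var and Sample are assumptions made for this example; it also assumes every data frame has been given the same column names first.

```r
# Hypothetical input: a named list of frequency tables with identical column names
samples <- list(
  Sample1 = data.frame(Var = c("Bananas", "Apples", "Oranges"),
                       Freq = c(2, 2, 1), stringsAsFactors = FALSE),
  Sample2 = data.frame(Var = c("Bananas", "Carrots", "Strawberries", "Apples"),
                       Freq = c(3, 2, 3, 4), stringsAsFactors = FALSE)
)

# Stack all tables into long format, tagging each row with its sample name
long <- do.call(rbind, lapply(names(samples), function(nm) {
  data.frame(Sample = nm, samples[[nm]], stringsAsFactors = FALSE)
}))

# Sum Freq for every Sample/Var combination; absent combinations become 0
overview <- xtabs(Freq ~ Sample + Var, data = long)
overview
```

The result has one row per sample and one column per item, with the items in alphabetical order. If a plain data frame is preferred, as.data.frame.matrix(overview) converts it.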