merge (base R) - unexpected result

Date: 2016-04-13 18:40:16

Tags: r join merge

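I am trying to merge two data frames with R's merge function. The two data frames are merged on the common column advBucket, but only the first factor level of advBucket is present in the new data frame, and I don't understand why.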

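library("dplyr")
# bucket boundaries for adv
dmB <- c(0.0, 0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 10.0, 15.0, 20.0, 30.0, 100.0, 200.0)
# assign each order to an adv bucket; include.lowest puts adv == 0 into the first bucket
ordersDM$advBucket <- cut(ordersDM$adv, breaks=dmB, include.lowest=TRUE, right=TRUE)
# per-bucket median and mean of the "is" column
temp1 <- summarize( group_by(ordersDM, advBucket),
                    eISmedian = median(is),
                    eISmean = mean(is))
# attach the per-bucket summaries to every order
orders1 <- merge(ordersDM, temp1, by = "advBucket", all=TRUE)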
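  • ordersDM has the expected output: all 14 buckets are present, from [0,0.5] to (100,200].
  • temp1 has the expected output: 14 rows x 3 columns.
  • However, orders1 has the same number of rows as ordersDM, but
    • orders1$adv only has values between 0.0 and 0.5, and
    • orders1$advBucket only has the value (0.0, 0.50)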
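Since the real ordersDM is not shown, here is a minimal, self-contained stand-in for the pipeline above that runs without dplyr. Everything in it is an assumption: the simulated adv and is columns, the names ordersSim, temp1Sim and orders1Sim, and the use of base tapply() in place of group_by()/summarize() are hypothetical substitutes for the original data and code.

```r
# Bucket boundaries copied from the question
dmB <- c(0.0, 0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 10.0, 15.0, 20.0, 30.0, 100.0, 200.0)

set.seed(42)
# Hypothetical stand-in for ordersDM: one midpoint per bucket (so that no
# bucket is empty) plus 486 random adv values, and a made-up "is" column
mids <- head(dmB, -1) + diff(dmB) / 2
ordersSim <- data.frame(adv = c(mids, runif(486, min = 0, max = 200)))
ordersSim$is <- rnorm(nrow(ordersSim))
ordersSim$advBucket <- cut(ordersSim$adv, breaks = dmB,
                           include.lowest = TRUE, right = TRUE)

# Per-bucket median and mean with base tapply(), in place of group_by()/summarize()
bucketLevels <- levels(ordersSim$advBucket)
temp1Sim <- data.frame(
  advBucket = factor(bucketLevels, levels = bucketLevels),
  eISmedian = as.vector(tapply(ordersSim$is, ordersSim$advBucket, median)),
  eISmean   = as.vector(tapply(ordersSim$is, ordersSim$advBucket, mean))
)

orders1Sim <- merge(ordersSim, temp1Sim, by = "advBucket", all = TRUE)

nrow(orders1Sim)                     # 500, the same row count as ordersSim
length(unique(orders1Sim$advBucket)) # 14: every bucket survives the merge
```

On simulated data like this, all 14 buckets are still present after merge(); running the same two checks on the real orders1 is a quick way to see whether buckets are actually lost or only appear to be.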

I'm confused about why merge isn't working as expected: all advBuckets should appear in orders1, and orders1$adv should not differ from ordersDM$adv.
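One way to test this expectation independently of row order is to compare the bucket counts and the sorted adv values of the two data frames. A minimal sketch; the helper name sameContents and the toy data frames orig and shuffled are hypothetical.

```r
# TRUE when two data frames hold the same advBucket counts and the same
# multiset of adv values, ignoring the order of the rows
sameContents <- function(a, b) {
  sameCounts <- identical(as.vector(table(a$advBucket)),
                          as.vector(table(b$advBucket)))
  sameAdv <- isTRUE(all.equal(sort(a$adv), sort(b$adv)))
  sameCounts && sameAdv
}

# Toy data: 'shuffled' is 'orig' with its rows reordered by bucket,
# roughly what a key-sorted merge result looks like
orig <- data.frame(adv = c(97.51, 8.08, 44.25, 0.57, 0.63))
orig$advBucket <- cut(orig$adv, breaks = c(0, 0.5, 1, 10, 100),
                      include.lowest = TRUE)
shuffled <- orig[order(orig$advBucket), ]

sameContents(orig, shuffled)           # TRUE: same rows, different order
sameContents(orig, head(shuffled, 2))  # FALSE: rows are missing
```

Applied as sameContents(ordersDM, orders1), a TRUE result would mean no rows or buckets were lost in the merge.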

str(ordersDM)
'data.frame':    9343 obs. of  30 variables:
 $ desc     : Factor w/ 10770 levels "Order:864631",..: 2 11 12 15 18 19 ...
 $ adv      : num  97.51 8.08 44.25 38.25 35.48 ...
 $ advBucket: Factor w/ 14 levels "[0,0.5]","(0.5,1]",..: 13 9 13 13 13 6 ...

str(temp1)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   14 obs. of  3 variables:
 $ advBucket: Factor w/ 14 levels "[0,0.5]","(0.5,1]",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ eISmedian: num  0 -0.161 -4.817 -4.478 -19.447 ...
 $ eISmean  : num  -2.28 -6.18 -10.84 -16.41 -30.9 ...
- attr(*, "drop")= logi TRUE

str(orders1)
'data.frame':    9343 obs. of  32 variables:
 $ advBucket: Factor w/ 14 levels "[0,0.5]","(0.5,1]",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ desc     : Factor w/ 10770 levels "Order:864631",..
 $ adv      : num  0.567 0.632 0.942 0.914 0.589 ...

Unexpected result: orders1$advBucket only contains (0.5,1], and orders1$adv only contains values between 0.5 and 1.0.
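By default (sort = TRUE), merge() sorts its result on the by columns, so the first rows, which are all that str() and head() display, can come from a single bucket even when every bucket is present. Whether this accounts for the output above is an assumption; a table() of the real orders1$advBucket would confirm or rule it out. A small sketch with hypothetical toy data (x, y, key, val, w):

```r
# Toy data: keys interleaved in x, one lookup row per key in y
x <- data.frame(key = factor(rep(c("b", "a", "c"), times = 5)), val = 1:15)
y <- data.frame(key = factor(c("a", "b", "c")), w = c(10, 20, 30))

m <- merge(x, y, by = "key", all = TRUE)

head(m$key, 5)   # all "a": the rows are now grouped by key
table(m$key)     # yet every key is still present, five rows each
str(m)           # str() previews only the first values of each column
```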

identical( levels(ordersDM$advBucket), levels( temp1$advBucket) )
[1] TRUE

dput(head(ordersDM$advBucket))
structure(c(13L, 9L, 13L, 13L, 13L, 6L), .Label = c("[0,0.5]", 
"(0.5,1]", "(1,2]", "(2,3]", "(3,4]", "(4,5]", "(5,6]", "(6,8]", 
"(8,10]", "(10,15]", "(15,20]", "(20,30]", "(30,100]", "(100,200]"
), class = "factor")

dput(head(temp1))
structure(list(advBucket = structure(1:6, .Label = c("[0,0.5]", 
"(0.5,1]", "(1,2]", "(2,3]", "(3,4]", "(4,5]", "(5,6]", "(6,8]", 
"(8,10]", "(10,15]", "(15,20]", "(20,30]", "(30,100]", "(100,200]"
), class = "factor"), eISmedian = c(0, -0.1612095, -4.8167, -4.478417, 
-19.447492, -20.224064), eISmean = c(-2.28172053945819, -6.18051401299694, 
-10.8404419365303, -16.4115004132139, -30.8983449604262, -31.3046641767241
)), .Names = c("advBucket", "eISmedian", "eISmean"), class = c("tbl_df", 
"data.frame"), row.names = c(NA, -6L))

Following a suggestion from "FascinatingFingers", this works as expected:

orders1 <- full_join(ordersDM, temp1, by = "advBucket")
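dplyr documents that its joins preserve the row order of x as much as possible, which may be why full_join() output looks as expected in str(). For base R, ?merge describes the row order for sort = FALSE as unspecified, so that flag alone is not a guarantee. Two common base-R idioms for keeping the original order are sketched below on hypothetical toy data (x, y, rowId, x2): carry a row id through merge() and reorder by it, or look the summary column up with match() instead of merging.

```r
x <- data.frame(key = factor(rep(c("b", "a", "c"), times = 5)), val = 1:15)
y <- data.frame(key = factor(c("a", "b", "c")), w = c(10, 20, 30))

# Idiom 1: remember each row's original position, merge, then restore it
x$rowId <- seq_len(nrow(x))
m <- merge(x, y, by = "key", all = TRUE)
m <- m[order(m$rowId), setdiff(names(m), "rowId")]
rownames(m) <- NULL
x$rowId <- NULL

# Idiom 2: a many-to-one lookup with match() never reorders x
x2 <- x
x2$w <- y$w[match(x2$key, y$key)]

identical(m$val, x$val)   # TRUE: original order restored
identical(m$w, x2$w)      # TRUE: both idioms agree
```

The match() version only fits when y has at most one row per key, as temp1 does here; the row-id version also works when the merge can duplicate rows.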

But if anyone is curious, I still think it would be interesting to know why merge {base} doesn't work here.

0 answers