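I am trying to merge two data frames using the merge function in R. The two data frames are merged on the common column advBucket, but only the first factor level of advBucket is present in the new data frame, and I don't understand why.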
library("dplyr")
dmB <- c(0.0, 0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 10.0, 15.0, 20.0, 30.0, 100.0, 200.0)
ordersDM$advBucket <- cut(ordersDM$adv, breaks=dmB, include.lowest=TRUE, right=TRUE)
temp1 <- summarize( group_by(ordersDM, advBucket),
eISmedian = median(is),
eISmean = mean(is))
orders1 <- merge(ordersDM, temp1, by = "advBucket", all=TRUE)
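ordersDM has the expected output: all 14 buckets are present, from (0.0,0.5) to (100.0,200.0). temp1 has the expected output: 14 rows x 3 columns. orders1 has the same number of rows as ordersDM, but orders1$adv only has the values 0.0 and 0.5, and orders1$advBucket only has the value (0.0, 0.50). I am confused about why merge is not working as expected: all advBuckets should be present in orders1, and orders1$adv should not differ from ordersDM$adv.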
str(ordersDM)
'data.frame': 9343 obs. of 30 variables:
$ desc : Factor w/ 10770 levels "Order:864631",..: 2 11 12 15 18 19 ...
$ adv : num 97.51 8.08 44.25 38.25 35.48 ...
$ advBucket: Factor w/ 14 levels "[0,0.5]","(0.5,1]",..: 13 9 13 13 13 6 ...
str(temp1)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 14 obs. of 3 variables:
$ advBucket: Factor w/ 14 levels "[0,0.5]","(0.5,1]",..: 1 2 3 4 5 6 7 8 9 10 ...
$ eISmedian: num 0 -0.161 -4.817 -4.478 -19.447 ...
$ eISmean : num -2.28 -6.18 -10.84 -16.41 -30.9 ...
- attr(*, "drop")= logi TRUE
str(orders1)
'data.frame': 9343 obs. of 32 variables:
$ advBucket: Factor w/ 14 levels "[0,0.5]","(0.5,1]",..: 2 2 2 2 2 2 2 2 2 2 ..
$ desc : Factor w/ 10770 levels "Order:864631",..
$ adv : num 0.567 0.632 0.942 0.914 0.589 ...
Unexpected result: orders1$advBucket contains only (0.5,1], and orders1$adv contains only values between 0.5 and 1.0.
identical( levels(ordersDM$advBucket), levels( temp1$advBucket) )
[1] TRUE
dput(head(ordersDM$advBucket))
structure(c(13L, 9L, 13L, 13L, 13L, 6L), .Label = c("[0,0.5]",
"(0.5,1]", "(1,2]", "(2,3]", "(3,4]", "(4,5]", "(5,6]", "(6,8]",
"(8,10]", "(10,15]", "(15,20]", "(20,30]", "(30,100]", "(100,200]"
), class = "factor")
dput(head(temp1))
structure(list(advBucket = structure(1:6, .Label = c("[0,0.5]",
"(0.5,1]", "(1,2]", "(2,3]", "(3,4]", "(4,5]", "(5,6]", "(6,8]",
"(8,10]", "(10,15]", "(15,20]", "(20,30]", "(30,100]", "(100,200]"
), class = "factor"), eISmedian = c(0, -0.1612095, -4.8167, -4.478417,
-19.447492, -20.224064), eISmean = c(-2.28172053945819, -6.18051401299694,
-10.8404419365303, -16.4115004132139, -30.8983449604262, -31.3046641767241
)), .Names = c("advBucket", "eISmedian", "eISmean"), class = c("tbl_df",
"data.frame"), row.names = c(NA, -6L))
Following the suggestion from "FascinatingFingers", the following works as expected:
orders1 <- full_join(ordersDM, temp1, by = "advBucket")
But I would still find it interesting, if anyone is curious, to know why merge {base} does not work.
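One possible explanation, not confirmed against the real data: base merge() defaults to sort = TRUE, which orders the result by the key column, so head() or str() on the merged frame would show only rows from the first bucket even though every bucket is still there. The sketch below uses a small toy data frame (the names toy, stats, and merged and all values are hypothetical stand-ins for ordersDM, temp1, and orders1) and base aggregate() in place of dplyr so that it runs on its own.

```r
# Toy stand-in for ordersDM; the values below are hypothetical, not the real data.
toy <- data.frame(
  adv = c(97.5, 8.1, 44.3, 0.3, 0.7, 4.5),
  is  = c(-1, -2, -3, -4, -5, -6)
)
toy$advBucket <- cut(toy$adv, breaks = c(0, 0.5, 1, 5, 10, 100),
                     include.lowest = TRUE, right = TRUE)

# Per-bucket summary, analogous to temp1 (base R instead of dplyr).
stats <- aggregate(is ~ advBucket, data = toy, FUN = median)
names(stats)[names(stats) == "is"] <- "eISmedian"

# merge() defaults to sort = TRUE, so rows come back ordered by the key
# rather than in the original row order.
merged <- merge(toy, stats, by = "advBucket", all = TRUE)

# head()/str() only show the leading rows; on a large frame sorted by key
# these can all fall in a single bucket.
print(head(merged, 3))

# Checks that look at every row rather than just the first few.
print(table(merged$advBucket))        # row count per bucket
print(setequal(merged$adv, toy$adv))  # same adv values, possibly reordered
```

If table(orders1$advBucket) on the real data shows counts in all 14 buckets, the merge itself would be fine and only the row order differs from ordersDM.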