Question

I am merging two data.frames, dat1 and dat2, on temp, but the merge does not bring in all of the values from dat2. Why are the values from dat2 not merging correctly?

Example data

dat1 <- data.frame(temp = seq(0, 33.2, 0.1))

dat2 <- structure(list(temp = c(6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, 7, 
7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.8, 7.9, 8, 8.1, 8.2, 8.3, 
8.4, 8.5, 8.6, 8.7, 8.8, 8.9, 9, 9.1, 9.2, 9.3, 9.4, 9.5, 9.6, 
9.7, 9.8, 9.9, 10, 10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 
10.8, 10.9, 11, 11.1, 11.2, 11.3, 11.4, 11.5, 11.6, 11.7, 11.8, 
11.9, 12, 12.1, 12.2, 12.3, 12.4, 12.5, 12.6, 12.7, 12.8, 12.9, 
13, 13.1, 13.2), approx = c(193.53, 626.8, 1055.04, 1478.24, 
1896.41, 2309.55, 2717.64, 3120.69, 3518.7, 3911.66, 4299.58, 
4682.45, 5060.26, 5433.03, 5800.74, 6163.39, 6520.99, 6873.53, 
7221.01, 7563.43, 7900.78, 8233.07, 8560.3, 8882.46, 9199.56, 
9511.59, 9818.55, 10120.44, 10417.27, 10709.03, 10995.71, 11277.33, 
11553.88, 11825.36, 12091.78, 12353.13, 12609.41, 12860.63, 13106.78, 
13347.87, 13583.89, 13814.86, 14040.76, 14261.61, 14477.41, 14688.14, 
14893.83, 15094.47, 15290.05, 15480.59, 15666.09, 15846.55, 16021.96, 
16192.34, 16357.68, 16517.98, 16673.26, 16823.51, 16968.73, 17108.93, 
17244.1, 17374.25, 17499.38, 17619.5, 17734.6, 17844.68, 17949.76, 
18049.82, 18144.87, 18234.91)), row.names = c(NA, 70L), class = "data.frame")

Merge

library(dplyr)

dat <- left_join(dat1, dat2, by = "temp")

Output

dat[65:70, ]

   temp approx
65  6.4      626.80
66  6.5     1055.04
67  6.6          NA
68  6.7     1896.41
69  6.8          NA
70  6.9     2717.64
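One way to see which keys failed is to list the dat2$temp values that have no exactly equal value in dat1$temp. Like the join, `%in%` compares doubles exactly. To keep the sketch short, it assumes only the first seven dat2 keys from the question (a hypothetical excerpt named `dat2_keys`):

```r
dat1 <- data.frame(temp = seq(0, 33.2, 0.1))

# Hypothetical excerpt: the first seven keys of dat2 from the question
dat2_keys <- c(6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9)

# Keys with no exactly equal partner in dat1$temp; these are the rows
# whose approx value comes back as NA after the join
unmatched <- dat2_keys[!dat2_keys %in% dat1$temp]
print(unmatched)
```

With dplyr loaded, anti_join(dat2, dat1, by = "temp") returns the unmatched dat2 rows directly.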

Answer 1

Interestingly, identical(dat2$temp[4], 6.6) returns TRUE, but identical(dat1$temp[67], 6.6) returns FALSE.
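Both values print as 6.6 because R's default printing uses 7 significant digits. Formatting with 17 significant digits through sprintf() makes the difference visible. A minimal base-R sketch, where `x` simply holds the same keys as dat1$temp:

```r
x <- seq(0, 33.2, 0.1)   # same keys as dat1$temp

print(x[67])             # default printing rounds to 7 significant digits
identical(x[67], 6.6)    # exact comparison, as in the answer above

# 17 significant digits are enough to distinguish any two doubles
sprintf("%.17g", x[67])
sprintf("%.17g", 6.6)

# The difference is tiny, but it is not zero
x[67] - 6.6
```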

This is a known floating-point issue; see Why are these numbers not equal? or floating point issue in R?, among many other similar posts.

If you set dat1 <- data.frame(temp = round(seq(0, 33.2, 0.1), 2)), the problem should go away. You may also want to look at ?all.equal: all.equal(dat1$temp[67], 6.6) returns TRUE.
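A minimal sketch of the rounding fix. It assumes a hypothetical seven-row excerpt of dat2 taken from the question. It also swaps in base R's merge(..., all.x = TRUE), which keeps every dat1 row the way left_join does, so the example runs without dplyr:

```r
# Rebuild dat1 with keys rounded to 2 decimal places, as the answer suggests
dat1 <- data.frame(temp = round(seq(0, 33.2, 0.1), 2))

# Hypothetical excerpt of dat2: the first seven rows from the question
dat2 <- data.frame(
  temp   = c(6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9),
  approx = c(193.53, 626.8, 1055.04, 1478.24, 1896.41, 2309.55, 2717.64)
)

# Base-R left join: all.x = TRUE keeps every row of dat1
dat <- merge(dat1, dat2, by = "temp", all.x = TRUE)

# Every key in the excerpt now finds its approx value
rows_6x <- dat[dat$temp >= 6.3 & dat$temp <= 6.9, ]
print(rows_6x)

# Tolerance-based comparison treats the unrounded value as equal to 6.6
isTRUE(all.equal(seq(0, 33.2, 0.1)[67], 6.6))
```

Rounding fixes the keys once, before the join. all.equal() is better suited to one-off comparisons, because a join needs exact key equality.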

Answer 2

I converted the temp column in both data frames to a factor and then joined them. It works!

library(dplyr)

# Convert the join key to a factor in both data frames
dat1$temp <- as.factor(dat1$temp)
dat2$temp <- as.factor(dat2$temp)

dat <- left_join(dat1, dat2, by = "temp")
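Why the factor conversion helps: factor levels get character labels, and R's documentation states that as.character() formats doubles to 15 significant digits. That drops the tiny representation error, so both columns end up with the label "6.6" and the join compares labels rather than doubles. A minimal base-R sketch of the effect (`x` and `f` are illustrative names):

```r
x <- seq(0, 33.2, 0.1)   # same keys as dat1$temp

x[67] == 6.6                              # exact numeric comparison fails
as.character(x[67])                       # 15 significant digits hide the error
as.character(x[67]) == as.character(6.6)

# The factor level label is the same character form
levels(as.factor(x[67]))

# To recover numbers from such a factor, go through the labels;
# as.numeric() on a factor returns the internal level codes instead
f <- as.factor(c(6.6, 13.2))
as.numeric(as.character(f))
as.numeric(f)
```

As a design point, this approach matches on text, so it silently merges any values that agree to 15 digits. Rounding explicitly, as in Answer 1, states the tolerance directly and keeps temp numeric.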
