I have a small problem with the inner_join() function from dplyr.
I have two tables, LatLong and data_stops.
LatLong is:
> str(LatLong)
Classes ‘data.table’ and 'data.frame': 43456 obs. of 3 variables:
 $ Idindx  : num 1 3 5 7 9 11 13 15 17 19 ...
 $ latMean : num 54.8 54.8 54.8 54.8 54.8 ...
 $ longMean: num 11.1 11.1 11.1 11.1 11.1 ...
 - attr(*, ".internal.selfref")=<externalptr>
 - attr(*, "vars")=List of 1
  ..$ : symbol Idindx
> dput(head(LatLong))
structure(list(Idindx = c(1, 3, 5, 7, 9, 11), latMean = c(54.831033947613,
54.8310100000107, 54.8309920000003, 54.8310145000011, 54.8310115000001,
54.831043), longMean = c(11.1227872540957, 11.1227459999747,
11.1227690000004, 11.1227944999961, 11.1228075000002, 11.1228525
)), .Names = c("Idindx", "latMean", "longMean"), class = "data.frame", row.names = c(NA,
-6L))
and data_stops is:
'data.frame': 2020 obs. of 7 variables:
$ Idindx : num 1 3 5 7 9 11 13 15 17 19 ...
$ minTime : POSIXct, format: "2008-06-01 00:07:16" "2008-06-01 08:44:42" "2008-06-01 08:50:18" "2008-06-01 08:56:45" ...
$ maxTime : POSIXct, format: "2008-06-01 08:40:25" "2008-06-01 08:46:33" "2008-06-01 08:52:43" "2008-06-01 08:58:44" ...
$ duration_minutes:Class 'difftime' atomic [1:2020] 513 2 2 2 1 1 3 3 6 7 ...
.. ..- attr(*, "units")= chr "mins"
$ Ship : num NA NA NA NA NA ...
$ latMean : num 54.8 54.8 54.8 54.8 54.8 ...
$ longMean : num 11.1 11.1 11.1 11.1 11.1 ...
> dput(head(data_stops))
structure(list(Idindx = c(1, 3, 5, 7, 9, 11), minTime = structure(c(1212268036,
1212299082, 1212299418, 1212299805, 1212300243, 1212300629), class = c("POSIXct",
"POSIXt")), maxTime = structure(c(1212298825, 1212299193, 1212299563,
1212299924, 1212300293, 1212300664), class = c("POSIXct", "POSIXt"
)), duration_minutes = structure(c(513, 2, 2, 2, 1, 1), units = "mins", class = "difftime"),
Ship = c(111111111, 111111111, 111111111, 111111111, 111111111,
111111111)), .Names = c("Idindx", "minTime", "maxTime", "duration_minutes",
"Ship"), class = "data.frame", row.names = c(NA, -6L))
When I try to join them on Idindx, I get the following error:
final_data<- inner_join(data_stops, LatLong)
Joining by: "Idindx"
Error in data.table::setkeyv(x, by$x) :
4 arguments passed to .Internal(nchar) which requires 3
Here is what I tried. In the end, using
final_data <- full_join(data_stops, LatLong, by = "Idindx")
worked!
I would like to understand why inner_join() does not work here.
Thanks!
Answer 0 (score: 0)
Using the data you posted with dput(), I had no problem joining the tables with dplyr::inner_join():
data_stops <- structure(list(Idindx = c(1, 3, 5, 7, 9, 11), minTime = structure(c(1212268036,
1212299082, 1212299418, 1212299805, 1212300243, 1212300629), class = c("POSIXct",
"POSIXt")), maxTime = structure(c(1212298825, 1212299193, 1212299563,
1212299924, 1212300293, 1212300664), class = c("POSIXct", "POSIXt"
)), duration_minutes = structure(c(513, 2, 2, 2, 1, 1), units = "mins", class = "difftime"),
Ship = c(111111111, 111111111, 111111111, 111111111, 111111111,
111111111)), .Names = c("Idindx", "minTime", "maxTime", "duration_minutes",
"Ship"), class = "data.frame", row.names = c(NA, -6L))
LatLong <- structure(list(Idindx = c(1, 3, 5, 7, 9, 11), latMean = c(54.831033947613,
54.8310100000107, 54.8309920000003, 54.8310145000011, 54.8310115000001,
54.831043), longMean = c(11.1227872540957, 11.1227459999747,
11.1227690000004, 11.1227944999961, 11.1228075000002, 11.1228525
)), .Names = c("Idindx", "latMean", "longMean"), class = "data.frame", row.names = c(NA,
-6L))
require("dplyr")
final_data <- inner_join(data_stops, LatLong, by = "Idindx")
head(final_data)
# Idindx minTime maxTime duration_minutes Ship
# 1 1 2008-05-31 22:07:16 2008-06-01 06:40:25 513 mins 111111111
# 2 3 2008-06-01 06:44:42 2008-06-01 06:46:33 2 mins 111111111
# 3 5 2008-06-01 06:50:18 2008-06-01 06:52:43 2 mins 111111111
# 4 7 2008-06-01 06:56:45 2008-06-01 06:58:44 2 mins 111111111
# 5 9 2008-06-01 07:04:03 2008-06-01 07:04:53 1 mins 111111111
# 6 11 2008-06-01 07:10:29 2008-06-01 07:11:04 1 mins 111111111
# (Truncated)
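I also have no problem using inner_join() without specifying the key. I suspect that something in your environment has been modified; with clean data, the join works as expected.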
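One difference between the question and the data used here is visible in the posted output: str(LatLong) reports the classes 'data.table' and 'data.frame', while the dput() sample carries only 'data.frame', and the error itself is raised inside data.table::setkeyv(). A minimal sketch of how one might check for this and retry the join on plain data frames is shown below. It uses small stand-in tables with shortened values. Both the idea that coercing with as.data.frame() avoids the failing code path and the idea that a mismatch between the running R version and the version a package was built under could explain the .Internal(nchar) message are assumptions, not something the question confirms.

```r
library(dplyr)

# Stand-in tables with shortened values from the question's dput() output.
data_stops <- data.frame(Idindx = c(1, 3, 5),
                         Ship = c(111111111, 111111111, 111111111))
LatLong <- data.frame(Idindx = c(1, 3, 5),
                      latMean = c(54.8310, 54.8310, 54.8310),
                      longMean = c(11.1228, 11.1227, 11.1228))

# Inspect the classes. In the question, LatLong also had class "data.table".
class(LatLong)
class(data_stops)

# Assumption: coercing to a plain data.frame drops the extra class,
# so the join no longer goes through data.table code.
lat_long_df <- as.data.frame(LatLong)
data_stops_df <- as.data.frame(data_stops)

final_data <- inner_join(data_stops_df, lat_long_df, by = "Idindx")
final_data

# Record the R and package versions in use, which helps spot a mismatch.
sessionInfo()
```

If the join succeeds on the coerced copies but fails on the originals, that points at the data.table path rather than at the data itself.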
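By the way, I avoid mixing naming conventions for objects in R (or any programming language), such as combining camel case (LatLong) with underscores (data_stops). I pick one and stick to it. I prefer underscores because I find them easier to scan, but it does not really matter.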
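Returning to the full_join() workaround mentioned in the question: it is worth keeping in mind that full_join() is not a drop-in replacement for inner_join(). An inner join keeps only rows whose key appears in both tables, while a full join keeps every row from both and fills the gaps with NA. Given that str() reported 43456 rows for LatLong and 2020 for data_stops, the two results could differ a lot in size. The sketch below illustrates the difference; the table names stops and coords and their values are hypothetical.

```r
library(dplyr)

# Hypothetical tables whose keys only partly overlap.
stops <- data.frame(Idindx = c(1, 3, 5),
                    Ship = c(111111111, 111111111, 111111111))
coords <- data.frame(Idindx = c(3, 5, 7),
                     latMean = c(54.83, 54.83, 54.83))

# Only keys present in both tables survive (Idindx 3 and 5).
inner <- inner_join(stops, coords, by = "Idindx")

# All keys from both tables are kept (Idindx 1, 3, 5 and 7).
full <- full_join(stops, coords, by = "Idindx")

nrow(inner)
nrow(full)

# Rows with an NA on either side are the ones inner_join() would have dropped.
full[is.na(full$Ship) | is.na(full$latMean), ]
```

Filtering out those NA rows from the full join gives the same set of keys as the inner join, so comparing the two row counts is a quick way to see whether the workaround changed the result.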