Error when merging tables with dplyr

Time: 2016-05-04 07:30:07

Tags: r merge dplyr

I have a small problem with the inner_join() function from dplyr. I have two tables, LatLong and data_stops.

LatLong is:

> str(LatLong)
Classes ‘data.table’ and 'data.frame':  43456 obs. of  3 variables:
 $ Idindx  : num  1 3 5 7 9 11 13 15 17 19 ...
 $ latMean : num  54.8 54.8 54.8 54.8 54.8 ...
 $ longMean: num  11.1 11.1 11.1 11.1 11.1 ...
 - attr(*, ".internal.selfref")=<externalptr> 
 - attr(*, "vars")=List of 1
  ..$ : symbol Idindx

> dput(head(LatLong))
structure(list(Idindx = c(1, 3, 5, 7, 9, 11), latMean = c(54.831033947613, 
54.8310100000107, 54.8309920000003, 54.8310145000011, 54.8310115000001, 
54.831043), longMean = c(11.1227872540957, 11.1227459999747, 
11.1227690000004, 11.1227944999961, 11.1228075000002, 11.1228525
)), .Names = c("Idindx", "latMean", "longMean"), class = "data.frame", row.names = c(NA, 
-6L))

and data_stops is:

'data.frame':   2020 obs. of  7 variables:
 $ Idindx          : num  1 3 5 7 9 11 13 15 17 19 ...
 $ minTime         : POSIXct, format: "2008-06-01 00:07:16" "2008-06-01 08:44:42" "2008-06-01 08:50:18" "2008-06-01 08:56:45" ...
 $ maxTime         : POSIXct, format: "2008-06-01 08:40:25" "2008-06-01 08:46:33" "2008-06-01 08:52:43" "2008-06-01 08:58:44" ...
 $ duration_minutes:Class 'difftime'  atomic [1:2020] 513 2 2 2 1 1 3 3 6 7 ...
  .. ..- attr(*, "units")= chr "mins"
 $ Ship            : num  NA NA NA  NA  NA   ...
 $ latMean         : num  54.8 54.8 54.8 54.8 54.8 ...
 $ longMean        : num  11.1 11.1 11.1 11.1 11.1 ...



 > dput(head(data_stops))
structure(list(Idindx = c(1, 3, 5, 7, 9, 11), minTime = structure(c(1212268036, 
1212299082, 1212299418, 1212299805, 1212300243, 1212300629), class = c("POSIXct", 
"POSIXt")), maxTime = structure(c(1212298825, 1212299193, 1212299563, 
1212299924, 1212300293, 1212300664), class = c("POSIXct", "POSIXt"
)), duration_minutes = structure(c(513, 2, 2, 2, 1, 1), units = "mins", class = "difftime"), 
    Ship = c(111111111, 111111111, 111111111, 111111111, 111111111, 
    111111111)), .Names = c("Idindx", "minTime", "maxTime", "duration_minutes", 
"Ship"), class = "data.frame", row.names = c(NA, -6L))

When I try to join them by Idindx, I get the following error:

final_data<- inner_join(data_stops, LatLong)

Joining by: "Idindx"
Error in data.table::setkeyv(x, by$x) : 
  4 arguments passed to .Internal(nchar) which requires 3
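The error is raised inside data.table::setkeyv and complains about .Internal(nchar), which is base R's own internal code rather than anything in the two tables. An arity mismatch of the form "4 arguments passed ... which requires 3" suggests, as an unconfirmed assumption, that some installed code expects a newer base R than the one running (for example, packages updated without updating R itself). A minimal base-R sketch for checking that: it reports the running R version, whether this R's nchar() has a keepNA argument, and the installed version of dplyr and data.table together with the R version each was built under (the Version and Built fields of the package DESCRIPTION). The helper name pkg_field is hypothetical.

```r
# Diagnostic sketch (base R only). Assumption, not confirmed in the thread:
# the error comes from installed code that expects a newer base R, in which
# the internal nchar() takes four arguments instead of three.

# Which R is actually running?
running_r <- R.version.string

# Does this R's nchar() have the keepNA argument (the 4th argument)?
nchar_has_keepNA <- "keepNA" %in% names(formals(nchar))

# Installed version of each package, and the R version it was built under,
# read from the package DESCRIPTION without loading the package.
pkgs <- c("dplyr", "data.table")
is_installed <- nzchar(vapply(pkgs, function(p) system.file(package = p), character(1)))

# Hypothetical helper: one DESCRIPTION field, or NA if unavailable.
pkg_field <- function(p, installed, field) {
  if (!installed) return(NA_character_)
  value <- utils::packageDescription(p)[[field]]
  if (is.null(value)) NA_character_ else as.character(value)
}

pkg_info <- data.frame(
  package          = pkgs,
  version          = mapply(pkg_field, pkgs, is_installed, "Version"),
  built_under      = mapply(pkg_field, pkgs, is_installed, "Built"),
  row.names        = NULL,
  stringsAsFactors = FALSE
)

print(running_r)
print(nchar_has_keepNA)
print(pkg_info)
```

If built_under names a newer R than running_r, updating R itself (not only RStudio and the packages) would be the next thing to try.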

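Here is what I tried:

  • Updating RStudio and the packages I use: no success
  • Using merge(LatLong, data_stops, by = "Idindx") instead: no success
  • Making sure both tables have the same format with as.data.table(): no success
  • Making sure Idindx is numeric in both tables: no success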

Finally, using

final_data <- full_join(data_stops, LatLong, by = "Idindx")

works!
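inner_join() and full_join() differ only in which rows they keep: inner_join() keeps rows whose key appears in both tables, while full_join() keeps every key from either table and fills the missing columns with NA. On tables whose keys all match, both return the same set of rows. A small sketch with hypothetical toy tables (stops and coords are made-up names, assuming dplyr is installed):

```r
library(dplyr)

# Hypothetical toy data: key 2 exists only in `stops`,
# key 4 exists only in `coords`.
stops  <- data.frame(Idindx = c(1, 2, 3), duration = c(513, 2, 2))
coords <- data.frame(Idindx = c(1, 3, 4), latMean = c(54.83, 54.83, 54.82))

# inner_join(): only keys present in both tables (1 and 3)
inner <- inner_join(stops, coords, by = "Idindx")

# full_join(): every key from either table; unmatched values become NA
full <- full_join(stops, coords, by = "Idindx")

print(inner)
print(full)
```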

I would like to understand why inner_join() does not work here!

Thanks!

1 Answer:

Answer 0 (score: 0)

Using the data you posted (via dput()), I had no problem joining the tables with dplyr::inner_join():

data_stops <- structure(list(Idindx = c(1, 3, 5, 7, 9, 11),
                             minTime = structure(c(1212268036, 1212299082, 1212299418,
                                                   1212299805, 1212300243, 1212300629),
                                                 class = c("POSIXct", "POSIXt")),
                             maxTime = structure(c(1212298825, 1212299193, 1212299563,
                                                   1212299924, 1212300293, 1212300664),
                                                 class = c("POSIXct", "POSIXt")),
                             duration_minutes = structure(c(513, 2, 2, 2, 1, 1),
                                                          units = "mins", class = "difftime"),
                             Ship = c(111111111, 111111111, 111111111,
                                      111111111, 111111111, 111111111)),
                        .Names = c("Idindx", "minTime", "maxTime", "duration_minutes", "Ship"),
                        class = "data.frame", row.names = c(NA, -6L))

LatLong <- structure(list(Idindx = c(1, 3, 5, 7, 9, 11),
                          latMean = c(54.831033947613, 54.8310100000107, 54.8309920000003,
                                      54.8310145000011, 54.8310115000001, 54.831043),
                          longMean = c(11.1227872540957, 11.1227459999747, 11.1227690000004,
                                       11.1227944999961, 11.1228075000002, 11.1228525)),
                     .Names = c("Idindx", "latMean", "longMean"),
                     class = "data.frame", row.names = c(NA, -6L))

library(dplyr)
final_data <- inner_join(data_stops, LatLong, by = "Idindx")
head(final_data)

#   Idindx             minTime             maxTime duration_minutes      Ship
# 1      1 2008-05-31 22:07:16 2008-06-01 06:40:25         513 mins 111111111
# 2      3 2008-06-01 06:44:42 2008-06-01 06:46:33           2 mins 111111111
# 3      5 2008-06-01 06:50:18 2008-06-01 06:52:43           2 mins 111111111
# 4      7 2008-06-01 06:56:45 2008-06-01 06:58:44           2 mins 111111111
# 5      9 2008-06-01 07:04:03 2008-06-01 07:04:53           1 mins 111111111
# 6     11 2008-06-01 07:10:29 2008-06-01 07:11:04           1 mins 111111111
# (Truncated)
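The call above passes by = "Idindx" explicitly. When by is omitted, dplyr performs a natural join on every column name the two tables share, so the automatic key depends on which columns each table holds at the moment of the call. That matters here because the question's str(data_stops) lists latMean and longMean as well as Idindx. A small base-R sketch with hypothetical toy tables showing which columns a natural join would use:

```r
# Hypothetical toy tables that share two column names.
stops  <- data.frame(Idindx  = c(1, 3),
                     latMean = c(54.8310, 54.8310),
                     Ship    = c(111111111, 111111111))
coords <- data.frame(Idindx  = c(1, 3),
                     latMean = c(54.83103, 54.83101))

# With by = NULL (the default), dplyr's joins match on every column name
# the two tables have in common; this is that set.
common <- intersect(names(stops), names(coords))
print(common)
```

Here the natural join would also require the floating-point latMean values to match exactly, which is why naming the key explicitly is the safer habit.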

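I also had no problem using inner_join() without specifying the key (the by argument). I suspect something in your environment has been modified; with clean data, it works as expected.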
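One detail in the question fits this suspicion: str(LatLong) reports Classes ‘data.table’ and 'data.frame' (plus a "vars" attribute), whereas the dput() output posted for LatLong has class = "data.frame" only, and the traceback shows the error coming from data.table::setkeyv. So the failing call may have gone through a data.table-specific code path that a reproduction built from the posted dput() does not reach. Converting to a plain data frame is the opposite direction of the as.data.table() attempt listed in the question. A minimal sketch, assuming data.table and dplyr are installed; lat_long is a hypothetical three-row stand-in, not the asker's full table:

```r
library(data.table)
library(dplyr)

# Hypothetical stand-in for LatLong: a data.table, like the object
# that str(LatLong) describes in the question.
lat_long <- data.table(
  Idindx   = c(1, 3, 5),
  latMean  = c(54.831034, 54.831010, 54.830992),
  longMean = c(11.122787, 11.122746, 11.122769)
)
data_stops <- data.frame(
  Idindx = c(1, 3, 5),
  Ship   = c(111111111, 111111111, 111111111)
)

# Check which classes each object actually carries.
print(class(lat_long))    # "data.table" "data.frame"
print(class(data_stops))  # "data.frame"

# Drop the data.table class so the join runs on two plain data frames.
lat_long_df <- as.data.frame(lat_long)

final_data <- inner_join(data_stops, lat_long_df, by = "Idindx")
print(final_data)
```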

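As an aside, I avoid mixing naming conventions for objects in R (or any programming language), such as camel case (LatLong) alongside underscores (data_stops). I pick one and stick with it. I prefer underscores because I find them easier to scan, but which one you choose doesn't really matter.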