I have two H2O frames that I want to join on two identical columns present in both. I am using the Java API and obtain the H2O frames from Spark DataFrames.
I cannot use Spark DataFrames to join the data: my data is very large and RDDs do not work here, so I need to use H2O frames as in-memory objects.
Answer 0 (score: 0)
Take a look at the h2o.merge() command.
# Currently, this function only supports `all.x = TRUE`. All other permutations will fail.
library(h2o)
h2o.init()
# Create two simple, two-column R data frames by inputting values, ensuring that both have a common column (in this case, "fruit").
left <- data.frame(fruit = c('apple','orange','banana','lemon','strawberry','blueberry'),
                   color = c('red','orange','yellow','yellow','red','blue'))
right <- data.frame(fruit = c('apple','orange','banana','lemon','strawberry','watermelon'),
                    citrus = c(FALSE, TRUE, FALSE, TRUE, FALSE, FALSE))
# Create the H2O data frames from the inputted data.
l.hex <- as.h2o(left)
print(l.hex)
#        fruit  color
# 1      apple    red
# 2     orange orange
# 3     banana yellow
# 4      lemon yellow
# 5 strawberry    red
# 6  blueberry   blue
# [6 rows x 2 columns]
r.hex <- as.h2o(right)
print(r.hex)
#        fruit citrus
# 1      apple  FALSE
# 2     orange   TRUE
# 3     banana  FALSE
# 4      lemon   TRUE
# 5 strawberry  FALSE
# 6 watermelon  FALSE
# [6 rows x 2 columns]
# Merge the data frames. The result is a single dataset with three columns.
left.hex <- h2o.merge(l.hex, r.hex, all.x = TRUE)
print(left.hex)
#        fruit  color citrus
# 1  blueberry   blue   <NA>
# 2      apple    red  FALSE
# 3     banana yellow  FALSE
# 4      lemon yellow   TRUE
# 5     orange orange   TRUE
# 6 strawberry    red  FALSE
# [6 rows x 3 columns]
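To sanity-check what a merge with all.x = TRUE should return before running it on large frames, the same join can be reproduced on the small example with base R's merge(), which needs no H2O cluster. This is only a sketch of the left-join semantics on the data above. It assumes that h2o.merge() follows base R's behaviour here, and row order may differ between the two.

```r
# Base R only; no H2O cluster is needed for this check.
left <- data.frame(fruit = c("apple", "orange", "banana", "lemon", "strawberry", "blueberry"),
                   color = c("red", "orange", "yellow", "yellow", "red", "blue"))
right <- data.frame(fruit = c("apple", "orange", "banana", "lemon", "strawberry", "watermelon"),
                    citrus = c(FALSE, TRUE, FALSE, TRUE, FALSE, FALSE))

# all.x = TRUE keeps every row of `left`; rows without a match
# in `right` get NA in the columns that come from `right`.
joined <- merge(left, right, by = "fruit", all.x = TRUE)
print(joined)

# blueberry has no match in `right`, so its citrus value is NA;
# watermelon exists only in `right`, so it is dropped.
stopifnot(nrow(joined) == 6)
stopifnot(is.na(joined$citrus[joined$fruit == "blueberry"]))
stopifnot(!("watermelon" %in% joined$fruit))
```

Here the join key is the single shared column "fruit"; for a join on two shared columns, as in the question, base R's merge() accepts a vector of column names in `by`.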