I have a number of YAML files that I would like to read with sparklyr. I can't seem to figure out how to do this with any of the spark_read_* functions. Is it possible, or do I need to convert the files to another format first?
I've added a minimal example to illustrate the data.
library(sparklyr)
## Spark connection
sc <- spark_connect(master = "local", version = "2.1.0")
## Create data
data_dir <- tempdir()
tbl_yaml <- data.frame(x1 = 1:3, x2 = 1:3, x3 = 4L)
file_path <- sprintf("%s/tbl_yaml.yml", data_dir)
## Write data to disk
yaml::write_yaml(tbl_yaml, file_path, column.major = FALSE)
If you run something like the following (which, to be clear, obviously won't work as intended):
## Read data
tbl <- spark_read_csv(
  sc,
  name = "test",
  path = file_path
)
then the result looks like this:
# Source: table<test> [?? x 1]
# Database: spark_connection
`_x1_1`
<chr>
1 " x2: 1"
2 " x3: 4"
3 - x1: 2
4 " x2: 2"
5 " x3: 4"
6 - x1: 3
7 " x2: 3"
8 " x3: 4"
But the desired output would be:
# Source: table<test> [?? x 3]
# Database: spark_connection
x1 x2 x3
<int> <int> <int>
1 1 1 4
2 2 2 4
3 3 3 4
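For what it's worth, the only workaround I can think of is to parse the YAML on the R side and then copy the result into Spark (a sketch, untested at scale — `yaml::read_yaml` returns a row-wise list here because the file was written with `column.major = FALSE`):

```r
## Workaround sketch: parse YAML locally, then copy into Spark.
## Note this pulls all data through the R driver, which defeats the
## purpose of Spark when there are many large files.
rows <- yaml::read_yaml(file_path)                     # list of row records
tbl_local <- do.call(rbind, lapply(rows, as.data.frame))
tbl <- dplyr::copy_to(sc, tbl_local, name = "test", overwrite = TRUE)
```

Ideally I'd like something that reads the files on the Spark side instead.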
Thanks!