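I am using a Kafka database whose data is in Avro format. I want to stream it with sparklyr for data analysis in a Shiny web application.
With sparklyr I can connect Spark Structured Streaming to the Kafka DB, but I don't know how to deserialize the Avro data. We use a Schema Registry for schema management. I also can't work out how to make it always read the database from the beginning.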
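Some background on why plain Avro decoding fails: messages written with Confluent's `KafkaAvroSerializer` (the serializer named in the read options) are not bare Avro. Each value starts with a magic byte `0x00` and a 4-byte big-endian schema ID that points into the Schema Registry, followed by the Avro binary payload. The minimal base-R sketch below splits one such value, given as a raw vector, into schema ID and payload. The helper name `split_confluent_avro` is hypothetical. It also assumes you already have the value as a raw vector, for example one element of the binary `value` column after `collect()` (check how your sparklyr version returns binary columns). Decoding the payload still requires the Avro schema fetched from the registry.

```r
# Hypothetical helper: split a Confluent-framed Avro message into
# its Schema Registry ID and the raw Avro payload.
# Confluent wire format: byte 1 = magic byte (0x00),
# bytes 2-5 = schema ID (big-endian int32), bytes 6+ = Avro binary data.
split_confluent_avro <- function(msg) {
  stopifnot(is.raw(msg), length(msg) >= 5)
  if (msg[1] != as.raw(0)) {
    stop("not Confluent wire format: magic byte is not 0")
  }
  # readBin() accepts a raw vector as its connection argument
  schema_id <- readBin(msg[2:5], what = "integer", size = 4,
                       endian = "big")
  payload <- if (length(msg) > 5) msg[6:length(msg)] else raw(0)
  list(schema_id = schema_id, payload = payload)
}

# Usage with a made-up message: schema ID 42, then the payload bytes
# 0x02 0x61, which is the Avro binary encoding of the string "a"
msg <- as.raw(c(0x00, 0x00, 0x00, 0x00, 0x2a, 0x02, 0x61))
parts <- split_confluent_avro(msg)
parts$schema_id  # 42
```

The schema ID is what you would look up in the Schema Registry to get the writer schema for that particular message.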
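```r
library(sparklyr)

# Load the Spark Kafka connector (Scala 2.11 build for Spark 2.4.0)
config <- spark_config()
config$sparklyr.shell.packages <-
  "org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0"

sc <- spark_connect(master = "local", config = config)

# Kafka source options. Spark's Kafka source always returns key and value
# as binary columns, so the serializer and schema.registry.url entries
# are not used by Spark itself.
read_options <- list(
  kafka.bootstrap.servers = "kafka:9092",
  subscribe = "testopic",
  key.serializer = "io.confluent.kafka.serializers.KafkaAvroSerializer",
  value.serializer = "io.confluent.kafka.serializers.KafkaAvroSerializer",
  schema.registry.url = "http://kafka:8081"
)

# Read the topic as a stream and write it to the in-memory table "kafka_1"
stream <-
  stream_read_kafka(sc, options = read_options) %>%
  stream_write_memory("kafka_1")
```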
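For reading from the beginning, Spark's Kafka source has a `startingOffsets` option. Its default for streaming queries is `"latest"`. Setting it to `"earliest"` makes a new query start from the earliest available offset in every partition. The sketch below shows the read options with that option added. The serializer entries are dropped because the Kafka source does not use them. Per the Spark Kafka integration guide, `startingOffsets` only applies when a new query starts; a query resumed from a checkpoint continues from where it left off.

```r
# Read options for a query that starts from the beginning of the topic.
# "earliest" applies only when a *new* streaming query starts
# (Spark's default for streaming queries is "latest").
read_options <- list(
  kafka.bootstrap.servers = "kafka:9092",
  subscribe = "testopic",
  startingOffsets = "earliest"
)

# With a live sparklyr connection `sc` (not created in this snippet):
# stream <- stream_read_kafka(sc, options = read_options) %>%
#   stream_write_memory("kafka_1")
```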
Session info:
```
> devtools::session_info()
- Session info --------------------------------------------------------------------------------------------------------
setting  value
version  R version 3.5.3 (2019-03-11)
os       Windows >= 8 x64
system   x86_64, mingw32
ui       RStudio
language (EN)
collate  English_Australia.1252
ctype    English_Australia.1252
tz       Australia/Sydney
date     2019-03-26

- Packages ------------------------------------------------------------------------------------------------------------
package     * version date       lib source
askpass       1.1     2019-01-13 [1] CRAN (R 3.5.2)
assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.5.3)
backports     1.1.3   2018-12-14 [1] CRAN (R 3.5.2)
base64enc     0.1-3   2015-07-28 [1] CRAN (R 3.5.0)
callr         3.2.0   2019-03-15 [1] CRAN (R 3.5.3)
cli           1.1.0   2019-03-19 [1] CRAN (R 3.5.3)
config        0.3     2018-03-27 [1] CRAN (R 3.5.2)
crayon        1.3.4   2017-09-16 [1] CRAN (R 3.5.1)
DBI         * 1.0.0   2018-05-02 [1] CRAN (R 3.5.2)
dbplyr        1.3.0   2019-01-09 [1] CRAN (R 3.5.2)
desc          1.2.0   2018-05-01 [1] CRAN (R 3.5.3)
devtools      2.0.1   2018-10-26 [1] CRAN (R 3.5.3)
digest        0.6.18  2018-10-10 [1] CRAN (R 3.5.1)
dplyr         0.8.0.1 2019-02-15 [1] CRAN (R 3.5.3)
ellipsis      0.1.0   2019-02-19 [1] CRAN (R 3.5.3)
forge         0.2.0   2019-02-26 [1] CRAN (R 3.5.3)
fs            1.2.7   2019-03-19 [1] CRAN (R 3.5.3)
generics      0.0.2   2018-11-29 [1] CRAN (R 3.5.2)
glue          1.3.1   2019-03-12 [1] CRAN (R 3.5.3)
htmltools     0.3.6   2017-04-28 [1] CRAN (R 3.5.1)
htmlwidgets   1.3     2018-09-30 [1] CRAN (R 3.5.1)
httr          1.4.0   2018-12-11 [1] CRAN (R 3.5.2)
jsonlite      1.6     2018-12-07 [1] CRAN (R 3.5.3)
magrittr      1.5     2014-11-22 [1] CRAN (R 3.5.1)
memoise       1.1.0   2017-04-21 [1] CRAN (R 3.5.3)
openssl       1.3     2019-03-22 [1] CRAN (R 3.5.3)
pillar        1.3.1   2018-12-15 [1] CRAN (R 3.5.3)
pkgbuild      1.0.3   2019-03-20 [1] CRAN (R 3.5.3)
pkgconfig     2.0.2   2018-08-16 [1] CRAN (R 3.5.1)
pkgload       1.0.2   2018-10-29 [1] CRAN (R 3.5.3)
prettyunits   1.0.2   2015-07-13 [1] CRAN (R 3.5.3)
processx      3.3.0   2019-03-10 [1] CRAN (R 3.5.3)
ps            1.3.0   2018-12-21 [1] CRAN (R 3.5.3)
purrr         0.3.2   2019-03-15 [1] CRAN (R 3.5.3)
r2d3          0.2.3   2018-12-18 [1] CRAN (R 3.5.2)
R6            2.4.0   2019-02-14 [1] CRAN (R 3.5.3)
rappdirs      0.3.1   2016-03-28 [1] CRAN (R 3.5.2)
Rcpp          1.0.1   2019-03-17 [1] CRAN (R 3.5.3)
remotes       2.0.2   2018-10-30 [1] CRAN (R 3.5.3)
rlang         0.3.2   2019-03-21 [1] CRAN (R 3.5.3)
rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.5.1)
rstudioapi    0.10    2019-03-19 [1] CRAN (R 3.5.3)
sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.5.3)
sparklyr    * 1.0.0   2019-02-25 [1] CRAN (R 3.5.3)
tibble        2.1.1   2019-03-16 [1] CRAN (R 3.5.3)
tidyselect    0.2.5   2018-10-11 [1] CRAN (R 3.5.2)
usethis       1.4.0   2018-08-14 [1] CRAN (R 3.5.3)
withr         2.1.2   2018-03-15 [1] CRAN (R 3.5.1)
yaml          2.2.0   2018-07-25 [1] CRAN (R 3.5.1)

[1] C:/Users/robin/Documents/R/win-library/3.5
[2] C:/Program Files/R/R-3.5.3/library
```