In a DSX Notebook

Time: 2017-07-08 15:48:13

Tags: data-science-experience spark-cloudant

I am trying to follow https://developer.ibm.com/clouddataservices/docs/ibm-data-science-experience/docs/load-and-filter-cloudant-data-with-spark/ to load Cloudant data with Spark. I have a Scala 2.11 notebook with Spark 2.1 (the same happens with Spark 2.0) containing the following code:

// @hidden_cell
var credentials = scala.collection.mutable.HashMap[String, String](
  "username"->"<redacted>",
  "password"->"""<redacted>""",
  "host"->"<redacted>",
  "port"->"443",
  "url"->"<redacted>"
)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val cloudantdata = sqlContext.read.format("com.cloudant.spark").
  option("cloudant.host", credentials("host")).
  option("cloudant.username", credentials("username")).
  option("cloudant.password", credentials("password")).
  load("crimes")

Attempting to execute that cell only ends in:

Name: java.lang.ClassNotFoundException
Message: Failed to find data source: com.cloudant.spark. Please find packages at http://spark.apache.org/third-party-projects.html
StackTrace: at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:569)
  at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:86)
  at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:86)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:325)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:135)
  ... 42 more
Caused by: java.lang.ClassNotFoundException: com.cloudant.spark.DefaultSource
  at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:62)
  at java.lang.ClassLoader.loadClassHelper(ClassLoader.java:844)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:823)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:803)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25$$anonfun$apply$13.apply(DataSource.scala:554)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25$$anonfun$apply$13.apply(DataSource.scala:554)
  at scala.util.Try$.apply(Try.scala:192)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25.apply(DataSource.scala:554)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25.apply(DataSource.scala:554)
  at scala.util.Try.orElse(Try.scala:84)
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:554)

How can I get past this error and connect to my Cloudant database?

1 Answer:

Answer 0 (score: 2):

There must be an issue causing the Cloudant driver to be missing; it is normally present by default in DSX notebooks. Please switch to a Python 2.0 with Spark 2.1 kernel and run this one-time installation of the Cloudant connector (once per Spark service) so that it becomes available to all Spark 2.0+ kernels.

!pip install --upgrade pixiedust

import pixiedust

# One-time, per-Spark-service install of the spark-cloudant connector
pixiedust.installPackage("cloudant-labs:spark-cloudant:2.0.0-s_2.11")
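
Roughly, installPackage downloads the given Maven artifact (here the spark-cloudant connector published on spark-packages under cloudant-labs) and attaches it to the notebook's Spark service, which is why the restart in the next step is needed before the jar becomes visible to the kernels.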

Restart the kernel once.

Then change the kernel back to the Scala kernel and run your Cloudant connection code.
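
Before re-running the full read, a quick sanity check (a hypothetical addition, not part of the original answer) is to try loading the exact class the stack trace reported missing:

// Hypothetical check: returns the Class once the spark-cloudant jar is on
// the kernel's classpath, and throws ClassNotFoundException if it is not
Class.forName("com.cloudant.spark.DefaultSource")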

Thanks, Charles.