Question

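Here is a useful link: https://groups.google.com/a/lists.datastax.com/forum/#!topic/spark-connector-user/a4JND9jiNCY

How do I use the reference code from the Google Groups discussion above? I can't follow the Scala code. Could you explain it with an example?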

Answer 1

Typically, you map each row to a tuple and then call saveToCassandra on the resulting RDD. Like this:

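import com.datastax.spark.connector._

//read the source table as an RDD[CassandraRow]
val myRdd = sc.cassandraTable("mykeyspace", "mytable")
//map each row to a (field1, field3) tuple
val myTransformedRdd = myRdd.map { row =>
  (row.getString("field1"), row.getString("field3"))
}
myTransformedRdd.saveToCassandra("mykeyspace", "someothertable", SomeColumns("field1", "field3"))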

myRdd has type RDD[CassandraRow]. myTransformedRdd has type RDD[(String, String)].

Under the hood, Scala is actually using Tuple2[String, String]. Scala 2 provides tuple types only up to Tuple22; Scala 3 lifts that limit.
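To make the tuple point concrete, here is a minimal sketch in plain Scala (no Spark or Cassandra needed). The object name and the sample values are made up for illustration; it only shows that the parenthesised form and Tuple2 are the same type.

```scala
object TupleSugarDemo {
  // (String, String) is shorthand for Tuple2[String, String]
  val pair: (String, String) = ("value1", "value3")
  // assigning to the explicit type compiles because the two types are identical
  val samePair: Tuple2[String, String] = pair

  def main(args: Array[String]): Unit = {
    // tuple fields are accessed by position, starting at _1
    println(pair._1)
    println(samePair._2)
    // productArity reports how many fields the tuple holds
    println(pair.productArity)
  }
}
```

The arity is baked into the type name (Tuple2, Tuple3, and so on), which is why a fixed upper bound on tuple size exists at all.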

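If your structure has more than 22 fields, you can build an RDD of some other type. For example, you can construct an RDD[CassandraRow].

If I take the tuple-based saveToCassandra code above and change it to use CassandraRow objects instead of tuples, it might look like this: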

import com.datastax.spark.connector._

val myRdd = sc.cassandraTable("mykeyspace", "mytable")

//build a sequence of the column names which we need later to make a CassandraRow object
val allColumnNames = IndexedSeq[String](
  "field1",
  "field3"
)

//loop through the column names and create ColumnName objects from them
//we will need this later when we call SomeColumns()
val columnRefs = for(item <- allColumnNames) yield {
  new ColumnName(item)
}

val myTransformedRdd = myRdd.map { row =>

  //create an Indexed Sequence with all of the values, in the same order as allColumnNames
  //we will need this to create the CassandraRow object
  val allValues = IndexedSeq[AnyRef](row.getString("field1"), row.getString("field3"))

  //check the CassandraRow constructor signature for your connector version
  new CassandraRow(allColumnNames, allValues)
}

//the _* syntax tells Scala to take our columnRefs sequence and pass its elements into SomeColumns as individual arguments
myTransformedRdd.saveToCassandra("mykeyspace", "someothertable", SomeColumns(columnRefs:_*))

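Both the tuple version and the CassandraRow version accomplish the same thing, but the CassandraRow version lets you pass more than 22 items in your structure.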
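Since the `_*` syntax tends to be the confusing part, here is a small standalone sketch of it in plain Scala. The joinColumns method is a hypothetical stand-in for any varargs method (it is not part of the connector); it just shows how a collection is expanded into individual arguments.

```scala
object VarargsDemo {
  // hypothetical varargs method: String* accepts zero or more String arguments
  def joinColumns(columns: String*): String = columns.mkString(", ")

  def main(args: Array[String]): Unit = {
    val names = Seq("field1", "field3")
    // passing the arguments one by one
    println(joinColumns("field1", "field3"))
    // expanding an existing collection into the varargs parameter
    println(joinColumns(names: _*))
  }
}
```

Both calls receive the same two arguments. Without `: _*` the second call would not compile, because a Seq[String] is not itself a String.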

How do I use saveToCassandra when the table has more than 22 fields?
