Hi, I am fairly new to Spark and Scala, and I am facing a problem saving data into Cassandra. Here is my scenario:

1) I pass a list of user-defined objects (say, User objects containing firstName, lastName, etc.) from my Java class to a Scala class. Up to this point everything is fine: I can access the User objects and print their contents.

2) Now I want to save that usersList into a Cassandra table using the Spark context. I have gone through many examples, but every one of them creates a Seq of hard-coded values with a case class and then saves that to Cassandra. I have tried this and it works fine, as below:
import scala.collection.JavaConversions._
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import com.datastax.spark.connector._
import java.util.ArrayList

object SparkCassandra extends App {
  val conf = new SparkConf()
    .setMaster("local[*]")
    .setAppName("SparkCassandra")
    // set the Cassandra host address to your local address
    .set("spark.cassandra.connection.host", "127.0.0.1")
  val sc = new SparkContext(conf)

  val usersList = Test.getUsers
  usersList.foreach(x => print(x.getFirstName))

  val collection = sc.parallelize(Seq(userTable("testName1"), userTable("testName1")))
  collection.saveToCassandra("demo", "user", SomeColumns("name"))
  sc.stop()
}

case class userTable(name: String)
But my requirement is to use the dynamic values from usersList instead of hard-coded values. Is there any way to achieve this?
Answer 0 (score: 0)
If you create an RDD of CassandraRow objects, you can save the result directly without specifying columns or a case class. Also, CassandraRow has a very convenient fromMap function, so you can define your rows as Map objects, convert them, and save the result.

Example:
import com.datastax.spark.connector._ // saveToCassandra and CassandraRow

val myData = sc.parallelize(
  Seq(
    Map("name" -> "spiffman", "address" -> "127.0.0.1"),
    Map("name" -> "Shabarinath", "address" -> "127.0.0.1")
  )
)

val cassandraRowData = myData.map(rowMap => CassandraRow.fromMap(rowMap))
cassandraRowData.saveToCassandra("keyspace", "table")
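The two Maps above are still hard-coded, but the same fromMap approach works with rows built dynamically from a list of objects. A minimal sketch of just the map-building step (plain Scala, no Spark or Cassandra required; the User case class and its fields here are hypothetical stand-ins for the question's user objects):

```scala
// Hypothetical stand-in for the user objects; in the real job these
// would come from the Java usersList.
case class User(name: String, address: String)

val users = List(
  User("spiffman", "127.0.0.1"),
  User("Shabarinath", "127.0.0.1")
)

// Build one Map per user instead of writing each Map by hand.
// These maps are what CassandraRow.fromMap would consume.
val rowMaps = users.map(u => Map("name" -> u.name, "address" -> u.address))

rowMaps.foreach(println)
```

You would then wrap rowMaps in sc.parallelize and apply CassandraRow.fromMap exactly as in the example above.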
Answer 1 (score: 0)
Finally, I found a solution for my test requirement and it works fine, as below:

My Scala code:
import scala.collection.JavaConversions.asScalaBuffer
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import com.datastax.spark.connector.SomeColumns
import com.datastax.spark.connector.toNamedColumnRef
import com.datastax.spark.connector.toRDDFunctions

object JavaListInsert {
  def randomStores(sc: SparkContext, users: List[User]): RDD[(String, String, String)] = {
    sc.parallelize(users).map { x =>
      val firstName = x.getFirstName
      val lastName = x.getLastName
      val city = x.getCity
      (firstName, lastName, city)
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("cassandraInsert")
    val sc = new SparkContext(conf)
    val usersList = Test.getUsers.toList
    randomStores(sc, usersList).
      saveToCassandra("test", "stores", SomeColumns("first_name", "last_name", "city"))
    sc.stop()
  }
}
The Java POJO class:
import java.io.Serializable;

public class User implements Serializable {
    private static final long serialVersionUID = -187292417543564400L;
    private String firstName;
    private String lastName;
    private String city;

    public String getFirstName() {
        return firstName;
    }

    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }

    public String getLastName() {
        return lastName;
    }

    public void setLastName(String lastName) {
        this.lastName = lastName;
    }

    public String getCity() {
        return city;
    }

    public void setCity(String city) {
        this.city = city;
    }
}
The Java class that returns the list of users:
import java.util.ArrayList;
import java.util.List;

public class Test {
    public static List<User> getUsers() {
        ArrayList<User> usersList = new ArrayList<User>();
        for (int i = 1; i <= 100; i++) {
            User user = new User();
            user.setFirstName("firstName_" + i);
            user.setLastName("lastName_" + i);
            user.setCity("city_" + i);
            usersList.add(user);
        }
        return usersList;
    }
}
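The essential step in this answer is converting the Java list into Scala tuples before handing it to sc.parallelize. That conversion can be checked without a Spark cluster at all; below is a minimal sketch, using a Scala case class as a hypothetical stand-in for the Java User POJO (field values are illustrative):

```scala
import scala.collection.JavaConverters._

// Hypothetical stand-in for the Java User POJO, so this sketch is
// self-contained; it exposes the same getters randomStores calls.
case class User(firstName: String, lastName: String, city: String) {
  def getFirstName: String = firstName
  def getLastName: String = lastName
  def getCity: String = city
}

// Mirror Test.getUsers: a java.util.List built on the Java side.
val javaList = new java.util.ArrayList[User]()
for (i <- 1 to 3) javaList.add(User(s"firstName_$i", s"lastName_$i", s"city_$i"))

// Convert to a Scala List, then apply the same tuple mapping that
// randomStores applies inside sc.parallelize(...).map.
val tuples = javaList.asScala.toList.map(x => (x.getFirstName, x.getLastName, x.getCity))

tuples.foreach(println)
```

In the real job, this tuples list is exactly what ends up in the RDD[(String, String, String)] that saveToCassandra writes out.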