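I am using Spark to import data from text files into CQL tables (on DataStax). I have done this successfully for one file in which all the variables are strings. I first created the table with CQL, then ran the following in the Spark shell using Scala: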
val file = sc.textFile("file:///home/pr.txt").map(line => line.split("\\|").map(_.toString));
file.map(line => (line(0), line(1))).saveToCassandra("ks", "ks_pr", Seq("proc_c", "proc_d"));
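The remaining files I want to import contain several variable types. I set up the tables with CQL and specified the appropriate types there, but how do I convert the values when importing the text files in Spark?
Answer 0 (score: 1)
For example, if proc_c is an Int and proc_d is a Double, you can do this: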
// Pick the two columns as strings, convert each to its target type, then save.
file.map(line => (line(0), line(1)))
  .map { case (l, r) => (l.toInt, r.toDouble) }
  .saveToCassandra("ks", "ks_pr", Seq("proc_c", "proc_d"))
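Note that `toInt` and `toDouble` throw a `NumberFormatException` on a malformed field, which would fail the whole job. A minimal sketch of one way to skip bad rows instead, assuming a hypothetical `Pr` case class and `parseLine` helper (neither is part of the connector), with made-up sample lines standing in for the text file:

```scala
import scala.util.Try

// Hypothetical row type; the fields mirror the proc_c and proc_d columns.
final case class Pr(procC: Int, procD: Double)

// Parse one pipe-delimited line.
// Returns None when a field is missing or is not a valid number.
def parseLine(line: String): Option[Pr] = {
  // The -1 limit keeps trailing empty fields, so "7|" yields two fields.
  val fields = line.split("\\|", -1).map(_.trim)
  if (fields.length < 2) None
  else Try(Pr(fields(0).toInt, fields(1).toDouble)).toOption
}

// A plain Scala collection standing in for sc.textFile(...); the values are invented.
val lines = Seq("1|2.5", "oops|3.0", "7|", "42|0.125")

// flatMap drops the rows that failed to parse.
val parsed = lines.flatMap(parseLine)
parsed.foreach(println)
```

In the Spark shell the same function could be applied to the raw lines with `sc.textFile("file:///home/pr.txt").flatMap(parseLine)`, then mapped to `(p.procC, p.procD)` tuples before calling `saveToCassandra` as above. Silently dropping rows is a design choice; logging or counting the rejected lines may be preferable for real data.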
Answer 1 (score: -1)
Use this to read records from a txt file and store them in a Cassandra database:
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.StringTokenizer;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
public class App {
public static void main(String[] args) throws NumberFormatException, IOException {
String serverIp = " ? ";
String keyspace = "? ";
String username=" ?";
String password=" ? ";
Cluster cluster = Cluster.builder()
.addContactPoints(serverIp)
.withCredentials(username.trim(), password.trim())
.build();
Session session = cluster.connect(keyspace);
File file = new File("E:\\new workspace\\Casandracheck3\\text1.txt");
BufferedReader br = new BufferedReader(new FileReader(file));
String st;
String mc_name=null;
String mobileno=null;
String customer_id=null;
String date_time=null;
Integer cust_id=0;
while ((st = br.readLine()) != null) {
StringTokenizer tokenizer = new StringTokenizer(st, ",");
mc_name = tokenizer.nextToken();
mobileno = tokenizer.nextToken();
customer_id=tokenizer.nextToken();
date_time=tokenizer.nextToken();
cust_id=Integer.parseInt(customer_id);
System.out.println("USERNAME=" + mc_name + "&MOBILENO=" + mobileno + "&CUSTOMER_ID=" + cust_id + "&DATE_TIME=" + date_time);
System.out.println("checking before query..............................");
// Note: the values are concatenated directly into the CQL string,
// so the statement breaks if a field contains a single quote.
String cqlStatement = "insert into table_name(id,mc_name,mc_mobileno,customer_id,mc_imported_date)"
+ " values(now(),'" + mc_name + "','" + mobileno + "'," + customer_id+ ",'"+date_time+"')";
for (Row row : session.execute(cqlStatement)) {
System.out.println(row.toString());
}
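}
// Release the file handle and the driver's connections so the JVM can exit.
br.close();
session.close();
cluster.close();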
}
}