Importing text files into Cassandra with Spark when the columns have multiple data types

Asked: 2014-10-21 16:35:22

Tags: scala cassandra apache-spark cql datastax-enterprise

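I am using Spark to import data from text files into CQL tables (on DataStax). I have done this successfully for one file in which every variable is a string. I first created the table with CQL, then ran the following in the Spark shell using Scala: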

val file = sc.textFile("file:///home/pr.txt").map(line => line.split("\\|").map(_.toString));
file.map(line => (line(0), line(1))).saveToCassandra("ks", "ks_pr", Seq("proc_c", "proc_d"));

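The remaining files I need to import contain several variable types. I set up the tables with CQL and specified the appropriate types there, but how do I convert the fields to those types when importing the text file in Spark?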

2 answers:

Answer 0 (score: 1)

For example, if proc_c is an Int and proc_d is a Double, you can do this:

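// Convert each field to the type of its CQL column before saving
// (here proc_c is assumed to be an int column and proc_d a double column).
file.map(line => (line(0).toInt, line(1).toDouble))
    .saveToCassandra("ks", "ks_pr", Seq("proc_c", "proc_d"))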
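
A plain toInt or toDouble throws on the first malformed field, which would fail the whole job. The sketch below shows one possible way to skip such rows instead. It assumes the same two-column layout as above; the names TypedRows and parseRow are hypothetical and are not part of Spark or the Cassandra connector. The demonstration runs on an in-memory Seq, so it needs no cluster.

```scala
import scala.util.Try

object TypedRows {
  // Hypothetical helper: converts one already-split line into a typed
  // (Int, Double) pair. Returns None if a field is missing or unparsable.
  def parseRow(fields: Array[String]): Option[(Int, Double)] =
    Try((fields(0).trim.toInt, fields(1).trim.toDouble)).toOption

  // Sample lines standing in for the contents of the text file.
  val sample: Seq[String] = Seq("1|2.5", "x|3.0", "7")

  // flatMap keeps the rows that parsed and drops the ones that did not.
  val parsed: Seq[(Int, Double)] =
    sample.map(_.split("\\|")).flatMap(f => parseRow(f))
}
```

In the Spark shell the same helper should be usable as file.flatMap(f => TypedRows.parseRow(f)) before the saveToCassandra call, since the RDD in the question already holds split lines. Whether silently dropping bad rows is acceptable depends on the data; logging or counting them may be preferable.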

Answer 1 (score: -1)

Use this to read records from a txt file and store them in a Cassandra database:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.StringTokenizer;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class App {

  public static void main(String[] args) throws NumberFormatException, IOException {
    // Connection settings: fill in the " ? " placeholders for your cluster.
    String serverIp = " ? ";
    String keyspace = "? ";
    String username=" ?";
    String password=" ? ";

    File file = new File("E:\\new workspace\\Casandracheck3\\text1.txt");

    // try-with-resources closes the reader, session, and cluster on exit, even on error.
    // This assumes a driver version in which Cluster and Session are Closeable (2.0 or later).
    try (Cluster cluster = Cluster.builder()
                                  .addContactPoints(serverIp)
                                  .withCredentials(username.trim(), password.trim())
                                  .build();
         Session session = cluster.connect(keyspace);
         BufferedReader br = new BufferedReader(new FileReader(file))) {

      String st;
      while ((st = br.readLine()) != null) {

        // Each line is expected to hold four comma-separated fields.
        StringTokenizer tokenizer = new StringTokenizer(st, ",");

        String mc_name = tokenizer.nextToken();
        String mobileno = tokenizer.nextToken();
        String customer_id = tokenizer.nextToken();
        String date_time = tokenizer.nextToken();
        // Convert the numeric field; throws NumberFormatException if it is not an integer.
        Integer cust_id = Integer.parseInt(customer_id.trim());

        System.out.println("USERNAME=" + mc_name + "&MOBILENO=" + mobileno + "&CUSTOMER_ID=" + cust_id + "&DATE_TIME=" + date_time);
        System.out.println("checking before query..............................");

        // String values are quoted; the integer is inserted unquoted.
        // The values are concatenated directly, so this is only suitable for trusted input.
        String cqlStatement = "insert into table_name(id,mc_name,mc_mobileno,customer_id,mc_imported_date) "
              + "values(now(),'" + mc_name + "','" + mobileno + "'," + cust_id + ",'" + date_time + "')";

        // A plain INSERT normally returns no rows, so this loop usually prints nothing.
        for (Row row : session.execute(cqlStatement)) {
          System.out.println(row.toString());
        }
      }
    }
  }
}