如何使用字符串数组在spark数据帧中将列名设置为toDF()函数?

时间:2016-06-23 13:01:06

标签: scala apache-spark

例如,

val columns=Array("column1", "column2", "column3")
val df=sc.parallelize(Seq(
(1,"example1", Seq(0,2,5)),
(2,"example2", Seq(1,20,5)))).toDF(columns)

如何使用字符串数组设置列名? 是否可以在toDF()中提及数据类型?

3 个答案:

答案 0 :(得分:8)

toDF()接受类型为String重复参数,因此您可以使用_*类型注释来传递序列:

val df=sc.parallelize(Seq(
  (1,"example1", Seq(0,2,5)),
  (2,"example2", Seq(1,20,5)))).toDF(columns: _*)

有关重复参数的更多信息,请参阅Scala Language Specification中的第4.6.2节。

答案 1 :(得分:4)

<?php

$filenameArray = array();
$summit = array(); 
$royce = array(); 
$soundview = array(); 
$merrit = array(); 
$hillcroft = array(); 


$all = array(); 


$handle = opendir(dirname(realpath(__FILE__)).'/assets/images/summit/slider/');
    while($file = readdir($handle)){
        if($file !== '.' && $file !== '..'){
        array_push($filenameArray, "assets/images/summit/slider/$file");
        array_push($summit, "assets/images/summit/slider/$file");

        }
    }
array_push($all, $summit);


$handle = opendir(dirname(realpath(__FILE__)).'/assets/images/royce/slider/');
    while($file = readdir($handle)){
        if($file !== '.' && $file !== '..'){
        array_push($filenameArray, "assets/images/royce/slider/$file");
        array_push($royce, "assets/images/royce/slider/$file");

        }
    }
array_push($all, $royce);

$handle = opendir(dirname(realpath(__FILE__)).'/assets/images/soundview/slider/');
    while($file = readdir($handle)){
        if($file !== '.' && $file !== '..'){
        array_push($filenameArray, "assets/images/soundview/slider/$file");
            array_push($soundview, "assets/images/soundview/slider/$file");

        }
    }
array_push($all, $soundview);

$handle = opendir(dirname(realpath(__FILE__)).'/assets/images/merrit-station/slider/');
    while($file = readdir($handle)){
        if($file !== '.' && $file !== '..'){
            array_push($filenameArray, "assets/images/merrit-station/slider/$file");
            array_push($merrit, "assets/images/merrit-station/slider/$file");

    }
    }
array_push($all, $merrit);

$handle = opendir(dirname(realpath(__FILE__)).'/assets/images/hillcroft-danbury/slider/');
    while($file = readdir($handle)){
        if($file !== '.' && $file !== '..'){
        array_push($filenameArray, "assets/images/hillcroft-danbury/slider/$file");
                array_push($hillcroft, "assets/images/hillcroft-danbury/slider/$file");

        }
    }
array_push($all, $hillcroft);

echo json_encode($all);


?>       

toDF()采用逗号分隔的字符串

答案 2 :(得分:4)

toDF()Spark documentation中定义为:

def toDF(colNames: String*): DataFrame

因此您需要将数组转换为 varargs ,如同here所述。这意味着您需要执行以下操作:

val columns=Array("column1", "column2", "column3")
val df=sc.parallelize(Seq(
(1,"example1", Seq(0,2,5)),
(2,"example2", Seq(1,20,5)))).toDF(columns: _*)

(在columns中添加:_ *到toDF