Convert StringType to ArrayType

Posted: 2016-10-12 08:32:40

Tags: scala apache-spark

Is it possible to cast a StringType column to an ArrayType column in a Spark DataFrame?

Given this:

    Schema ->
      a: string (nullable = true)

Now I want to convert it to:

    a: array (nullable = true)

2 Answers:

Answer 0 (score: 3)

As elisiah mentioned, you have to split the string. You can use a UDF:

    df.printSchema

    import org.apache.spark.sql.functions._

    // UDF that splits each string on single spaces into an Array[String]
    val toArray = udf[Array[String], String](_.split(" "))
    val featureDf = df
      .withColumn("a", toArray(df("a")))

    featureDf.printSchema

This gives the output:

    root
     |-- a: string (nullable = true)

    root
     |-- a: array (nullable = true)
     |    |-- element: string (containsNull = true)
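The UDF in this answer passes each value straight to `split`, and Spark hands a null `String` input to a Scala UDF unchanged, so a null in column `a` would make the function throw a `NullPointerException`. A minimal null-safe variant of the function body, written in plain Scala so it can be checked without a cluster; the names `SplitWords` and `splitWords` are hypothetical.

```scala
object SplitWords {
  // Hypothetical helper: returns null for a null input (so the array column
  // simply holds null) instead of throwing; otherwise splits on single spaces.
  def splitWords(s: String): Array[String] =
    if (s == null) null else s.split(" ")

  def main(args: Array[String]): Unit = {
    println(splitWords("foo bar baz").mkString("[", ", ", "]")) // prints [foo, bar, baz]
    println(splitWords(null) == null)                          // prints true
  }
}
```

Assuming Spark is on the classpath, it could stand in for the lambda in the answer: `val toArray = udf[Array[String], String](SplitWords.splitWords)`.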

Answer 1 (score: 0)

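Another option is to simply wrap the column in `functions.array`. Note that this does not split the string: each value becomes a one-element array containing the original string.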

    import org.apache.spark.sql.functions
    import org.apache.spark.sql.functions.col

    df.withColumn("a", functions.array(col("a")))
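To make the difference between the two answers concrete, here is a sketch that applies both `functions.array` and Spark's built-in `functions.split` (which, unlike the UDF in answer 0, needs no custom code) to the same column. It assumes Spark 2.x or later with `spark-sql` on the classpath and a local master; the object name `ArrayVsSplit`, the method `compare`, and the sample data are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, col, split}

object ArrayVsSplit {
  // Applies both approaches to the same sample column and collects the results.
  def compare(spark: SparkSession): (List[List[String]], List[List[String]]) = {
    import spark.implicits._

    // Hypothetical sample data: one string column named "a".
    val df = Seq("foo bar", "baz").toDF("a")

    // array(col("a")) wraps each string, unchanged, in a one-element array.
    val wrapped = df.withColumn("a", array(col("a")))
    // split(col("a"), " ") breaks each string on the regex " " into its parts.
    val splitDf = df.withColumn("a", split(col("a"), " "))

    wrapped.printSchema()
    splitDf.printSchema()

    // Bring the array values back to the driver as Scala lists.
    val wrappedRows = wrapped.collect().map(_.getSeq[String](0).toList).toList
    val splitRows = splitDf.collect().map(_.getSeq[String](0).toList).toList
    (wrappedRows, splitRows)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("ArrayVsSplit")
      .getOrCreate()
    val (wrapped, splitUp) = compare(spark)
    println(wrapped) // List(List(foo bar), List(baz))
    println(splitUp) // List(List(foo, bar), List(baz))
    spark.stop()
  }
}
```

`array` keeps each string intact as a single element, while `split` treats its second argument as a Java regular expression, so a pattern such as `"\\s+"` would also collapse runs of whitespace.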