Flatten a DataFrame in Scala that contains different DataTypes

Time: 2017-07-26 12:54:49

Tags: scala apache-spark spark-dataframe flatten

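As you may know, a DataFrame can contain fields of complex types, such as structs (StructType) or arrays (ArrayType). As in my case, you may need to map all the DataFrame data to a Hive table with simple-type fields (String, Integer, ...). I struggled with this problem for a long time, and I finally found a solution that I want to share. I am also sure it can be improved, so feel free to reply with your own suggestions.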

It is based on this thread, but it also works for ArrayType elements, not only StructType ones. It is a tail-recursive function that receives a DataFrame and returns it flattened.
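
A minimal sketch of one way such a tail-recursive flattener could look; it is not the author's original code. The names `FlattenSketch` and `flatten` and the `_` separator used for the new column names are assumptions made for illustration. Each pass rewrites the first complex column it finds (a struct is expanded into its children, an array is exploded into rows) and then recurses until only simple columns remain.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, explode}
import org.apache.spark.sql.types.{ArrayType, StructType}
import scala.annotation.tailrec

object FlattenSketch {  // hypothetical name, for illustration only

  // Backtick-quote a name so that dots inside it are not read as struct access.
  private def quoted(name: String): String = s"`$name`"

  @tailrec
  def flatten(df: DataFrame): DataFrame = {
    // Pick the first column that is still a struct or an array, if any.
    val firstComplex = df.schema.fields.find { f =>
      f.dataType.isInstanceOf[StructType] || f.dataType.isInstanceOf[ArrayType]
    }
    firstComplex match {
      case None => df  // only simple (or otherwise unsupported) types remain
      case Some(field) =>
        val next = field.dataType match {
          case st: StructType =>
            // Replace the struct by its children, named parent_child,
            // keeping every other column in its original position.
            val cols = df.schema.fields.flatMap { f =>
              if (f.name == field.name)
                st.fieldNames.toSeq.map { n =>
                  col(quoted(field.name) + "." + quoted(n)).as(s"${field.name}_$n")
                }
              else Seq(col(quoted(f.name)))
            }
            df.select(cols: _*)
          case _ =>
            // ArrayType: one output row per element. Note that explode drops
            // rows whose array is null or empty.
            df.withColumn(field.name, explode(col(quoted(field.name))))
        }
        flatten(next)  // tail call
    }
  }
}
```

Under these assumptions, `FlattenSketch.flatten(df)` on a DataFrame with a nested struct and an array returns only simple-typed columns. Two caveats: several array columns are exploded one after another, so the row count multiplies, and generated names such as `a_b` can collide with existing columns of the same name.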

1 Answer:

Answer 0: (score: 0)

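import spark.implicits._  // needed for toDF(); assumes a SparkSession named spark, as in spark-shell
val df = Seq(("1", (2, (3, 4)), Seq(1, 2))).toDF()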

df.printSchema

root
 |-- _1: string (nullable = true)
 |-- _2: struct (nullable = true)
 |    |-- _1: integer (nullable = false)
 |    |-- _2: struct (nullable = true)
 |    |    |-- _1: integer (nullable = false)
 |    |    |-- _2: integer (nullable = false)
 |-- _3: array (nullable = true)
 |    |-- element: integer (containsNull = false)


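import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, explode}
import org.apache.spark.sql.types.{ArrayType, StructType}

// Walks the schema recursively and returns one Column per leaf field.
def flattenSchema(schema: StructType, fieldName: String = null): Array[Column] = {
  schema.fields.flatMap(f => {
    // Full dotted path of the current field, e.g. "_2._1"
    val cols = if (fieldName == null) f.name else (fieldName + "." + f.name)
    f.dataType match {
      case structType: StructType => flattenSchema(structType, cols) // recurse into nested structs
      case arrayType: ArrayType => Array(explode(col(cols)))         // one row per array element
      case _ => Array(col(cols))
    }
  })
}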

df.select(flattenSchema(df.schema): _*).printSchema

root
 |-- _1: string (nullable = true)
 |-- _1: integer (nullable = true)
 |-- _1: integer (nullable = true)
 |-- _2: integer (nullable = true)
 |-- col: integer (nullable = false)
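
The schema above contains three columns named `_1`, which makes later references by name ambiguous. As a hedged variation on the answer's function, the sketch below gives each leaf column an alias built from its full path. The names `AliasedFlatten` and `flattenSchemaAliased` and the `_` separator are assumptions for illustration. Like the original, it puts every `explode` into a single `select`, and Spark allows only one generator per select clause, so it still handles at most one array column.

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, explode}
import org.apache.spark.sql.types.{ArrayType, StructType}

object AliasedFlatten {  // hypothetical name, for illustration only

  // Same traversal as flattenSchema, but each leaf is aliased with its
  // full path, using "_" in place of "." (e.g. "_2._1" becomes "_2__1").
  def flattenSchemaAliased(schema: StructType, prefix: Option[String] = None): Array[Column] =
    schema.fields.flatMap { f =>
      val path = prefix.map(_ + "." + f.name).getOrElse(f.name)
      val alias = path.replace(".", "_")
      f.dataType match {
        case st: StructType => flattenSchemaAliased(st, Some(path))
        case _: ArrayType   => Array(explode(col(path)).as(alias))
        case _              => Array(col(path).as(alias))
      }
    }
}
```

For the example DataFrame, `df.select(AliasedFlatten.flattenSchemaAliased(df.schema): _*)` should yield the columns `_1`, `_2__1`, `_2__2__1`, `_2__2__2` and `_3`.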