Converting a string to a date or int in a DataFrame (Spark 1.6, Java)

Time: 2018-07-10 11:45:51

Tags: java apache-spark dataframe casting group-by

How can I convert a string to a date or an int in a DataFrame in Spark 1.6? I want to group the data by periods of 1, 10, or 100 years. The DataFrame has a string column "dt", and I am coding in Java.

How can I do this?

1 answer:

Answer 0 (score: 0)

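import org.apache.spark.sql.DataFrame;
import static org.apache.spark.sql.functions.*; // col, year, month, dayofmonth

// Read the CSV file with a header row and let spark-csv infer column types.
DataFrame dataFrame = sqlContext.read()
        .format("com.databricks.spark.csv")
        .option("inferSchema", "true")
        .option("header", "true")
        .load(filePath.getAbsolutePath());

// Cast the string column "dt" to a date; values that do not parse as a date become null.
DataFrame withDate = dataFrame.withColumn("date", col("dt").cast("date"));

// Derive integer year, month, and day columns from the date.
withDate
        .withColumn("year", year(col("date")))
        .withColumn("month", month(col("date")))
        .withColumn("day", dayofmonth(col("date")))
        .show();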

// Shorter variant: add only the date and year columns.
DataFrame inputFrame = sqlContext.read()
        .format("com.databricks.spark.csv")
        .option("inferSchema", "true")
        .option("header", "true")
        .load(filePath.getAbsolutePath());

DataFrame withYear = inputFrame
        .withColumn("date", col("dt").cast("date"))
        .withColumn("year", year(col("date")));
withYear.show();

Perfect :)
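
The question also asks about grouping by 1, 10, or 100 years, which the answer stops short of. Once an integer year column exists, one possible approach is to round each year down to the start of its bucket and group on that value. The sketch below shows only the arithmetic in plain Java; the class name YearBuckets and the method bucketStart are hypothetical names chosen for illustration and are not part of Spark.

```java
// Minimal sketch of the bucketing arithmetic; YearBuckets and bucketStart
// are hypothetical names, not part of any Spark API.
final class YearBuckets {

    private YearBuckets() {
    }

    // Rounds a year down to the first year of its bucket.
    // width = 1 keeps the year, 10 gives the decade, 100 gives the hundred-year block.
    static int bucketStart(int year, int width) {
        if (width <= 0) {
            throw new IllegalArgumentException("width must be positive");
        }
        // floorDiv rounds toward negative infinity, so negative years also round down.
        return Math.floorDiv(year, width) * width;
    }

    public static void main(String[] args) {
        int year = 1987;
        System.out.println(bucketStart(year, 1));   // 1987
        System.out.println(bucketStart(year, 10));  // 1980
        System.out.println(bucketStart(year, 100)); // 1900
    }
}
```

In the DataFrame itself, the same idea could probably be written as a column expression such as floor(col("year").divide(10)).multiply(10), followed by groupBy on the new column. Treat that as an assumption to verify against the Spark 1.6 API rather than as tested code.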