Change column values in a Spark Scala DataFrame

Time: 2017-05-17 07:34:28

Tags: scala apache-spark-sql spark-dataframe

This is what my DataFrame currently looks like:

Date

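I am trying to reformat this string value as yyyy-mm-dd hh:mm:ss.fff and keep it as a string rather than a date or timestamp type.

How can I do this with the withColumn method?

3 answers:

Answer 0: (score: 1)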

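Here is a solution using a UDF with withColumn. I assume you have a string date field in your DataFrame.

import java.text.SimpleDateFormat
import org.apache.spark.sql.functions.udf
import spark.implicits._ // needed for toDF and the $"..." column syntax

// Create the dfList DataFrame
val dfList = spark.sparkContext
  .parallelize(Seq("19931001", "19930404", "19930603", "19930805")).toDF("DATE")

// Define the UDF before it is used in withColumn
val dateToTimeStamp = udf((date: String) => {
  val stringDate = date.substring(0, 4) + "/" + date.substring(4, 6) + "/" + date.substring(6, 8)
  // SSS appends the milliseconds that the question asks for
  val format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS")
  format.format(new SimpleDateFormat("yyyy/MM/dd").parse(stringDate))
})

// show(false) prints the full string instead of truncating it to 20 characters
dfList.withColumn("DATE", dateToTimeStamp($"DATE")).show(false)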
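
The conversion that the UDF performs can also be tried outside Spark. The following is only a sketch, assuming the input always uses the yyyyMMdd layout of the sample rows; the names DateReformat and reformat are hypothetical and do not come from the answer. It uses java.time instead of SimpleDateFormat and returns None for malformed input rather than throwing.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

// Hypothetical helper, not part of the original answer.
object DateReformat {
  // Input layout assumed from the sample rows, e.g. "19931001".
  private val inputFormat = DateTimeFormatter.ofPattern("yyyyMMdd")
  // Target layout from the question; SSS is the java.time letter for milliseconds.
  private val outputFormat = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS")

  // Returns None instead of throwing when the input is null or not a valid yyyyMMdd date.
  def reformat(raw: String): Option[String] =
    Try(LocalDate.parse(raw, inputFormat).atStartOfDay().format(outputFormat)).toOption
}
```

With this helper, DateReformat.reformat("19931001") yields Some("1993-10-01 00:00:00.000"), and an unparsable string yields None. Wrapping it in udf(...) should give a nullable string column, but that part is an assumption worth checking against your Spark version.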

Answer 1: (score: 0)

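import org.apache.spark.sql.functions.{from_unixtime, unix_timestamp}

df.withColumn("date", from_unixtime(unix_timestamp($"date", "yyyyMMdd"), "yyyy-MM-dd HH:mm:ss.SSS") as "date")

This should work. Also note the pattern letters: lowercase mm gives minutes and uppercase MM gives months, HH is the 24-hour clock, and milliseconds are written SSS (fff is not a valid pattern letter). Because unix_timestamp works in whole seconds, the millisecond part will always be 000. I hope this helps.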
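
The difference between mm and MM is easy to see with plain SimpleDateFormat, whose pattern letters Spark 2.x follows. This is a small sketch rather than part of the answer; the object PatternLetters and its roundTrip method are hypothetical names, and the time zone is pinned to UTC only so the result does not depend on the machine running it.

```scala
import java.text.SimpleDateFormat
import java.util.TimeZone

// Hypothetical demo object, not part of the original answer.
object PatternLetters {
  private val utc = TimeZone.getTimeZone("UTC")

  // Parses `raw` with `inputPattern`, then prints it back in an unambiguous layout.
  def roundTrip(raw: String, inputPattern: String): String = {
    val in = new SimpleDateFormat(inputPattern)
    val out = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
    in.setTimeZone(utc)
    out.setTimeZone(utc)
    out.format(in.parse(raw))
  }
}
```

PatternLetters.roundTrip("19931001", "yyyyMMdd") gives "1993-10-01 00:00:00", while the same input with "yyyymmdd" gives "1993-01-01 00:10:00": the 10 is read as minutes and the month silently falls back to January.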

Answer 2: (score: -1)

First, I created this DF:

val df = sc.parallelize(Seq("19931001","19930404","19930603","19930805")).toDF("DATE")

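For date handling, we will use the Joda-Time library (don't forget to add the joda-time.jar file to the classpath).

import org.joda.time.format.DateTimeFormat

def func(s: String): String = {
  // MM is month of year; lowercase mm would be parsed as minutes
  val dateFormat = DateTimeFormat.forPattern("yyyyMMdd")
  val resultDate = dateFormat.parseDateTime(s)
  // Format explicitly; the default toString() returns ISO-8601 with a zone offset
  resultDate.toString("yyyy-MM-dd HH:mm:ss.SSS")
}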

Finally, apply the function to the DataFrame:

val temp = df.map(l => func(l.get(0).toString()))
val df2 = temp.toDF("DATE")
df2.show()

This answer still needs some work, and I am new to this myself, but I think it gets the job done!