Changing a column to a date in a sparklyr Spark DataFrame

Time: 2017-02-01 16:18:30

Tags: r spark-dataframe lubridate sparklyr

I am using sparklyr and am having trouble changing a column's class and using dplyr to aggregate the data. Here is my code so far:

.libPaths(c(.libPaths(), '/usr/lib/spark/R/lib'))
Sys.setenv(SPARK_HOME = "/usr/lib/spark")
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))

library(sparklyr)
library(dplyr)
library(magrittr)

sc <- sparkR.session(master = "xxxxx")
df <- read.df("path", "csv", header = "true", inferSchema = "true", na.strings = "NA")

df1<-select(df, df$DATE, df$Subject, df$Source, df$Cost, df$Test)

       DATE      Subject               Source Cost     Test
1 11/8/2016 07gjAAAAAAAq    AAAA_MOAAAGRAAAAA    2        2
2 11/8/2016 07gjAAAAAAAq      BBBB_MOBBB4BBB2    7        7
3 11/8/2016 07gjAAAAAAAq BBBB_MOBICCCCCCCCC14    2        2
4 11/8/2016 07gjAAAAAAAq SCCT_MOBIDDDDDDDDD14    1        1
5 11/8/2016 07gjAAAAAAAq    REET_MOBBBBBBBB01    2        1
6 11/8/2016 07gjAAAAAAAq      SCCT_MRRRF4RR22   11       11

Two questions based on this:

1) How can I change the DATE column to a date class? The way I have done this in the past is:

df1$DATE<-as.Date(df1$DATE,'%m/%d/%Y')

This is the error:

Error in as.Date.default(df1$DATE, "%m/%d/%Y") : 
  do not know how to convert 'df1$DATE' to class “Date”

Any help would be great, thanks!
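
Note on the error: the session and data frame above are created with SparkR (sparkR.session, read.df), so df1$DATE is a SparkR Column expression and not an R vector. Base as.Date has no method for that class, which is why as.Date.default fails. The conversion probably has to be expressed with Spark column functions so that it runs on the Spark side. The following is only a minimal sketch under stated assumptions: a local Spark installation reachable through SPARK_HOME, a small hypothetical stand-in for df1, and dates written as month/day/year like in the sample output. It has not been run against the original data.

```r
library(SparkR)

# Assumption: Spark is installed locally and SPARK_HOME is set.
sparkR.session(master = "local[1]")

# Hypothetical stand-in for df1: DATE is a string column, as in the question.
local_df <- data.frame(
  DATE = c("11/8/2016", "11/9/2016"),
  Cost = c(2, 7),
  stringsAsFactors = FALSE
)
df1 <- createDataFrame(local_df)

# Parse on the Spark side: string -> epoch seconds -> timestamp -> date.
# "M/d/yyyy" is a Java-style pattern, not the strptime-style "%m/%d/%Y".
parsed <- cast(unix_timestamp(df1$DATE, "M/d/yyyy"), "timestamp")

# withColumn replaces the existing DATE column because the name matches.
df1 <- withColumn(df1, "DATE", to_date(parsed))

printSchema(df1)  # inspect the resulting type of DATE
head(df1)

sparkR.session.stop()
```

Strings that do not match the pattern are expected to come back as NA (Spark null) and not raise an error, so it may be worth counting missing values in DATE after the conversion.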

0 answers:

No answers