How to use the substr() function in SparkR
+----------+----------------+-----------+
| cust_id| tran_datetime |Total_trans|
+----------+----------------+-----------+
|CQ98901297|2015-06-06 09:00| 1|
|CQ98901297|2015-05-01 09:25| 1|
|CQ98901297|2015-05-02 10:45| 1|
|CQ98901297|2015-05-03 11:01| 1|
I need to extract just the date part of the tran_datetime column.
Answer 0 (score: 0)
# Use substr(column, start, stop) inside select(); start is 1-based and stop is inclusive, so 1..10 keeps "YYYY-MM-DD"
df_new <- select(df, df$cust_id, substr(df$tran_datetime, 1, 10), df$Total_trans)
# The substr() column gets an auto-generated name, so use rename() to give it the desired name
# (alternatively, wrap the expression as alias(substr(df$tran_datetime, 1, 10), "date") inside select())
df_new <- rename(df_new, date = df_new[[2]])
showDF(df_new)
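SparkR's Column substr() uses the same positions as base R's substr() (1-based start, inclusive stop), so you can check which characters 1..10 keep on a small local data.frame before running the job on Spark. A minimal base-R sketch, assuming the sample rows from the question; `df_local` is a hypothetical name and this is not a SparkDataFrame:

```r
# Sample rows from the question as a local (non-Spark) data.frame
df_local <- data.frame(
  cust_id       = rep("CQ98901297", 4),
  tran_datetime = c("2015-06-06 09:00", "2015-05-01 09:25",
                    "2015-05-02 10:45", "2015-05-03 11:01"),
  Total_trans   = rep(1L, 4),
  stringsAsFactors = FALSE
)

# Characters 1..10 of "YYYY-MM-DD HH:MM" are the date part
df_local$date <- substr(df_local$tran_datetime, 1, 10)
df_local[, c("cust_id", "date", "Total_trans")]
```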
+----------+----------+-----------+
| cust_id| date |Total_trans|
+----------+----------+-----------+
|CQ98901297|2015-06-06| 1|
|CQ98901297|2015-05-01| 1|
|CQ98901297|2015-05-02| 1|
|CQ98901297|2015-05-03| 1|
Answer 1 (score: -1)
I think the best solution is to apply strsplit() to the lines of the printed table.
library(stringr)
x <- data.frame(lin = c('+----------+----------------+-----------+',
                        '| cust_id| tran_datetime |Total_trans|',
                        '+----------+----------------+-----------+',
                        '|CQ98901297|2015-06-06 09:00| 1|',
                        '|CQ98901297|2015-05-01 09:25| 1|',
                        '|CQ98901297|2015-05-02 10:45| 1|'),
                id = 1:6,
                stringsAsFactors = FALSE)
# remove the border lines, which start with "+"
x <- x[substr(x$lin, 1, 1) != "+", ]
# split each line into pipe-separated fields
y <- strsplit(x$lin, split = "\\|")
# trim the whitespace around each field
y <- lapply(y, function(fields) str_trim(fields, side = "both"))
# bind into a matrix; [, -1] drops the first field, which is empty because each line starts with "|"
y <- do.call(rbind, y)[, -1]
# the first row holds the column names
colnames(y) <- y[1, ]
y <- data.frame(y[-1, ], stringsAsFactors = FALSE)
y
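The parsing above stops at a data.frame of strings; the date itself still has to be cut out of tran_datetime. A minimal base-R sketch of the same strsplit approach that finishes that step, assuming base R's trimws() and startsWith() (R 3.3.0 or later) in place of stringr::str_trim; `lines`, `m` and `parsed` are hypothetical names:

```r
lines <- c("+----------+----------------+-----------+",
           "| cust_id| tran_datetime |Total_trans|",
           "+----------+----------------+-----------+",
           "|CQ98901297|2015-06-06 09:00| 1|",
           "|CQ98901297|2015-05-01 09:25| 1|",
           "|CQ98901297|2015-05-02 10:45| 1|")

# Drop the +---+ border lines
lines <- lines[!startsWith(lines, "+")]

# Split on a literal pipe and trim spaces; the leading "|" yields an empty
# first field, while strsplit() drops the empty field after the trailing "|"
fields <- lapply(strsplit(lines, "|", fixed = TRUE), trimws)
m <- do.call(rbind, fields)[, -1]

# First row is the header, the rest are data rows
parsed <- data.frame(m[-1, , drop = FALSE], stringsAsFactors = FALSE)
colnames(parsed) <- m[1, ]

# Keep only the date part (characters 1..10) of tran_datetime
parsed$date <- substr(parsed$tran_datetime, 1, 10)
parsed
```

Note that this works on a local vector of text lines; on a real SparkDataFrame the substr() call inside select() from the previous answer is the direct route.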