I have data that looks like this:
data <- c("24-March-2017 text1 874874455221112 Text text text10",
"25-March-2017 text2 54654656TEXT Text text 11",
"24-March-2017 text3 874874455221112 Text text text 12",
"25-March-2017 text4 54654656TEXT Text text 13",
"26-March-2017 text3 54654TEXT Text text text 14",
"27-March-2017 text5 6546TEXT Text text text 15",
"28-March-2017 text6 546476876586TExt Text text text 16",
"29-March-2017 text7 23453453TEXT Text text 17")
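I want to convert this data into a structured format based on the spaces between the columns. The first three rows look exactly the way I want the data to look. The final result needs to look like:
Basically: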
Answer 0: (score: 3)
do.call('rbind', lapply( df, function( x ) {                  # loop over the character vector df
  x <- strsplit( x, " " )[[1]]                                # split the string on single spaces
  x <- x[ nchar( x ) > 0 ]                                    # drop empty strings left by repeated spaces
  x <- c( x[1:3], paste( x[4:length(x)], collapse = " " ) )   # keep fields 1-3, collapse field 4 onward
  return( x )                                                 # return the 4-element vector
}))
# [,1] [,2] [,3] [,4]
# [1,] "24-March-2017" "text1" "874874455221112" "Text text text10"
# [2,] "25-March-2017" "text2" "54654656TEXT" "Text text 11"
# [3,] "24-March-2017" "text3" "874874455221112" "Text text text 12"
# [4,] "25-March-2017" "text4" "54654656TEXT" "Text text 13"
# [5,] "26-March-2017" "text3" "54654TEXT" "Text text text 14"
# [6,] "27-March-2017" "text5" "6546TEXT" "Text text text 15"
# [7,] "28-March-2017" "text6" "546476876586TExt" "Text text text 16"
# [8,] "29-March-2017" "text7" "23453453TEXT" "Text text 17"
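If a data.frame is more convenient than a character matrix, the same split-and-collapse idea can be wrapped as in the minimal sketch below. The two sample rows are copied from the question; the column names (`date`, `label`, `id`, `description`) are hypothetical, since the question never names the fields.

```r
# Two sample rows copied from the question's data
rows <- c("24-March-2017 text1 874874455221112 Text text text10",
          "25-March-2017 text2 54654656TEXT Text text 11")

# Split on runs of spaces, keep fields 1-3, collapse everything after them
m <- do.call(rbind, lapply(strsplit(rows, " +"), function(x) {
  c(x[1:3], paste(x[-(1:3)], collapse = " "))
}))

# Column names are an assumption made for illustration only
out <- as.data.frame(m, stringsAsFactors = FALSE)
names(out) <- c("date", "label", "id", "description")
str(out)
```

Splitting on `" +"` instead of a single space means repeated spaces do not produce empty strings, so no separate filtering step is needed in this variant.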
Based on @thelatemail's comment:
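df <- read.table(text=df,fill=TRUE,header=FALSE)   # short rows are padded with blank/NA fields
df[, 4] <- apply( df[, 4:ncol(df)], 1, function( x ) {
  paste( x[ ! is.na( x ) ], collapse = ' ') } )    # merge columns 4 onward, skipping the NA padding
df <- df[, 1:4]                                    # keep only the first four columns
df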
# V1 V2 V3 V4
# 1 24-March-2017 text1 874874455221112 Text text text10
# 2 25-March-2017 text2 54654656TEXT Text text 11
# 3 24-March-2017 text3 874874455221112 Text text text 12
# 4 25-March-2017 text4 54654656TEXT Text text 13
# 5 26-March-2017 text3 54654TEXT Text text text 14
# 6 27-March-2017 text5 6546TEXT Text text text 15
# 7 28-March-2017 text6 546476876586TExt Text text text 16
# 8 29-March-2017 text7 23453453TEXT Text text 17
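One caveat with this approach: per its documentation, `read.table()` determines the number of columns from the first five lines of input unless `col.names` is longer. The sketch below is an optional hardening step, not part of the original answer: it uses `count.fields()` to find the widest row in the whole input and passes that many column names explicitly. The variable names `lines`, `n`, and `wide` are illustrative.

```r
# Three sample rows copied from the question's data
lines <- c("24-March-2017 text1 874874455221112 Text text text10",
           "25-March-2017 text2 54654656TEXT Text text 11",
           "24-March-2017 text3 874874455221112 Text text text 12")

# Count whitespace-separated fields on every line, not only the first five
tc <- textConnection(lines)
n <- max(count.fields(tc))
close(tc)

# Supplying n column names forces read.table() to allocate n columns
wide <- read.table(text = lines, fill = TRUE, header = FALSE,
                   col.names = paste0("V", seq_len(n)),
                   stringsAsFactors = FALSE)
wide
```

For the data in the question this makes no difference, because the widest row already appears within the first five lines; it matters only when a longer row shows up later in the input.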
Data:
df <- c("24-March-2017 text1 874874455221112 Text text text10",
"25-March-2017 text2 54654656TEXT Text text 11",
"24-March-2017 text3 874874455221112 Text text text 12",
"25-March-2017 text4 54654656TEXT Text text 13",
"26-March-2017 text3 54654TEXT Text text text 14",
"27-March-2017 text5 6546TEXT Text text text 15",
"28-March-2017 text6 546476876586TExt Text text text 16",
"29-March-2017 text7 23453453TEXT Text text 17")
Answer 1: (score: 3)
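This is based on the given data and on some assumptions about its layout.
It pulls the matched substrings out with regmatches() and regexec(), rbinds them into a matrix, drops the first column (the full match), converts the result to a data.frame, and then passes it through sprintf to get fixed-width output.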
library(magrittr)   # provides the %>% pipe
data %>%
regmatches(regexec("^(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(.*?)$", .)) %>%
do.call("rbind", .) %>%
.[, -1] %>%
as.data.frame(stringsAsFactors = FALSE) %>%
c(list("%-20s%-30s%-30s%s"), .) %>%
do.call("sprintf", .)
# [1] "24-March-2017       text1                         874874455221112               Text text text10"
# [2] "25-March-2017       text2                         54654656TEXT                  Text text 11"
# [3] "24-March-2017       text3                         874874455221112               Text text text 12"
# [4] "25-March-2017       text4                         54654656TEXT                  Text text 13"
# [5] "26-March-2017       text3                         54654TEXT                     Text text text 14"
# [6] "27-March-2017       text5                         6546TEXT                      Text text text 15"
# [7] "28-March-2017       text6                         546476876586TExt              Text text text 16"
# [8] "29-March-2017       text7                         23453453TEXT                  Text text 17"
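For readers who prefer to see the intermediate values, the same steps can be written without the pipe. This is a sketch in base R only, using two rows copied from the question; the names `rows`, `pattern`, `parts`, `fields`, and `fixed` are illustrative and do not appear in the answer.

```r
# Two sample rows copied from the question's data
rows <- c("24-March-2017 text1 874874455221112 Text text text10",
          "26-March-2017 text3 54654TEXT Text text text 14")

# Three whitespace-free fields, then everything else as the fourth field
pattern <- "^(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(.*?)$"

# regexec() finds the match positions; regmatches() extracts the
# full match followed by the four capture groups for each string
parts <- regmatches(rows, regexec(pattern, rows))

# One matrix row per input string; column 1 is the full match, so drop it
fields <- do.call(rbind, parts)[, -1, drop = FALSE]

# Left-justify the first three fields in widths 20, 30, and 30
fixed <- sprintf("%-20s%-30s%-30s%s",
                 fields[, 1], fields[, 2], fields[, 3], fields[, 4])
nchar(fixed)
```

With these widths the fourth field always starts at character 81, which is what makes the output fixed-width regardless of how long the third field is.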