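I have a column in my data table that I need to split into 5 columns.
A typical value is 1A02B1, and I need to split it into the columns 1, A, 02, B, 1.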
Answer 0 (score: 0)
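We can use a regular expression to insert a delimiter at each boundary between a letter and a digit, and then read the result with read.csv from base R: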
v1 <- gsub("(?<=[A-Z])(?=[0-9])|(?<=[0-9])(?=[A-Z])", ",", df1$Col1, perl = TRUE)
read.csv(text = v1, header = FALSE)
#   V1 V2 V3 V4 V5
# 1  1  A  2  B  1
# 2  1  B  3  C  1
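To see why read.csv can parse the result, it can help to print the intermediate vector. The following is a minimal sketch that assumes the same sample df1 used in this answer. The lookarounds match the zero-width positions between a letter and a digit, so gsub inserts a comma at each boundary without consuming any characters.

```r
# Sample data, same values as df1 in this answer
df1 <- data.frame(Col1 = c("1A02B1", "1B03C1"), stringsAsFactors = FALSE)

# Insert a comma at every letter/digit boundary (zero-width matches)
v1 <- gsub("(?<=[A-Z])(?=[0-9])|(?<=[0-9])(?=[A-Z])", ",", df1$Col1, perl = TRUE)
print(v1)
# [1] "1,A,02,B,1" "1,B,03,C,1"
```

Note that [A-Z] only covers uppercase ASCII letters; values containing lowercase letters would need a wider class such as [A-Za-z].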
By default the third column is read as an integer, which drops the leading zero. If we need it as a string, specify colClasses:
read.csv(text = v1, header = FALSE,
         colClasses = c('integer', 'character', 'character', 'character', 'integer'),
         stringsAsFactors = FALSE)
#   V1 V2 V3 V4 V5
# 1  1  A 02  B  1
# 2  1  B 03  C  1
Data used above:
df1 <- data.frame(Col1 = c("1A02B1", "1B03C1"), stringsAsFactors = FALSE)
Answer 1 (score: 0)
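1) Assuming the input data frame shown in the Note at the end, read it with read.pattern using a pattern that matches digits, non-digits, digits, non-digits and digits, capturing each group into its own field: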
library(gsubfn)
pat <- "(\\d+)(\\D+)(\\d+)(\\D+)(\\d+)"
read.pattern(text = DF$x, pattern = pat, colClasses = "character")
giving:
  V1 V2 V3 V4 V5
1  1  A 02  B  1
You may want to omit or change the colClasses argument depending on the column types you need.
2) Alternatively, strsplit can be used to create a character matrix of the pieces:
do.call("rbind", strsplit(DF$x, "(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)", perl = TRUE))
##      [,1] [,2] [,3] [,4] [,5]
## [1,] "1"  "A"  "02" "B"  "1"
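The strsplit approach returns a bare character matrix, so a follow-up step is usually needed to attach the pieces to the original data frame. The following is a minimal sketch of one way to do that; the two-row sample input and the column names V1 to V5 are assumptions chosen for illustration, not part of the answer above.

```r
# Hypothetical two-row input; the answer's own DF has a single row
DF <- data.frame(x = c("1A02B1", "1B03C1"), stringsAsFactors = FALSE)

# Split at every digit/non-digit boundary and stack the pieces row-wise
parts <- do.call("rbind",
                 strsplit(DF$x, "(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)", perl = TRUE))

# Illustrative column names (an assumption; pick whatever suits the data)
colnames(parts) <- paste0("V", 1:5)

# Bind the new character columns next to the original column
res <- cbind(DF, as.data.frame(parts, stringsAsFactors = FALSE))

# Convert only the columns that should be numeric; V3 stays character
# so that its leading zero is preserved
res$V1 <- as.integer(res$V1)
res$V5 <- as.integer(res$V5)
print(res)
```

This relies on every value splitting into exactly five pieces; if some values split differently, rbind would recycle the shorter vectors with a warning, so it may be worth checking lengths(strsplit(...)) first.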
Note: the input used above is
DF <- data.frame(x = "1A02B1", stringsAsFactors = FALSE)