I have the following character data:

```r
v1 <- c("1321-56, 21-, 15-, 1701-13,", "1305-25, 2101-03, 1501-02, 1711-55,", "1309-18, 21-, 1501-04, 1701-15,")
data <- data.frame(v1)
```

```
> data
v1
1 1321-56, 21-, 15-, 1701-13,
2 1305-25, 2101-03, 1501-02, 1711-55,
3 1309-18, 21-, 1501-04, 1701-15,
```
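Each comma-separated part of a row should be split into three parts of 2, 5 and 6 characters respectively. For example, `1321-56` should be split into three columns: `13` (2 characters), `00021` (5 characters) and `000056` (6 characters). `15-` should be split into `15`, `00000` and `000000`, and so on. The final output should look like this: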
```
> data1
v1a v1b v1c v2a v2b v2c v3a v3b v3c v4a v4b v4c
1 13 00021 000056 21 00001 000000 15 00000 000000 17 00001 000013
2 13 00005 000025 21 00001 000003 15 00000 000000 17 00011 000055
3 13 00009 000018 21 00000 000000 15 00000 000000 17 00001 000015
```
Any ideas?
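The rule stated in the question can be applied to one part at a time. The following is a minimal base-R sketch of that. `split_part()` and `zero_pad()` are hypothetical helper names. The sketch assumes every part has the form shown in the sample rows: two digits, optional digits, a hyphen, then optional digits.

```r
# Hypothetical helpers illustrating the 2/5/6 rule for one part.
zero_pad <- function(x, width) {
  # Prepend as many "0"s as needed to reach `width` characters
  paste0(strrep("0", pmax(0, width - nchar(x))), x)
}

split_part <- function(part) {
  part <- sub(",$", "", trimws(part))              # drop surrounding spaces and a trailing comma
  prefix <- substr(part, 1, 2)                     # first two digits, kept as-is
  middle <- sub("^\\d{2}(\\d*)-.*$", "\\1", part)  # digits between the prefix and "-"
  suffix <- sub("^[^-]*-", "", part)               # digits after "-" (may be empty)
  c(prefix, zero_pad(middle, 5), zero_pad(suffix, 6))
}

split_part("1321-56")  # returns c("13", "00021", "000056")
split_part("15-")      # returns c("15", "00000", "000000")
```

An empty middle or suffix has `nchar() == 0`, so it becomes all zeros. This is how `15-` turns into `00000` and `000000`.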
Answer 0 (score: 4)
This uses `str_match` (from the stringr package) and `sprintf`, in two steps. First, split everything:
```r
library(stringr)

n <- 4 # or str_count(v1, ",")[1] if it's common to all the rows
(M <- str_match(v1, paste0(rep("(\\d{2})(\\d*)-(\\d*)[, ]*", n), collapse = ""))[, -1])
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
# [1,] "13" "21" "56" "21" "" "" "15" "" "" "17" "01" "13"
# [2,] "13" "05" "25" "21" "01" "03" "15" "01" "02" "17" "11" "55"
# [3,] "13" "09" "18" "21" "" "" "15" "01" "04" "17" "01" "15"
```
This gives `3 * n` columns. Then format the matrix with `sprintf`:
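```r
# The three formats are recycled along t(M), i.e. along each row of M,
# so every group of three columns gets widths 2, 5 and 6.
# Note (see ?sprintf): the "0" flag zero-pads character strings only on some
# platforms; on others it is ignored and the strings are padded with spaces.
matrix(sprintf(c("%02s", "%05s", "%06s"), t(M)), nrow = nrow(M), byrow = TRUE)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
# [1,] "13" "00021" "000056" "21" "00000" "000000" "15" "00000" "000000" "17" "00001" "000013"
# [2,] "13" "00005" "000025" "21" "00001" "000003" "15" "00001" "000002" "17" "00011" "000055"
# [3,] "13" "00009" "000018" "21" "00000" "000000" "15" "00001" "000004" "17" "00001" "000015"
```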
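For comparison, the same two steps can be written in base R alone. This sketch keeps the answer's assumption that every row has exactly `n` parts. It swaps in `regexec()`/`regmatches()` for `str_match()`. It also pads explicitly with `strrep()` instead of `sprintf("%05s", ...)`, so the zeros do not depend on the platform's `sprintf`. `zero_pad()` is a hypothetical helper.

```r
v1 <- c("1321-56, 21-, 15-, 1701-13,",
        "1305-25, 2101-03, 1501-02, 1711-55,",
        "1309-18, 21-, 1501-04, 1701-15,")

n <- 4  # number of parts per row, as in the answer
pattern <- paste0(rep("(\\d{2})(\\d*)-(\\d*)[, ]*", n), collapse = "")

# Step 1: capture groups per row (the first element is the whole match)
matches <- regmatches(v1, regexec(pattern, v1))
# A non-matching row yields character(0), which rbind() would silently drop
stopifnot(all(lengths(matches) == 3 * n + 1))
M <- do.call(rbind, matches)[, -1, drop = FALSE]

# Step 2: pad column j to width 2, 5 or 6 (the widths repeat every 3 columns)
zero_pad <- function(x, width) paste0(strrep("0", pmax(0, width - nchar(x))), x)
widths <- rep(c(2, 5, 6), length.out = ncol(M))
out <- M
for (j in seq_len(ncol(M))) out[, j] <- zero_pad(M[, j], widths[j])
out
```

The first row of `out` is `"13" "00021" "000056" "21" "00000" "000000" "15" "00000" "000000" "17" "00001" "000013"`. This matches the zero-padded `sprintf` result above.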
Answer 1 (score: 3)
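Assuming every input substring has the form `9999-99,` or `99-,`, we use one `gsub` to convert the first form into three space-separated fields. A second `gsub` converts the second form into three space-separated fields. Finally, `read.table` produces a data frame from the result. If the column names don't matter, the `col.names=` argument can be omitted. No packages are used.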
```r
s <- gsub("(\\d\\d)(\\d\\d)-(\\d\\d),", "\\1 000\\2 0000\\3", data$v1)
s2 <- gsub("(\\d\\d)-,", "\\1 00000 000000", s)
read.table(text = s2, colClasses = "character",
  col.names = paste0("v", rep(1:4, each = 3), letters[1:3]))
```
giving:
```
v1a v1b v1c v2a v2b v2c v3a v3b v3c v4a v4b v4c
1 13 00021 000056 21 00000 000000 15 00000 000000 17 00001 000013
2 13 00005 000025 21 00001 000003 15 00001 000002 17 00011 000055
3 13 00009 000018 21 00000 000000 15 00001 000004 17 00001 000015
```
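To see what `read.table` actually parses, it helps to look at the intermediate strings after each `gsub`. The sketch below prints them for the first row of the sample data. The `000` and `0000` in the first replacement are hard-coded, so it only works when there are exactly two digits on each side of the hyphen. That is the answer's stated assumption.

```r
v1 <- c("1321-56, 21-, 15-, 1701-13,",
        "1305-25, 2101-03, 1501-02, 1711-55,",
        "1309-18, 21-, 1501-04, 1701-15,")

# Step 1: "1321-56," -> "13 00021 000056" (4 digits, hyphen, 2 digits, comma)
s <- gsub("(\\d\\d)(\\d\\d)-(\\d\\d),", "\\1 000\\2 0000\\3", v1)
s[1]   # "13 00021 000056 21-, 15-, 17 00001 000013"

# Step 2: "21-," -> "21 00000 000000" (2 digits, hyphen, comma)
s2 <- gsub("(\\d\\d)-,", "\\1 00000 000000", s)
s2[1]  # "13 00021 000056 21 00000 000000 15 00000 000000 17 00001 000013"

# colClasses = "character" keeps the leading zeros
df <- read.table(text = s2, colClasses = "character",
                 col.names = paste0("v", rep(1:4, each = 3), letters[1:3]))
```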
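`easy` example

Regarding the `easy` example, note that the second `<-` in the line of the question that defines `easy` should be `=`. With that fixed, assume each substring is split into two columns: the first two digits go in one column and the remaining digits in the next. Then: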
```r
s <- gsub("(\\d\\d)(\\d*),", "\\1,\\2,", easy$v1)
read.table(text = s, colClasses = "character", sep = ",")[-15]
```
giving:
```
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14
1 01 0718 02 03 04 16 05 11 06 07
2 01 0819 02 11 03 22 04 2 05 21 06 2 07 21
3 01 0819 02 1 03 2 04 6 05 1 06 11 07 01
```
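The `easy` data itself is not shown here. The sketch below therefore runs the same `gsub` on a hypothetical one-row string whose format is only assumed from the output above. It makes two additions that are not in the answer. First, `strip.white = TRUE`: with `sep = ","`, `read.table` otherwise keeps the space that follows each comma in character fields. Second, it drops the empty field created by the final comma with `-ncol(df)` instead of the hard-coded `-15`.

```r
# Hypothetical input in the assumed `easy` format (three parts)
x <- "010718, 02, 0311,"

# Insert a comma after the first two digits of every part
s <- gsub("(\\d\\d)(\\d*),", "\\1,\\2,", x)
s  # "01,0718, 02,, 03,11,"

df <- read.table(text = s, colClasses = "character", sep = ",",
                 strip.white = TRUE)
# Two columns per part, plus one empty field after the trailing comma
df <- df[-ncol(df)]
```

A part with no digits after the first two, such as `02`, ends up with an empty string in its second column.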