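I have a DF that contains a column of alphanumeric values. I want to split these values and store them in separate columns.
I have a data frame with one column of alphanumeric values. I want to split each value and store the pieces in new columns, as in the example below.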
str <- c("1001AA00100BC300AA01111000AA0299F40400F4053DF40C0000F4030000F40680F4077",
         "1001AA00100BC300AA01111000AA0299F40400F4053DF40C0000F4030000F40680F4077",
         "1001AA00100BC300AA01111000AA0299F40400F4053DF40C0000F4030000F40680F4077",
         "1001AA00100BC300AA01111000AA0299F40400F4053DF40C0000F4030000F40680F4077",
         "1001AA00100BC300AA01111000AA0299F40400F4053DF40C0000F4030000F40680F4077",
         "1001AA00100BC300AA01111000AA0299F40400F4053DF40C0000F4030000F40680F4077")
Output:
AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
Answer 0 (score: 2)
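Use one line of the sample output to find the field widths. The widths vector starts with 4 because the sample output appears to omit the first 4 characters of the input. Then pass the widths to read.fwf. If you really do not want the first 4 characters of the input in the output, replace the read.fwf line with read.fwf(textConnection(str), widths)[-1]. No packages are used.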
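# One row of the desired output, used only to measure the field widths.
sample.out <- "AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077"
# Prepend 4 for the leading input characters that the sample output omits.
widths <- c(4, sapply(read.table(text = sample.out, as.is = TRUE), nchar))
# Split every input string at those fixed widths.
read.fwf(textConnection(str), widths)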
giving:
V1 V2 V3 V4 V5 V6 V7 V8
1 1001 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
2 1001 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
3 1001 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
4 1001 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
5 1001 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
6 1001 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
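One thing to watch with read.fwf is that it hands the split fields to read.table, which guesses column types, so all-digit fields such as 1001 or 99 come back as integers and would lose any leading zeros. A minimal sketch, assuming every field should stay as text, forwards colClasses = "character"; the names x and fields are illustrative and not part of the original answer, and the widths are written out by hand rather than measured from the sample.

```r
# Illustrative input: two copies of the record from the question.
x <- rep("1001AA00100BC300AA01111000AA0299F40400F4053DF40C0000F4030000F40680F4077", 2)

# Field widths: the 4-character prefix, then the seven fields of the sample output.
widths <- c(4, 4, 8, 4, 6, 4, 2, 39)

# Sanity check: the widths should account for every character of each record.
stopifnot(all(nchar(x) == sum(widths)))

# colClasses is passed through to read.table, so no field is converted to a number.
fields <- read.fwf(textConnection(x), widths, colClasses = "character")
str(fields)
```

Whether to keep the fields as character depends on what happens to them afterwards; if the numeric-looking columns really are numbers, the default conversion may be what you want.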
Answer 1 (score: 0)
One option is to use separate from the tidyverse, passing the character positions at which to split:
library(tidyverse)
tibble(col1 = str) %>%
  separate(col1, into = paste0("col", 0:7), sep = c(4, 8, 16, 20, 26, 30, 32)) %>%
  select(-1)  # drop col0, the first 4 characters of the input
# A tibble: 6 x 7
# col1 col2 col3 col4 col5 col6 col7
# <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#1 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
#2 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
#3 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
#4 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
#5 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
#6 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
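The numeric positions given to separate are cumulative character offsets, not field widths. As a quick check, assuming the widths implied by the sample output (a 4-character prefix, then fields of 4, 8, 4, 6, 4 and 2 characters, with the rest of the string going to the last column), the positions can be derived with cumsum. The names widths and positions are illustrative.

```r
# Widths of the leading prefix and the first six output fields;
# the final field takes whatever is left of the string.
widths <- c(4, 4, 8, 4, 6, 4, 2)

# The cumulative sum gives the offset at which each field ends.
positions <- cumsum(widths)
print(positions)
```

The result is the same vector, 4, 8, 16, 20, 26, 30, 32, that appears in the separate call.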
Another option uses only base R, with no packages: insert a delimiter at the required positions and then read the result with read.csv
read.csv(text = sub("^.{4}(.{4})(.{8})(.{4})(.{6})(.{4})(.{2})(.*)",
                    "\\1,\\2,\\3,\\4,\\5,\\6,\\7", str),
         header = FALSE, stringsAsFactors = FALSE)
# V1 V2 V3 V4 V5 V6 V7
#1 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
#2 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
#3 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
#4 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
#5 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
#6 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
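All of the approaches above come down to cutting each string at fixed positions. For comparison, here is a minimal base R sketch that does the cutting directly with substr and start/end offsets. split_fixed is a hypothetical helper name, not a function from any package, and it keeps every field as character rather than guessing types.

```r
# Hypothetical helper: cut each string in x into fixed-width fields.
split_fixed <- function(x, widths) {
  ends <- cumsum(widths)        # offset of the last character of each field
  starts <- ends - widths + 1   # offset of the first character of each field
  fields <- lapply(seq_along(widths),
                   function(i) substr(x, starts[i], ends[i]))
  names(fields) <- paste0("V", seq_along(widths))
  as.data.frame(fields, stringsAsFactors = FALSE)
}

# Illustrative input: three copies of the record from the question.
x <- rep("1001AA00100BC300AA01111000AA0299F40400F4053DF40C0000F4030000F40680F4077", 3)

# Split into 8 fields, then drop the first one (the 4-character prefix).
res <- split_fixed(x, c(4, 4, 8, 4, 6, 4, 2, 39))[-1]
print(res)
```

Because the last width is fixed at 39 here, this sketch assumes every record has the same length; the regex and separate versions instead let the final column absorb whatever remains.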