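How can I split the column 'seriesID' into multiple columns so that it looks like the second table below? Basically, I need to split each string into substrings of lengths (3,3,6,1,1,3).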
seriesID
1 ISU111aaaaaa33001
2 ISU222bbbbbb33001
3 ISU000cccccc63001
4 ISU333dddddd63001
seriesID pre supp ind data case area
1 ISU111aaaaaa33001 ISU 111 aaaaaa 3 3 001
2 ISU222bbbbbb33001 ISU 222 bbbbbb 3 3 001
3 ISU000cccccc63001 ISU 000 cccccc 6 3 001
4 ISU333dddddd63001 ISU 333 dddddd 6 3 001
Thanks!
Answer 0 (score: 2)
You can also use substr:
widths = c(3,3,6,1,1,3)
end = cumsum(widths)
start = c(1, head(end, -1) + 1)
as.data.frame(mapply(substr, start, end, MoreArgs = list(x=df$seriesID)))
# V1 V2 V3 V4 V5 V6
#1 ISU 000 000000 3 3 001
#2 ISU 000 000000 3 3 001
#3 ISU 000 000000 6 3 001
#4 ISU 000 000000 6 3 001
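The output above has default column names (V1 to V6) and drops the original column, whereas the question asks for named columns next to seriesID. A minimal sketch of one way to get there, using the sample data from the question; the names `parts` and `result` are only illustrative:

```r
# Sample data copied from the question
df <- data.frame(
  seriesID = c("ISU111aaaaaa33001", "ISU222bbbbbb33001",
               "ISU000cccccc63001", "ISU333dddddd63001"),
  stringsAsFactors = FALSE
)

widths <- c(3, 3, 6, 1, 1, 3)
end    <- cumsum(widths)            # last character of each field
start  <- c(1, head(end, -1) + 1)   # first character of each field

# One substr() call per (start, end) pair, each applied to the whole column
parts <- mapply(substr, start, end, MoreArgs = list(x = df$seriesID))
parts <- as.data.frame(parts, stringsAsFactors = FALSE)
names(parts) <- c("pre", "supp", "ind", "data", "case", "area")

# Put the new columns next to the original seriesID column
result <- cbind(df, parts)
print(result)
```

All new columns are character vectors, so leading zeros such as "001" are kept.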
Answer 1 (score: 1)
You can use readr to "re-read" the data as a fixed-width file. For example:
library(readr)
series=c("ISU00000000033001","ISU00000000033001","ISU00000000063001","ISU00000000063001")
read_fwf(paste(series, collapse="\n"), fwf_widths(c(3,3,6,1,1,3)))
# A tibble: 4 × 6
# X1 X2 X3 X4 X5 X6
# <chr> <chr> <chr> <int> <int> <chr>
# 1 ISU 000 000000 3 3 001
# 2 ISU 000 000000 3 3 001
# 3 ISU 000 000000 6 3 001
# 4 ISU 000 000000 6 3 001
Note that we collapse the character vector into a single newline-separated string, which may be inefficient for large vectors.
Answer 2 (score: 1)
seriesID <- c('ISU00000000033001',
'ISU00000000033001',
'ISU00000000063001',
'ISU00000000063001')
df <- data.frame(pre = substr(seriesID,1,3),
supp =substr(seriesID,4,6),
ind =substr(seriesID,7,12),
data =substr(seriesID,13,13),
case =substr(seriesID,14,14),
area =substr(seriesID,15,17))
df
pre supp ind data case area
1 ISU 000 000000 3 3 001
2 ISU 000 000000 3 3 001
3 ISU 000 000000 6 3 001
4 ISU 000 000000 6 3 001
Answer 3 (score: 1)
You can use separate from the tidyr package:
df <- data.frame(series=c("ISU00000000033001","ISU00000000033001","ISU00000000063001","ISU00000000063001"), stringsAsFactors=FALSE)
library(tidyr)
df %>%
separate(series,
c("pre", "supp", "ind", "data", "case", "area"),
sep=cumsum(c(3,3,6,1,1)))
pre supp ind data case area
1 ISU 000 000000 3 3 001
2 ISU 000 000000 3 3 001
3 ISU 000 000000 6 3 001
4 ISU 000 000000 6 3 001
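When sep is a numeric vector, separate() treats the values as character positions to split after, which is why only the first five widths go into cumsum(): the last field simply runs to the end of the string. The call above also drops the source column by default. A small sketch, using the question's data and the `remove` argument of separate(), that keeps seriesID as the question asks; `cuts` and `result` are illustrative names:

```r
library(tidyr)

# Sample data copied from the question
df <- data.frame(
  seriesID = c("ISU111aaaaaa33001", "ISU222bbbbbb33001",
               "ISU000cccccc63001", "ISU333dddddd63001"),
  stringsAsFactors = FALSE
)

# Positions after which to split: 3 6 12 13 14
cuts <- cumsum(c(3, 3, 6, 1, 1))

result <- separate(df, seriesID,
                   into = c("pre", "supp", "ind", "data", "case", "area"),
                   sep = cuts,
                   remove = FALSE)  # keep the original seriesID column
print(result)
```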
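Answer 4 (score: 0)
It sounds like this should really be handled when you read the data in, with something like read.fwf(): https://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.fwf.html
But to solve the problem as stated, use substr():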
seriesID <- c('ISU00000000033001', 'ISU00000000033001', 'ISU00000000063001', 'ISU00000000063001')
df <- data.frame(seriesID = seriesID,
pre = substr(seriesID, 1, 3),
supp = substr(seriesID, 4, 6),
ind = substr(seriesID, 7, 12),
data = substr(seriesID, 13, 13),
case = substr(seriesID, 14, 14),
area = substr(seriesID, 15, 17))
print(df)
# seriesID pre supp ind data case area
# 1 ISU00000000033001 ISU 000 000000 3 3 001
# 2 ISU00000000033001 ISU 000 000000 3 3 001
# 3 ISU00000000063001 ISU 000 000000 6 3 001
# 4 ISU00000000063001 ISU 000 000000 6 3 001
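For completeness, here is a rough sketch of the read.fwf() route mentioned at the top of this answer, applied to strings that are already in memory. It assumes the values can be fed through a textConnection() in place of a file; the names `con` and `fields` are only illustrative.

```r
# Sample data copied from the question
series <- c("ISU111aaaaaa33001", "ISU222bbbbbb33001",
            "ISU000cccccc63001", "ISU333dddddd63001")

# Treat the character vector as if it were the lines of a fixed-width file
con <- textConnection(series)
fields <- read.fwf(
  con,
  widths     = c(3, 3, 6, 1, 1, 3),
  col.names  = c("pre", "supp", "ind", "data", "case", "area"),
  colClasses = "character"   # keep leading zeros such as "000" and "001"
)
close(con)

print(fields)
```

Without colClasses = "character", read.fwf() would guess column types and turn values like "001" into the number 1.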