I have this character vector:
var1 <- c("10300010118,",
"1030002,",
"1030003,",
"103000405, 0512,",
"103000612, 0717,",
"10310010618,",
"103100221,",
"103100315,",
"103100412, 0517",
"103100612,0729,",
"14510010517,",
"145100212,",
"1451003,",
"145100465, 0588,",
"145100651, 0777,")
I want to split it into several columns.
The resulting data should look like this:
> data
v1 v2 v3
1 1 0300 010118, 02, 03, 0405, 0512, 0612, 0717,
2 1 0310 010618, 0221, 0315, 0412, 0517, 0612, 0729,
3 1 4510 010517, 0212, 03, 0465, 0588, 0651, 0777,
Any ideas?
Answer 0 (score: 2)
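Here is an option with the tidyverse. Convert 'var1' to a tibble and separate it into three columns by position (after the 1st and after the 5th character), remove the trailing ',' at the end of the strings in column 'v3', use gl to create a group index 'grp' for every 5 rows, group by 'grp', 'v1' and 'v2', and summarise 'v3' into a single string with toString (equivalent to paste(v3, collapse = ", ")).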
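library(tidyverse)
tibble(var1) %>%
  # split after the 1st and the 5th character
  separate(var1, into = paste0('v', 1:3), sep = c(1, 5)) %>%
  # drop the trailing comma from v3
  mutate(v3 = str_remove(v3, ",$")) %>%
  # every 5 consecutive rows form one record
  group_by(grp = as.integer(gl(n(), 5, n())), v1, v2) %>%
  # collapse v3 into one comma-separated string per record
  summarise(v3 = toString(v3)) %>%
  ungroup() %>%
  select(-grp)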
# A tibble: 3 x 3
# v1 v2 v3
# <chr> <chr> <chr>
#1 1 0300 010118, 02, 03, 0405, 0512, 0612, 0717
#2 1 0310 010618, 0221, 0315, 0412, 0517, 0612,0729
#3 1 4510 010517, 0212, 03, 0465, 0588, 0651, 0777
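The grouping step can be checked on its own. This is a quick sketch assuming, as in the sample, 15 rows in which each record spans exactly five consecutive rows: gl(n, k, length) yields factor codes 1, 2 and 3, each repeated five times, and integer division on the row index produces the same vector.

```r
n <- 15  # number of rows in the sample data (assumed multiple of 5)

# gl(n, k, length): factor with levels 1..n, each repeated k times, cut to `length`
grp <- as.integer(gl(n, 5, n))
print(grp)  # 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3

# Same grouping via integer division on the 0-based row index
grp2 <- (seq_len(n) - 1) %/% 5 + 1
stopifnot(identical(grp, as.integer(grp2)))
```

If the number of rows per record ever varies, neither expression is appropriate, because both hard-code the block size of 5.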
Or we can do it in base R by inserting a delimiter at the same positions with sub, reading the result with read.table, and then aggregating:
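# insert "-" after the 1st and 5th characters and drop any trailing comma
# (lazy (.*?) needs perl = TRUE; colClasses keeps leading zeros such as "0300")
df1 <- read.table(text = sub("^(.)(.{4})(.*?),?$", "\\1-\\2-\\3", var1, perl = TRUE),
                  sep = "-", header = FALSE, colClasses = "character",
                  col.names = paste0("v", 1:3))
# every 5 consecutive rows form one record
df1$grp <- as.integer(gl(nrow(df1), 5, nrow(df1)))
# collapse v3 within each (v1, v2, grp) group, then drop the grp column
aggregate(v3 ~ ., df1, FUN = toString)[-3]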
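A variant that avoids both the five-row assumption and read.table's type conversion (which turns "0300" into 300 unless colClasses is set to "character") is sketched below. It is not part of the original answer, and it assumes that the 'v1'/'v2' prefix identifies each record uniquely, as it does in the sample: split by position with substr/substring, then aggregate on v1 and v2 only.

```r
var1 <- c("10300010118,", "1030002,", "1030003,", "103000405, 0512,",
          "103000612, 0717,", "10310010618,", "103100221,", "103100315,",
          "103100412, 0517", "103100612,0729,", "14510010517,", "145100212,",
          "1451003,", "145100465, 0588,", "145100651, 0777,")

# Split by character position (base R counterpart of separate(sep = c(1, 5)))
df <- data.frame(
  v1 = substr(var1, 1, 1),                   # 1st character
  v2 = substr(var1, 2, 5),                   # characters 2 to 5
  v3 = sub(",$", "", substring(var1, 6)),    # the rest, trailing comma removed
  stringsAsFactors = FALSE
)

# Group on the prefix alone; ASSUMPTION: (v1, v2) identifies one record,
# so two separate blocks sharing a prefix would be merged here
out <- aggregate(v3 ~ v1 + v2, data = df, FUN = toString)
print(out)
```

Whether grouping on the prefix or on a row-block index is safer depends on the real data: the prefix version tolerates records of different lengths, while the gl version tolerates repeated prefixes.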