Question

I have a column of dates in a data table, entered as 6-digit numbers: 201401, 201402, 201403, 201412, etc. The first 4 digits are the year and the last two are the month.

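I'm trying to split this column into two columns, one called "Year" and one called "Month". I've been playing around with strsplit(), but can't figure out how to make it split by number of characters rather than by a string pattern, i.e. split between the 4th and 5th digits.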

Answer 1

Without using any external packages, we can do this with substr:

df1 <- data.frame(dates = c(201401, 201402, 201403, 201412))
transform(df1, Year = substr(dates, 1, 4), Month = substr(dates, 5, 6))
#    dates Year Month
#1  201401 2014    01
#2  201402 2014    02
#3  201403 2014    03
#4  201412 2014    12

We can then either drop the original dates column or keep it.
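If the column is stored as numbers rather than text (an assumption; the question only says the values are 6-digit numbers), plain integer arithmetic is an alternative to substr. This is a swapped-in technique, sketched here together with dropping the original column. Unlike substr, it produces numeric columns, so the month loses its leading zero.

```r
# Sample data as described in the question (dates stored as numbers)
df1 <- data.frame(dates = c(201401, 201402, 201403, 201412))

# Integer division by 100 drops the last two digits, leaving the year;
# the remainder after dividing by 100 is the month
df1 <- transform(df1, Year = dates %/% 100, Month = dates %% 100)

# Drop the original column if it is no longer needed
df1$dates <- NULL
df1
```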

Or with sub:

cbind(df1, read.csv(text=sub('(.{4})(.{2})', "\\1,\\2", df1$dates), header=FALSE))
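The one-liner above inserts a comma between the 4th and 5th characters and then parses the result as CSV. Used as is, read.csv names the new columns V1 and V2 and parses them as numbers, so "01" becomes 1. The sketch below spells out the steps and uses read.csv's own col.names and colClasses arguments to name the columns and keep them as text.

```r
df1 <- data.frame(dates = c(201401, 201402, 201403, 201412))

# sub() coerces the numbers to text and rewrites "201401" as "2014,01"
csv_lines <- sub("(.{4})(.{2})", "\\1,\\2", df1$dates)

# Parse the comma-separated lines; colClasses = "character" keeps "01" as "01"
parts <- read.csv(text = csv_lines, header = FALSE,
                  col.names = c("Year", "Month"), colClasses = "character")

# Add the two parsed columns next to the original one
df1[c("Year", "Month")] <- parts
df1
```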

Or with some package-based solutions:

library(tidyr)
extract(df1, dates, into = c("Year", "Month"), "(.{4})(.{2})", remove=FALSE)
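tidyr::extract builds the new columns from the regex capture groups. Here is a base R sketch of the same capture-group idea using regexec() and regmatches(), with no packages. The anchored pattern assumes every value is exactly six digits.

```r
df1 <- data.frame(dates = c(201401, 201402, 201403, 201412))
x <- as.character(df1$dates)

# regexec() records the positions of the whole match and of each capture group;
# regmatches() turns those positions into strings
m <- regmatches(x, regexec("^([0-9]{4})([0-9]{2})$", x))

# Each list element is c(full_match, year, month); stack them into a matrix
parts <- do.call(rbind, m)
df1$Year  <- parts[, 2]
df1$Month <- parts[, 3]
df1
```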

Or with data.table:

library(data.table)
setDT(df1)[, tstrsplit(dates, "(?<=.{4})", perl = TRUE)]
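data.table documents tstrsplit() as equivalent to transpose(strsplit(...)), so the lookbehind pattern also works with base strsplit(), which is what the question was originally trying. This sketch relies on how strsplit() handles zero-width matches: after each split it searches the remaining text afresh, so the lookbehind fires once and a six-character string yields exactly two pieces. That behaviour is worth double-checking on your own data.

```r
df1 <- data.frame(dates = c(201401, 201402, 201403, 201412))

# Split at the position preceded by 4 characters: "201401" -> c("2014", "01")
pieces <- strsplit(as.character(df1$dates), "(?<=.{4})", perl = TRUE)

# Turn the list of pieces into a two-column character matrix
parts <- do.call(rbind, pieces)
df1$Year  <- parts[, 1]
df1$Month <- parts[, 2]
df1
```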

Answer 2

tidyr::separate can take an integer for its sep argument, in which case it splits at that character position:

library(tidyr)

df <- data.frame(date = c(201401, 201402, 201403, 201412))

df %>% separate(date, into = c('year', 'month'), sep = 4)
#>   year month
#> 1 2014    01
#> 2 2014    02
#> 3 2014    03
#> 4 2014    12

Note that the new columns are character; add convert = TRUE to convert them back to numbers.
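For comparison, here is a base R sketch that gives a result roughly like separate(..., sep = 4, convert = TRUE). It cuts the strings with substring(), then lets type.convert() guess a numeric type for each piece. Like separate()'s default, it drops the original column. As with convert = TRUE, the month loses its leading zero once it becomes a number.

```r
df <- data.frame(date = c(201401, 201402, 201403, 201412))
x <- as.character(df$date)

# Cut each string at fixed positions, then convert the pieces from text
df$year  <- type.convert(substring(x, 1, 4), as.is = TRUE)
df$month <- type.convert(substring(x, 5, 6), as.is = TRUE)

# separate() removes the input column by default; do the same here
df$date <- NULL
str(df)
```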

Split column by number of characters
