Here I'm trying to convert a data frame to a tibble and split the values of the year/month column into separate columns of their own:
library(dplyr)
library(tidyr)
res <- data.frame("year.month" = c("2005M1","2005M2","2005M3","2005M4"), "national houses" = c(100,100,100,100), "dublin houses" = c(120,120,120,120))
res %>% separate(year.month , into=c("year" , "month") , sep=".")
This returns:
  year month national.houses dublin.houses
1                        100           120
2                        100           120
3                        100           120
4                        100           120
Warning message:
Too many values at 4 locations: 1, 2, 3, 4
The year and month values do not appear. Am I using separate incorrectly?
Answer 0 (score: 1)
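The dot (.) is a wildcard in regular expressions, and the sep argument of separate takes a regular expression, so your code tries to split year.month at every character, which is why you get the "Too many values" warning. The following uses a positive lookbehind and lookahead to split your column correctly: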
library(dplyr)
library(tidyr)
res %>%
  separate(year.month, into = c("year", "month"), sep = "(?<=\\d)(?=M)")
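To see why the unescaped dot misbehaves, here is a minimal base R sketch. It does not use tidyr at all, and the variable names are made up for illustration. It only shows how the two patterns behave on the sample strings from the question.

```r
# Sample values copied from the question.
x <- c("2005M1", "2005M2", "2005M3", "2005M4")

# As a regex, "." matches any single character, so it matches in every string.
dot_as_regex <- grepl(".", x)
# As a literal (fixed = TRUE), "." does not occur in these strings at all.
dot_as_literal <- grepl(".", x, fixed = TRUE)

# Splitting on the regex "." consumes every character and leaves empty pieces.
pieces <- strsplit(x[1], ".")[[1]]

# The lookaround pattern matches a zero-width position: right after a digit
# and right before "M". regexpr() reports where that position starts.
pos <- as.integer(regexpr("(?<=\\d)(?=M)", x, perl = TRUE))
year <- substr(x, 1, pos - 1)
month <- substr(x, pos, nchar(x))
```

Because the lookaround match has zero width, no characters are dropped at the split point, which is why the M stays attached to the month.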
You can also use extract from tidyr to split by capture groups:
res %>%
  # M\\d+ (rather than M\\d) also covers two-digit months such as M10
  extract(year.month, into = c("year", "month"), regex = "(\\d{4})(M\\d+)")
Result:
  year month national.houses dublin.houses
1 2005    M1             100           120
2 2005    M2             100           120
3 2005    M3             100           120
4 2005    M4             100           120
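The capture-group idea can also be sketched in base R, in case it helps to see what the groups pick out. This is only an illustration: the values 2005M10 and 2005M12 are hypothetical (the question only has M1 to M4), and the pattern here leaves the M outside the second group so that the month can be converted to an integer.

```r
# First two values follow the question's format; the last two are
# hypothetical two-digit months added to exercise the pattern.
x <- c("2005M1", "2005M4", "2005M10", "2005M12")

# regexec() returns the full match plus one entry per capture group.
m <- regmatches(x, regexec("^(\\d{4})M(\\d+)$", x))

# Stack the matches into a matrix: column 1 is the full match,
# column 2 the year group, column 3 the month group.
parts <- do.call(rbind, m)

parsed <- data.frame(
  year  = as.integer(parts[, 2]),
  month = as.integer(parts[, 3])
)
```

Whether to keep the M prefix or store the month as a number depends on what you do with the column afterwards; numeric months sort correctly, while M1, M10, M2 would not.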
Answer 1 (score: 1)
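I'd guess that splitting year and month on its own only gets you halfway to tidy. You still have two separate columns that both count houses. One row per observation and one column per variable requires something like this: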
res %>%
  tidyr::gather(key = where,
                value = houses,
                -year.month) %>%
  mutate(where = gsub(where,
                      pattern = '\\.houses',
                      replacement = '')) %>%
  separate(year.month,
           into = c('year', 'month'),
           sep = 'M')
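The same reshaping can probably be written with pivot_longer, which newer tidyr versions offer alongside gather. The sketch below assumes tidyr 1.0.0 or later is installed; the name tidy is made up here, and convert = TRUE is an optional extra that turns year and month into integers.

```r
library(tidyr)

# Data frame copied from the question; data.frame() turns the spaces in
# the column names into dots (national.houses, dublin.houses).
res <- data.frame("year.month" = c("2005M1", "2005M2", "2005M3", "2005M4"),
                  "national houses" = c(100, 100, 100, 100),
                  "dublin houses" = c(120, 120, 120, 120))

tidy <- res %>%
  # One row per (year.month, region); names_pattern keeps only the part
  # of each column name before ".houses".
  pivot_longer(cols = -year.month,
               names_to = "where",
               values_to = "houses",
               names_pattern = "(.*)\\.houses") %>%
  # "M" is a plain character in a regex, so it is safe as a separator.
  separate(year.month, into = c("year", "month"), sep = "M", convert = TRUE)
```

This yields eight rows, one per month and region, with year, month, where and houses as columns.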