Converting a data frame to tidy format

Time: 2017-11-29 19:03:06

Tags: r

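Here I am trying to convert a data frame to tibble format and split the values of the year.month column into separate year and month columns: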

library(dplyr)
library(tidyr)

res <- data.frame("year.month" = c("2005M1","2005M2","2005M3","2005M4"), "national houses" = c(100,100,100,100), "dublin houses" = c(120,120,120,120))

res %>% separate(year.month , into=c("year" , "month") ,  sep=".")

This returns:

  year month national.houses dublin.houses
1                        100           120
2                        100           120
3                        100           120
4                        100           120
Warning message:
Too many values at 4 locations: 1, 2, 3, 4 

The year and month values do not appear. Am I using separate incorrectly?

2 answers:

Answer 0 (score: 1)

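The dot (.) is a wildcard in regular expressions, and the sep argument of separate takes a regular expression, so your code tries to split year.month at every character, hence the "Too many values" warning. The following uses a positive lookbehind and a positive lookahead to split the column correctly. The pattern matches the zero-width position that is preceded by a digit and followed by M, so no characters are consumed by the split: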

library(dplyr)
library(tidyr)

res %>% 
  separate(year.month, into = c("year", "month"), sep = "(?<=\\d)(?=M)")

You can also use extract from tidyr to split by capture groups:

# \\d+ rather than \\d so that two-digit months such as M10 are captured in full
res %>% 
  extract(year.month, into = c("year", "month"), regex = "(\\d{4})(M\\d+)")

Result:

  year month national.houses dublin.houses
1 2005    M1             100           120
2 2005    M2             100           120
3 2005    M3             100           120
4 2005    M4             100           120
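
To see why the original sep = "." call failed, here is a minimal base R sketch. It uses strsplit as a stand-in for illustration only; it is not what separate does internally, and the variable names x, pieces_regex and parts are assumptions introduced for this example.

```r
# The same strings as in the question's year.month column
x <- c("2005M1", "2005M2", "2005M3", "2005M4")

# As a regular expression, "." matches any single character, so every
# character is treated as a separator and only empty strings are left
pieces_regex <- strsplit(x[1], ".")[[1]]
print(pieces_regex)

# Splitting on the literal letter "M" keeps the year and the month number
pieces_m <- strsplit(x, "M", fixed = TRUE)
parts <- do.call(rbind, pieces_m)  # one row per input string
print(parts)
```

If the column really did contain a literal dot as the delimiter, the dot would have to be escaped (sep = "\\.") for separate to treat it literally.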

Answer 1 (score: 1)

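I would say that just splitting out year and month only gets you halfway to tidy. You still have two separate columns that both count houses. One row per observation and one column per variable requires something like this: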

res %>% 
  tidyr::gather(key = where, 
                value = houses, 
                -year.month) %>% 
  mutate(where = gsub(where, 
                      pattern = '\\.houses', 
                      replacement = '')) %>% 
  separate(year.month, 
           into = c('year', 'month'), 
           sep = 'M')
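
As a sketch of the same reshaping with newer tooling: gather still works but has been superseded by pivot_longer, which is available assuming tidyr 1.0.0 or later is installed. The name tidy_res is introduced here for illustration; the data is the data frame from the question.

```r
library(dplyr)
library(tidyr)

# Same data as in the question; data.frame() turns "national houses"
# into national.houses because check.names = TRUE by default
res <- data.frame("year.month" = c("2005M1", "2005M2", "2005M3", "2005M4"),
                  "national houses" = c(100, 100, 100, 100),
                  "dublin houses" = c(120, 120, 120, 120))

tidy_res <- res %>%
  # Stack every column except year.month into where/houses pairs
  pivot_longer(cols = -year.month,
               names_to = "where",
               values_to = "houses") %>%
  # Drop the ".houses" suffix so that where holds just the region name
  mutate(where = sub("\\.houses$", "", where)) %>%
  # Split "2005M1" into year "2005" and month "1"
  separate(year.month, into = c("year", "month"), sep = "M")

print(tidy_res)
```

The result should have one row per combination of year, month and region, with the counts in a single houses column.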