Question

Here I am trying to convert a data frame to tibble format and split the year and month values in one column into their own columns:

library(dplyr)
library(tidyr)

res <- data.frame("year.month" = c("2005M1","2005M2","2005M3","2005M4"), "national houses" = c(100,100,100,100), "dublin houses" = c(120,120,120,120))

res %>% separate(year.month , into=c("year" , "month") ,  sep=".")

This returns:

  year month national.houses dublin.houses
1                        100           120
2                        100           120
3                        100           120
4                        100           120
Warning message:
Too many values at 4 locations: 1, 2, 3, 4

The year and month values do not appear. Am I using separate incorrectly?

Answer 1

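. is a wildcard in regular expressions, and the sep argument of separate is interpreted as a regular expression. Your code therefore tries to split year.month at every character, which is why you get the "Too many values" warning and empty columns. The following uses a positive lookbehind and lookahead to split the column correctly, between the last digit of the year and the M: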

library(dplyr)
library(tidyr)

res %>% 
  separate(year.month, into = c("year", "month"), sep = "(?<=\\d)(?=M)")

You can also use extract from tidyr to split by capture groups:

res %>% 
  extract(year.month, into = c("year", "month"), regex = "(\\d{4})(M\\d+)")

Result:

  year month national.houses dublin.houses
1 2005    M1             100           120
2 2005    M2             100           120
3 2005    M3             100           120
4 2005    M4             100           120
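
To see why the unescaped dot produced empty columns, here is a minimal sketch using base R's strsplit, which also treats its split argument as a regular expression by default. The variable names and the value "2005.1" are made up for illustration and are not part of the original data:

```r
x <- "2005M1"

# An unescaped "." matches every character, so every piece is an empty string
pieces_dot <- strsplit(x, ".")[[1]]
print(pieces_dot)

# A pattern that matches only the "M" splits the string once
pieces_m <- strsplit(x, "M")[[1]]
print(pieces_m)

# An escaped dot matches a literal "." and is only useful if the values
# actually contain one (hypothetical value, not from the question's data)
pieces_literal <- strsplit("2005.1", "\\.")[[1]]
print(pieces_literal)
```

Note that splitting on "M" drops the M from the month, whereas the lookbehind/lookahead pattern above keeps it.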

Answer 2

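I would guess that just separating year and month only gets you halfway to tidy. You still have two separate columns that both count houses. One row per observation and one column per variable needs something like this: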

res %>% 
  tidyr::gather(key = where, 
                value = houses, 
                -year.month) %>% 
  mutate(where = gsub(where, 
                      pattern = '\\.houses', 
                      replacement = '')) %>% 
  separate(year.month, 
           into = c('year', 'month'), 
           sep = 'M')
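
In recent versions of tidyr (1.0.0 and later), gather is superseded by pivot_longer. A rough equivalent of the pipeline above might look like the sketch below; it assumes the same res data frame as in the question, and the name tidy_res is made up for illustration. The rows come out in a different order than with gather:

```r
library(dplyr)
library(tidyr)

# Same data as in the question; data.frame() turns the spaces in the
# original column names into dots, so the names are written that way here
res <- data.frame(year.month = c("2005M1", "2005M2", "2005M3", "2005M4"),
                  national.houses = c(100, 100, 100, 100),
                  dublin.houses = c(120, 120, 120, 120))

tidy_res <- res %>%
  # Stack the two house-count columns into a where/houses pair
  pivot_longer(cols = -year.month,
               names_to = "where",
               values_to = "houses") %>%
  # Drop the ".houses" suffix so where holds just the region
  mutate(where = sub("\\.houses$", "", where)) %>%
  # Split on the literal "M"; the M itself is discarded
  separate(year.month, into = c("year", "month"), sep = "M")

print(tidy_res)
```

The year and month columns are still character here; passing convert = TRUE to separate is one way to get integers if that is what you need.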

Converting a data frame to tidy format
