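I have data in long format and want to convert it to wide format. The column-mapping logic: the first column must contain the word "bed", the second column must contain "m²", and the third column must contain the word "floor" or "lift".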
Type <- read.table(header = TRUE, text = "
Attributes
'2 bed'
'197 m²'
'Floor 5 exterior with lift'
'3 bed'
'Ground floor exterior with lift'
'3 bed'
'110 m²'
'195 m²'
'Floor 5 exterior with lift'
'3 bed'
'110 m²'
'5 bed'
")
Type2 <- Type %>%
  group_by(grp = cumsum(str_detect(Attributes, '^\\d+\\s*bed$'))) %>%
  mutate(colnm = c('BedRoom', 'Size', 'Floor')[row_number()]) %>%
  ungroup %>%
  pivot_wider(names_from = colnm, values_from = Attributes) %>%
  select(-grp)
The code above does not work when the "bed" value is missing.
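To see where the positional approach goes wrong, here is a minimal base R sketch of the same grouping and labelling steps. It assumes the attribute strings shown above; `attrs`, `pos`, and `labels` are illustrative names, and `ave()` is used only as a stand-in for `row_number()` within each group.

```r
# Attribute strings from the question ("\u00b2" is the superscript two in the size unit).
attrs <- c("2 bed", "197 m\u00b2", "Floor 5 exterior with lift",
           "3 bed", "Ground floor exterior with lift",
           "3 bed", "110 m\u00b2", "195 m\u00b2", "Floor 5 exterior with lift",
           "3 bed", "110 m\u00b2", "5 bed")

# A new group starts at every "<n> bed" row, as in the question's code.
grp <- cumsum(grepl("^[0-9]+\\s*bed$", attrs))

# Position of each row inside its group (stand-in for row_number()).
pos <- ave(seq_along(attrs), grp, FUN = seq_along)

# Labels are assigned purely by position, not by what the row contains.
labels <- c("BedRoom", "Size", "Floor")[pos]

data.frame(attrs, grp, labels)
```

Because the labels depend only on position, the "Ground floor exterior with lift" row in the second group is labelled Size, and the record that has no "bed" row is merged into the preceding group, so its last row falls outside the three labels and gets NA.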
Desired output
Answer 0 (score: 1)
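One option is to use case_when/str_detect to create an index that maps each pattern specified in the post. Based on that index, we then flag every row where the index repeats or decreases, i.e. where the difference from the previous index is less than or equal to 0, and take the cumulative sum of that logical vector to build a group. With 'grp' in place, we can use pivot_wider.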
library(stringr)
library(dplyr)
library(tidyr)
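Type %>%
  mutate(
    # 1 = bedroom count, 2 = size, 3 = floor/lift description
    ind = case_when(
      str_detect(Attributes, '\\bbed') ~ 1,
      str_detect(Attributes, "m²$") ~ 2,
      str_detect(Attributes, "\\b(Floor|lift)\\b") ~ 3),
    # start a new group whenever the index repeats or decreases
    grp = cumsum(c(TRUE, diff(ind) <= 0)),
    # column name is chosen by pattern, not by position
    colnm = c('BedRoom', 'Size', 'Floor')[ind]) %>%
  select(-ind) %>%
  pivot_wider(names_from = colnm, values_from = Attributes) %>%
  select(-grp)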
# A tibble: 6 x 3
#   BedRoom Size   Floor
#   <chr>   <chr>  <chr>
# 1 2 bed   197 m² Floor 5 exterior with lift
# 2 3 bed   <NA>   Ground floor exterior with lift
# 3 3 bed   110 m² <NA>
# 4 <NA>    195 m² Floor 5 exterior with lift
# 5 3 bed   110 m² <NA>
# 6 5 bed   <NA>   <NA>
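The grouping step can also be checked in isolation. The sketch below is base R only and assumes the index values that the case_when() call would assign to the twelve example rows; `new_record` is an illustrative name, not part of the answer's code.

```r
# Index per row for the example data, as the case_when() step would assign it:
# 1 = bed, 2 = size, 3 = floor/lift.
ind <- c(1, 2, 3, 1, 3, 1, 2, 2, 3, 1, 2, 1)

# A new record starts whenever the index fails to increase (it repeats or
# drops); the first row always starts a record.
new_record <- c(TRUE, diff(ind) <= 0)

# The running count of record starts is the group id.
grp <- cumsum(new_record)
grp
```

The twelve rows fall into six groups, matching the six rows of the tibble. Note that a row matching none of the three patterns would get an NA index, which would propagate through diff() and cumsum(), so this approach relies on every row matching one of the patterns.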