I have some data that looks like this:
        date         over     bed.bath
1 2016-03-17 -0.002352941 1 bed 1 bath
2 2016-03-17 -0.035294118 1 bed 1 bath
3 2016-03-17 -0.008278717 1 bed 1 bath
4 2016-03-17 -0.008350731 1 bed 1 bath
5 2016-03-17  0.004243281 1 bed 2 bath
6 2016-03-17  0.007299270 2 bed 2 bath
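The bed.bath column is of class character. I want to extract the bed and bath information into separate columns. I tried splitting the string and extracting the number like this: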
getbeds <- function(x){
  splits = strsplit(x, " ")
  return(splits[[1]][1])
}
But when I run
df <- df %>% mutate(beds = getbeds(bed.bath))
the new column contains only 1s:
        date         over     bed.bath beds
1 2016-03-17 -0.002352941 1 bed 1 bath    1
2 2016-03-17 -0.035294118 1 bed 1 bath    1
3 2016-03-17 -0.008278717 1 bed 1 bath    1
4 2016-03-17 -0.008350731 1 bed 1 bath    1
5 2016-03-17  0.004243281 1 bed 2 bath    1
6 2016-03-17  0.007299270 2 bed 2 bath    1
What is the best way to extract the information I want from the data frame?
Data
df <- structure(list(date = structure(c(16877, 16877, 16877, 16877, 16877, 16877), class = "Date"),
over = c(-0.002352941, -0.035294118, -0.008278717, -0.008350731, 0.004243281, 0.00729927),
bed.bath = c("1 bed 1 bath", "1 bed 1 bath", "1 bed 1 bath", "1 bed 1 bath", "1 bed 2 bath", "2 bed 2 bath")),
.Names = c("date", "over", "bed.bath"),
row.names = c("1", "2", "3", "4", "5", "6"), class = "data.frame")
library('dplyr')
df %>% mutate(beds = getbeds(bed.bath))
Answer 0 (score: 4)
We can use extract from tidyr:
library(tidyr)
library(dplyr)
df %>%
  extract(bed.bath, into = 'beds', "(\\d+).*", remove = FALSE)
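Since the question also asks for the baths, the same extract call can capture both numbers at once by using two capture groups and two names in into. This is a minimal sketch, assuming every bed.bath entry follows the pattern "<n> bed <n> bath"; convert = TRUE asks tidyr to convert the captured text to numbers.

```r
library(dplyr)
library(tidyr)

# Stand-in for the question's data (only the column used here)
df <- data.frame(
  bed.bath = c("1 bed 1 bath", "1 bed 1 bath", "1 bed 1 bath",
               "1 bed 1 bath", "1 bed 2 bath", "2 bed 2 bath"),
  stringsAsFactors = FALSE
)

# One capture group per output column; assumes the "<n> bed <n> bath" layout
res <- df %>%
  extract(bed.bath, into = c("beds", "baths"),
          regex = "(\\d+) bed (\\d+) bath",
          remove = FALSE, convert = TRUE)
res
```

Rows that do not match the regex get NA in both new columns, which makes malformed entries easy to spot.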
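Or, in base R, use sub to match one or more spaces (\\s+) followed by any characters (.*) and replace them with an empty string, so that only the number at the start of the string is kept and everything else is removed: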
df$beds <- with(df, as.integer(sub("\\s+.*", "", bed.bath)))
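The sub approach extends to the bath count with backreferences: capture both numbers in one pattern and keep \\1 for beds and \\2 for baths. A minimal sketch, assuming every entry has exactly the form "<n> bed <n> bath" (the variable pattern is just an illustrative name):

```r
# Stand-in for the question's data (only the column used here)
df <- data.frame(
  bed.bath = c("1 bed 1 bath", "1 bed 1 bath", "1 bed 1 bath",
               "1 bed 1 bath", "1 bed 2 bath", "2 bed 2 bath"),
  stringsAsFactors = FALSE
)

# Group 1 = number of beds, group 2 = number of baths
pattern <- "^(\\d+) bed (\\d+) bath$"

df$beds  <- as.integer(sub(pattern, "\\1", df$bed.bath))
df$baths <- as.integer(sub(pattern, "\\2", df$bed.bath))
df
```

If a string does not match, sub returns it unchanged and as.integer turns it into NA with a warning.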
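The OP's output repeats the same value because getbeds extracts only the first token ([1]) of the first list element ([[1]]), i.e. the bed count of row 1; mutate then recycles that single value down the whole column.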
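To see the fix in the OP's own function, it only needs to return one value per element of x instead of looking at splits[[1]]. A minimal sketch that keeps the name getbeds and loops over the whole strsplit result with vapply:

```r
# Stand-in for the question's data (only the column used here)
df <- data.frame(
  bed.bath = c("1 bed 1 bath", "1 bed 1 bath", "1 bed 1 bath",
               "1 bed 1 bath", "1 bed 2 bath", "2 bed 2 bath"),
  stringsAsFactors = FALSE
)

getbeds <- function(x) {
  splits <- strsplit(x, " ")  # list with one character vector per element of x
  # Take the first token of every list element, not only of splits[[1]]
  as.integer(vapply(splits, `[`, character(1), 1))
}

df$beds <- getbeds(df$bed.bath)
df
```

Because the function now returns a vector as long as its input, it also works inside mutate(beds = getbeds(bed.bath)).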
Answer 1 (score: 1)
If you also want to extract the number of baths, you could use sapply:
getbeds <- function(x){
  splits = strsplit(x, " ")
  # the 1st and 3rd tokens are the bed and bath counts
  as.integer(c(splits[[1]][[1]], splits[[1]][[3]]))
}
# one row per bed.bath entry, columns = (bed, bath)
bed.bath <- t(sapply(df$bed.bath, getbeds))
df$bed <- bed.bath[, 1]
df$bath <- bed.bath[, 2]
df
#        date         over     bed.bath bed bath
#1 2016-03-17 -0.002352941 1 bed 1 bath   1    1
#2 2016-03-17 -0.035294118 1 bed 1 bath   1    1
#3 2016-03-17 -0.008278717 1 bed 1 bath   1    1
#4 2016-03-17 -0.008350731 1 bed 1 bath   1    1
#5 2016-03-17  0.004243281 1 bed 2 bath   1    2
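A variant of the same split-based idea avoids calling strsplit once per row: split the whole column in one call and stack the pieces into a matrix with do.call(rbind, ...). This is a minimal sketch, assuming every entry splits into exactly four space-separated tokens ("1", "bed", "1", "bath"); otherwise rbind would recycle or misalign the pieces.

```r
# Stand-in for the question's data (only the column used here)
df <- data.frame(
  bed.bath = c("1 bed 1 bath", "1 bed 1 bath", "1 bed 1 bath",
               "1 bed 1 bath", "1 bed 2 bath", "2 bed 2 bath"),
  stringsAsFactors = FALSE
)

# One strsplit call for the whole column; each row of parts is one entry
parts <- do.call(rbind, strsplit(df$bed.bath, " ", fixed = TRUE))

# Columns 1 and 3 hold the numbers (assumes "<n> bed <n> bath")
df$bed  <- as.integer(parts[, 1])
df$bath <- as.integer(parts[, 3])
df
```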