I have a data frame in R about houses. Here is a small sample:
Address Type Rent
Glasgow;Scotland House 1500
High Street;Edinburgh;Scotland Apartment 1000
Dundee;Scotland Apartment 800
South Street;Dundee;Scotland House 900
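I want to pull the last two elements of the Address column into City and County columns in my data frame.
I used mutate and strsplit to split this column:
data <- mutate(dataframe, split_add = strsplit(Address, ";"))
I now have a new column in my data frame that looks like this: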
split_add
c("Glasgow","Scotland")
c("High Street","Edinburgh","Scotland")
c("Dundee","Scotland")
c("South Street","Dundee","Scotland")
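How can I extract the last 2 elements of each vector into the City and County columns?
I tried: data <- mutate(data, city = split_add[-2]), thinking it would take the second element from the end of the vector, but this does not work.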
Answer 0 (score: 2)
Using tidyr::separate() with the fill = "left" option is probably your best bet...
dataframe <- read.table(header = T, stringsAsFactors = F, text = "
Address Type Rent
Glasgow;Scotland House 1500
'High Street;Edinburgh;Scotland' Apartment 1000
Dundee;Scotland Apartment 800
'South Street;Dundee;Scotland' House 900
")
library(tidyr)
separate(dataframe, Address, into = c("Street", "City", "County"),
sep = ";", fill = "left")
# Street City County Type Rent
# 1 <NA> Glasgow Scotland House 1500
# 2 High Street Edinburgh Scotland Apartment 1000
# 3 <NA> Dundee Scotland Apartment 800
# 4 South Street Dundee Scotland House 900
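If some addresses could have more than three parts, a fixed three-name split may not fit. As a rough sketch in base R only (the helper name last_two is hypothetical, and the sample data is retyped from the question), the last two pieces of each address can be taken directly, however many pieces precede them:

```r
# Sample data retyped from the question
dataframe <- data.frame(
  Address = c("Glasgow;Scotland",
              "High Street;Edinburgh;Scotland",
              "Dundee;Scotland",
              "South Street;Dundee;Scotland"),
  Type = c("House", "Apartment", "Apartment", "House"),
  Rent = c(1500, 1000, 800, 900),
  stringsAsFactors = FALSE
)

# Split each address on ";" into a list of character vectors
parts <- strsplit(dataframe$Address, ";", fixed = TRUE)

# Hypothetical helper: the final two pieces of one address
last_two <- function(x) tail(x, 2)

# vapply() applies the helper to each row and returns a
# character matrix with 2 rows and one column per address
m <- vapply(parts, last_two, character(2))

dataframe$City   <- m[1, ]
dataframe$County <- m[2, ]
```

This assumes every address has at least two parts; vapply() stops with an error otherwise, which is arguably safer than silently filling in a wrong value.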
Answer 1 (score: 1)
I was thinking of another way to approach this.
1. Create a data frame from the split_add column data
c("Glasgow","Scotland")
c("High Street","Edinburgh","Scotland")
c("Dundee","Scotland")
c("South Street","Dundee","Scotland")
test_data <- data.frame(address = c("Glasgow, Scotland",
"High Street, Edinburgh, Scotland",
"Dundee, Scotland",
"South Street, Dundee, Scotland"), stringsAsFactors = FALSE)
2. Split the column using separate() from tidyr
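library(tidyr)
# sep = ", " avoids leading spaces in c2 and c3; fill = "right" pads short rows with NA in c3
new_test <- test_data %>% separate(address, c("c1", "c2", "c3"), sep = ", ", fill = "right")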
3. Use dplyr with ifelse() to keep only the last two columns
library(dplyr)
new_test %>%
mutate(city = ifelse(is.na(c3),c1,c2),county = ifelse(is.na(c3),c2,c3)) %>%
select(city,county)
The final data then contains only the city and county columns.
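Answer 2 (score: -2)
Assuming you are using dplyr
# split_add is a list column, so apply tail() to each row's vector
data <- mutate(dataframe, split_add = strsplit(Address, ';'),
City = sapply(split_add, function(x) tail(x, 2)[1]),
County = sapply(split_add, function(x) tail(x, 1)))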
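One point worth spelling out: split_add is a list column, so indexing it directly (split_add[-2], or tail(split_add, 2)) selects whole rows rather than elements within each address. Also, a negative index in R drops that position instead of counting from the end. A minimal base R sketch of the difference, using a two-row example made up for illustration:

```r
# Two addresses split into a list of character vectors
split_add <- strsplit(c("Glasgow;Scotland",
                        "High Street;Edinburgh;Scotland"), ";")

# A negative index removes the 2nd list element (the 2nd row),
# leaving a list of length 1
dropped <- split_add[-2]

# On a plain vector, [-2] likewise removes the 2nd element
without_second <- c("a", "b", "c")[-2]

# tail() on the list returns the last row as a list, not the last piece
last_row <- tail(split_add, 1)

# Per-row extraction: index each vector relative to its own length
city   <- vapply(split_add, function(x) x[length(x) - 1], character(1))
county <- vapply(split_add, function(x) x[length(x)], character(1))
```

Here city and county are ordinary character vectors with one value per row, so they can be assigned straight to new columns.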