Question

I have a data frame in R about houses. Here is a small sample:

Address                              Type       Rent
Glasgow;Scotland                     House      1500
High Street;Edinburgh;Scotland      Apartment    1000
Dundee;Scotland                     Apartment    800
South Street;Dundee;Scotland        House       900

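I want to pull the last two elements of each Address value into City and County columns in my data frame.

I used mutate and strsplit to split this column:

data <- mutate(dataframe, split_add = strsplit(dataframe$Address, ";"))

I now have a new column in my data frame that looks like this: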

split_add                             
c("Glasgow","Scotland")                     
c("High Street","Edinburgh","Scotland")      
c("Dundee","Scotland")                    
c("South Street","Dundee","Scotland")

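How can I extract the last 2 elements of each vector into "City" and "County" columns?

I tried data <- mutate(data, city = split_add[-2]), thinking it would take the second element from the end of the vector, but that didn't work.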

Answer 1

Using tidyr::separate() with the fill = "left" option is probably your best bet...

dataframe <- read.table(header = T, stringsAsFactors = F, text = "
Address                          Type       Rent
Glasgow;Scotland                 House      1500
'High Street;Edinburgh;Scotland' Apartment  1000
Dundee;Scotland                  Apartment  800
'South Street;Dundee;Scotland'   House      900
")

library(tidyr)

separate(dataframe, Address, into = c("Street", "City", "County"), 
         sep = ";", fill = "left")

#         Street      City   County      Type Rent
# 1         <NA>   Glasgow Scotland     House 1500
# 2  High Street Edinburgh Scotland Apartment 1000
# 3         <NA>    Dundee Scotland Apartment  800
# 4 South Street    Dundee Scotland     House  900
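If only the last two pieces are wanted, a possible variation (a sketch, not part of the original answer) is to pass NA as the first name in `into`, which tidyr documents as a way to omit that piece from the output. The data frame below is rebuilt from the question's sample; the name `result` is just illustrative. This assumes no address has more than three parts.

```r
library(tidyr)

# Sample data from the question
dataframe <- data.frame(
  Address = c("Glasgow;Scotland",
              "High Street;Edinburgh;Scotland",
              "Dundee;Scotland",
              "South Street;Dundee;Scotland"),
  Type = c("House", "Apartment", "Apartment", "House"),
  Rent = c(1500, 1000, 800, 900),
  stringsAsFactors = FALSE
)

# NA in `into` drops the street piece; fill = "left" pads short
# addresses on the left so City and County stay aligned
result <- separate(dataframe, Address,
                   into = c(NA, "City", "County"),
                   sep = ";", fill = "left")
result
```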

Answer 2

I was thinking of another way to handle this problem.

1. Create a data frame from the split_add column data

c("Glasgow","Scotland")                      
c("High Street","Edinburgh","Scotland")      
c("Dundee","Scotland")                    
c("South Street","Dundee","Scotland")  

test_data <- data.frame(address = c("Glasgow, Scotland",
                                    "High Street, Edinburgh, Scotland",
                                    "Dundee, Scotland",
                                    "South Street, Dundee, Scotland"),
                        stringsAsFactors = FALSE)

2. Split the column using separate() from tidyr

library(tidyr)

new_test <- test_data %>% separate(address, c("c1", "c2", "c3"), sep = ", ", fill = "right")

3. Use dplyr and ifelse() to keep only the last two columns

library(dplyr)
new_test %>% 
  mutate(city = ifelse(is.na(c3),c1,c2),county = ifelse(is.na(c3),c2,c3)) %>% 
  select(city,county)

The final data contains only the city and county columns.

Answer 3

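Assuming you are using dplyr

# tail() is applied per row with sapply(); calling it on the list column itself would return its last rows
data <- mutate(dataframe, split_add = strsplit(Address, ';'), City = sapply(split_add, function(x) tail(x, 2)[1]), County = sapply(split_add, function(x) tail(x, 1)))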
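For comparison, the same "last two elements per row" idea can be sketched in base R without dplyr. This is only an illustration under stated assumptions: the data frame is rebuilt from the question's sample, `last_two` is a hypothetical helper name, and every address is assumed to contain at least two pieces (otherwise vapply() stops with an error).

```r
# Sample data from the question
dataframe <- data.frame(
  Address = c("Glasgow;Scotland",
              "High Street;Edinburgh;Scotland",
              "Dundee;Scotland",
              "South Street;Dundee;Scotland"),
  Type = c("House", "Apartment", "Apartment", "House"),
  Rent = c(1500, 1000, 800, 900),
  stringsAsFactors = FALSE
)

# Split each address on ";" into a list of character vectors
parts <- strsplit(dataframe$Address, ";", fixed = TRUE)

# Hypothetical helper: the final two pieces of one address
last_two <- function(x) utils::tail(x, 2)

# vapply() applies the helper to each list element and checks that
# every result is a character vector of length 2; the result is a
# 2 x n matrix (row 1 = city, row 2 = county)
m <- vapply(parts, last_two, character(2))

dataframe$City   <- m[1, ]
dataframe$County <- m[2, ]
dataframe
```

Because the pieces are counted from the end, this also works when an address has more than three parts.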
