Changing column values based on another column, but only for certain conditions in the first and second columns (R)

Posted: 2018-10-24 11:08:33

Tags: r dataframe grepl

I have a data frame:

city <- as.character(c("London", "Unknown", "Birmingham", "Bristol", "Unknown", "Unknown", "Unknown", "Unknown"))
city_details <- as.character(c("London", "Camden", "Birmingham", "Outside London", "Camden Town", "Westminster", "London", "Birmingham"))
city_data <- data.frame(city, city_details)
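Before R 4.0.0 (i.e. at the time of this question), data.frame() converted character vectors to factors by default. The replacements discussed here still work on this data because "London" is already a level of city, but assigning a string that is not an existing level would give NA with a warning. A minimal sketch, assuming you want plain character columns, is to set stringsAsFactors explicitly:

```r
# Vectors from the question; c() of strings is already character, so as.character() is not needed
city <- c("London", "Unknown", "Birmingham", "Bristol", "Unknown", "Unknown", "Unknown", "Unknown")
city_details <- c("London", "Camden", "Birmingham", "Outside London", "Camden Town", "Westminster", "London", "Birmingham")

# stringsAsFactors = FALSE keeps both columns as character vectors, so any string
# can be assigned later; from R 4.0.0 onwards this is already the default
city_data <- data.frame(city, city_details, stringsAsFactors = FALSE)
str(city_data)
```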

Although several values in the city column are Unknown, looking at city_details shows that most of them are actually in London.

So I can replace some of them:

city_data$city[grepl("Camden|Westminster", city_data$city_details)] <- 'London'

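However, the rows marked "London" in city_details are trickier, because there is also one marked "Outside London", so I don't want to simply pick up everything that matches the "London" pattern.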

I'm also not looking for an approach that only uses exact matches, because that wouldn't be entirely accurate for my real data.

So what I want to do is perform this replacement only where the city value is Unknown.
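Restricting to Unknown rows is enough here because, in this data, the only "Outside London" row already has a known city (Bristol). A minimal illustration using two rows taken from the data above (the variable names are just for this sketch):

```r
cities  <- c("Bristol", "Unknown")
details <- c("Outside London", "London")

# A plain substring match fires on both rows, including "Outside London"
london_hit <- grepl("London", details)
london_hit   # TRUE TRUE

# Also requiring city == "Unknown" leaves the Bristol row untouched
to_replace <- london_hit & cities == "Unknown"
to_replace   # FALSE TRUE
```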

I've tried the following, but the logic is clearly wrong: it changes every Unknown value in the city column to London.

city_data <- within(city_data, city[city == "Unknown"] <- (city[grepl("London", city_details)] <- 'London'))

Can anyone help?
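The attempt above fails because the inner assignment city[grepl("London", city_details)] <- 'London' runs on its own (so it also overwrites the "Outside London" row), and its value, the string 'London', is then assigned to every row where city == "Unknown". A minimal fix, sketched under the assumption of character columns, combines both conditions with & inside a single index. This only covers the "London" case; the Camden/Westminster replacement can still be run separately:

```r
city_data <- data.frame(
  city = c("London", "Unknown", "Birmingham", "Bristol", "Unknown", "Unknown", "Unknown", "Unknown"),
  city_details = c("London", "Camden", "Birmingham", "Outside London", "Camden Town", "Westminster", "London", "Birmingham"),
  stringsAsFactors = FALSE
)

# One logical index that requires BOTH conditions, then a single assignment
city_data <- within(city_data, city[city == "Unknown" & grepl("London", city_details)] <- "London")
city_data$city
```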

3 answers:

Answer 0 (score: 1)

I'm assuming you only want to replace the city name when city is "Unknown" and city_details mentions "London". In that case you can use the following:

city_data$city[(as.numeric(grepl("Unknown", city_data$city)) + as.numeric(grepl("London", city_data$city_details))) == 2] <- "London"

Does that answer your question?
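Converting both logical vectors to 0/1, adding them, and keeping positions where the sum equals 2 is the same as an element-wise logical AND. A small check on made-up vectors:

```r
is_unknown      <- c(TRUE, TRUE, FALSE, FALSE)
mentions_london <- c(TRUE, FALSE, TRUE, FALSE)

# The numeric approach from this answer
via_sum <- (as.numeric(is_unknown) + as.numeric(mentions_london)) == 2
# Equivalent element-wise logical AND
via_and <- is_unknown & mentions_london

via_sum                       # TRUE FALSE FALSE FALSE
identical(via_sum, via_and)   # TRUE
```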

Answer 1 (score: 0)

I'd suggest the following:

one_hot <- grepl("Camden|Westminster|London", city_data$city_details) &
  city_data$city == "Unknown"
city_data$city[one_hot] <- "London"

Example:

city <- as.character(c("London", "Unknown", "Birmingham", "Bristol", "Unknown", "Unknown", "Unknown", "Unknown"))
city_details <- as.character(c("London", "Camden", "Birmingham", "Outside London", "Camden Town", "Westminster", "London", "Tottenham"))
city_data <- data.frame(city, city_details)

> city_data
        city   city_details
1     London         London
2    Unknown         Camden
3 Birmingham     Birmingham
4    Bristol Outside London
5    Unknown    Camden Town
6    Unknown    Westminster
7    Unknown         London
8    Unknown      Tottenham

> one_hot <- grepl("Camden|Westminster|London", city_data$city_details) &
+   city_data$city == "Unknown"
> city_data$city[one_hot] <- "London"
> city_data
        city   city_details
1     London         London
2     London         Camden
3 Birmingham     Birmingham
4    Bristol Outside London
5     London    Camden Town
6     London    Westminster
7     London         London
8    Unknown      Tottenham
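Because one_hot is just a logical mask (not a one-hot encoding in the machine-learning sense), you can list the rows it selects before overwriting anything. A small sketch on this answer's example data, with character columns assumed:

```r
city_data <- data.frame(
  city = c("London", "Unknown", "Birmingham", "Bristol", "Unknown", "Unknown", "Unknown", "Unknown"),
  city_details = c("London", "Camden", "Birmingham", "Outside London", "Camden Town", "Westminster", "London", "Tottenham"),
  stringsAsFactors = FALSE
)
one_hot <- grepl("Camden|Westminster|London", city_data$city_details) &
  city_data$city == "Unknown"

# Row numbers the assignment would overwrite
rows_to_change <- which(one_hot)
rows_to_change                 # 2 5 6 7
city_data[rows_to_change, ]    # inspect them before assigning
```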

Answer 2 (score: 0)

I also came up with the following approach, which seems neater and more intuitive to me. There's no need to convert to numeric.

city_data$city[grepl("Unknown", city_data$city) & 
               grepl("London|Camden|Westminster", city_data$city_details)] <- "London"
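A self-contained run of this approach on the data from the question (character columns are an assumption of this sketch):

```r
city_data <- data.frame(
  city = c("London", "Unknown", "Birmingham", "Bristol", "Unknown", "Unknown", "Unknown", "Unknown"),
  city_details = c("London", "Camden", "Birmingham", "Outside London", "Camden Town", "Westminster", "London", "Birmingham"),
  stringsAsFactors = FALSE
)

# Replace only where city is Unknown AND the details mention a London area
city_data$city[grepl("Unknown", city_data$city) &
               grepl("London|Camden|Westminster", city_data$city_details)] <- "London"
city_data$city
```

Row 4 ("Outside London") keeps Bristol because its city is not Unknown, and row 8 stays Unknown because "Birmingham" matches none of the patterns. Note that grepl("Unknown", ...) is a substring match, so it would also catch a value such as "Unknown area"; city_data$city == "Unknown" matches only that exact value.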