Question

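I'm happy to award the answer points to anyone who can help me with this. I want to check whether each address string is missing the city name and, if it is, append the city name to it.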

Suppose I have data like this:

df <- data.frame(X=c(1:5), Houston.Addresses=c("548 w 19th st", "6611 Portwest Dr. #190, houston, tx", "3555 Timmons Ln Ste 300, Houston, TX, 77027-6466", "3321 Westpark Dr", "16221 north freeway"))

I would like the data to look like this:

df.desired <- data.frame(X=c(1:5), Houston.Addresses=c("548 w 19th st, houston, tx", "6611 Portwest Dr. #190, houston, tx", "3555 Timmons Ln Ste 300, Houston, TX, 77027-6466", "3321 Westpark Dr, houston, tx", "16221 north freeway, houston, tx"))

My current approach uses a loop, which is very inefficient for large datasets, and I'm sure a vectorized version exists. Can someone help vectorize it?

Thanks in advance!

Answer 1

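Instead of looping through each row, we can use grepl (which is vectorized) to create a logical index, and then assign to the elements of 'Houston.Addresses' that correspond to the index 'i1' by pasting on the substring (after converting the column to character class).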

i1 <- !grepl("houston", tolower(df$Houston.Addresses))
df$Houston.Addresses <- as.character(df$Houston.Addresses)
df$Houston.Addresses[i1] <- paste0(df$Houston.Addresses[i1], ", houston, tx")
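
To make the logical-index step concrete, here is a minimal, self-contained sketch run on the sample data from the question. It assumes the column is built as character (stringsAsFactors = FALSE), so no conversion is needed, and it uses ignore.case = TRUE in place of tolower; both are choices of this sketch, not part of the answer.

```r
# Sample data from the question; stringsAsFactors = FALSE keeps the
# addresses as character regardless of the R version.
df <- data.frame(
  X = 1:5,
  Houston.Addresses = c("548 w 19th st",
                        "6611 Portwest Dr. #190, houston, tx",
                        "3555 Timmons Ln Ste 300, Houston, TX, 77027-6466",
                        "3321 Westpark Dr",
                        "16221 north freeway"),
  stringsAsFactors = FALSE
)

# TRUE where the address does not mention the city (case-insensitive match)
i1 <- !grepl("houston", df$Houston.Addresses, ignore.case = TRUE)
print(i1)  # TRUE FALSE FALSE TRUE TRUE

# Append the suffix only to the flagged rows; the other rows are untouched
df$Houston.Addresses[i1] <- paste0(df$Houston.Addresses[i1], ", houston, tx")
print(df$Houston.Addresses)
```

Rows 2 and 3 already contain the city name (in different cases), so only rows 1, 4 and 5 are modified.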

If we want more efficiency, we can use data.table and assign by reference (:=):

library(data.table)
setDT(df)[, Houston.Addresses := as.character(Houston.Addresses)
            ][!grepl("houston", tolower(Houston.Addresses)),
                 Houston.Addresses := paste0(Houston.Addresses, ", houston, tx")]

Answer 2

Another suggestion, using ifelse:

addr <- as.character(df$Houston.Addresses)  # guard against a factor column
df$Houston.Addresses <- ifelse(grepl("houston", addr, ignore.case=TRUE),
    addr,                           # city already present: keep as is
    paste0(addr, ", houston, tx"))  # city missing: append it
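
As a rough sketch, the ifelse approach can be wrapped in a small reusable function and checked against the desired output from the question. The function name add_city and its city and suffix parameters are hypothetical, introduced here only for illustration.

```r
# Hypothetical helper (not from the answers): append `suffix` to every
# address that does not already mention `city`, ignoring case.
# Note: `city` is treated as a regular expression by grepl.
add_city <- function(addresses, city = "houston", suffix = ", houston, tx") {
  addresses <- as.character(addresses)  # guard against factor input
  has_city <- grepl(city, addresses, ignore.case = TRUE)
  # Keep addresses that already have the city; extend the others
  ifelse(has_city, addresses, paste0(addresses, suffix))
}

addresses <- c("548 w 19th st",
               "6611 Portwest Dr. #190, houston, tx",
               "3555 Timmons Ln Ste 300, Houston, TX, 77027-6466",
               "3321 Westpark Dr",
               "16221 north freeway")

desired <- c("548 w 19th st, houston, tx",
             "6611 Portwest Dr. #190, houston, tx",
             "3555 Timmons Ln Ste 300, Houston, TX, 77027-6466",
             "3321 Westpark Dr, houston, tx",
             "16221 north freeway, houston, tx")

result <- add_city(addresses)
print(identical(result, desired))
```

Because ifelse evaluates both branches for the whole vector, paste0 runs on every address even when only a few need the suffix; for very large inputs the logical-index assignment avoids that extra work.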

Is there a way to vectorize this foreach loop in R to make the text replacement more efficient?