Applying the ifelse function to a data frame over a list

Asked: 2015-11-19 11:08:44

Tags: r if-statement lapply

I want to use the apply family of functions to speed up my code.

I have extracted a set of cities into a list:

targetcitylist :=> "London", "Hong Kong", "Dubai", "Paris"

I also have a separate, very large data frame that looks like this:

+---------------+------------+-----------+
|      Period   |    City    | usercount |
+---------------+------------+-----------+
|     Night     | Cardiff    |        35 |
|     Afternoon | Unknown    |        12 |
|     Afternoon | Norwich    |       111 |
|     Afternoon | Darlington |        13 |
|     Evening   | Bebington  |         6 |
|     Afternoon | Shrewsbury |        24 |
+---------------+------------+-----------+

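I want to create a function that goes through every row of the data frame and creates a new variable, Cities. A city that appears in the list keeps its name; every other city is classed as Other.

This is my slow attempt: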

data$Cities <- ifelse(data$City == toString(targetcitylist[1]),toString(targetcitylist[1]), 
                            ifelse(data$City == toString(targetcitylist[2]),toString(targetcitylist[2]),
                                   ifelse(data$City == toString(targetcitylist[3]),toString(targetcitylist[3]),
                                          ifelse(data$City == toString(targetcitylist[4]),toString(targetcitylist[4]),
                                                 ifelse(data$City == toString(targetcitylist[5]),toString(targetcitylist[5]),
                                                        'Other')))))

This is my attempt to speed it up, which failed:

data$Cities = lapply(targetcitylist, function(x)ifelse(data$City==targetcitylist[x] , targetcitylist[x] ,'Other'))

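Could you help me simplify the syntax while also making the code faster? The slow attempt is really slow.

1 Answer:

Answer 0 (score: 2)

Try the following example: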

#my list
targetcitylist <- c("London", "Hong Kong", "Dubai", "Paris")

#my data - note: only London should match my target list
data <- read.table(text="Period City usercount 
Night Cardiff 35 
Afternoon Unknown 12 
Afternoon London 111 
Afternoon Darlington 13 
Evening Bebington 6 
Afternoon Shrewsbury 24", header = TRUE, as.is = TRUE) #no factors

#result
ifelse(data$City %in% targetcitylist, data$City, "Other")

#output
[1] "Other"  "Other"  "London" "Other"  "Other"  "Other"
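
The result above is only printed, so here is a minimal sketch of how it could be stored in the new Cities column the question asks for. It rebuilds the same sample data with data.frame() rather than read.table(). The unlist() and as.character() calls are defensive assumptions, not part of the original answer: they cover the cases where targetcitylist is a true list rather than a character vector, or where City was read in as a factor, in which case ifelse() would return the integer level codes instead of the city names.

```r
# Target cities, as in the example above; unlist() is a no-op on a character
# vector and flattens the input if it happens to be a true list (assumption).
targetcitylist <- unlist(c("London", "Hong Kong", "Dubai", "Paris"))

# Small stand-in for the real data frame (values copied from the example above)
data <- data.frame(
  Period    = c("Night", "Afternoon", "Afternoon", "Afternoon", "Evening", "Afternoon"),
  City      = c("Cardiff", "Unknown", "London", "Darlington", "Bebington", "Shrewsbury"),
  usercount = c(35, 12, 111, 13, 6, 24),
  stringsAsFactors = FALSE
)

# Guard against City being a factor (assumption about the real data)
city <- as.character(data$City)

# %in% is vectorised: one membership test per row, no nesting and no lapply
data$Cities <- ifelse(city %in% targetcitylist, city, "Other")

# Quick check of the grouping
table(data$Cities)
```

Because `%in%` tests the whole City column against the target vector in a single call, there is no need for one ifelse() per city, and the expression stays the same length however many cities are in the list.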