I want to use the apply family of functions to speed up my code.
I have already extracted a set of cities into a list:
targetcitylist :=> "London", "Hong Kong", "Dubai", "Paris"
I also have a separate, very large data frame that looks like this:
+---------------+------------+-----------+
| Period | City | usercount |
+---------------+------------+-----------+
| Night | Cardiff | 35 |
| Afternoon | Unknown | 12 |
| Afternoon | Norwich | 111 |
| Afternoon | Darlington | 13 |
| Evening | Bebington | 6 |
| Afternoon | Shrewsbury | 24 |
+---------------+------------+-----------+
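I want to write a function that goes through every row of the data frame and creates a new variable, Cities: a city that appears in the list keeps its name, and every other city is classified as Other.
Here is my slow attempt: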
data$Cities <- ifelse(data$City == toString(targetcitylist[1]),toString(targetcitylist[1]),
ifelse(data$City == toString(targetcitylist[2]),toString(targetcitylist[2]),
ifelse(data$City == toString(targetcitylist[3]),toString(targetcitylist[3]),
ifelse(data$City == toString(targetcitylist[4]),toString(targetcitylist[4]),
ifelse(data$City == toString(targetcitylist[5]),toString(targetcitylist[5]),
'Other')))))
Here is my attempt to speed it up, which failed:
data$Cities = lapply(targetcitylist, function(x)ifelse(data$City==targetcitylist[x] , targetcitylist[x] ,'Other'))
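Could you help me simplify the syntax and improve the speed at the same time? The slow attempt really is slow.
Answer 0 (score: 2)
Try the following example: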
#my list
targetcitylist <- c("London", "Hong Kong", "Dubai", "Paris")
#my data - note: only London should match my target list
data <- read.table(text="Period City usercount
Night Cardiff 35
Afternoon Unknown 12
Afternoon London 111
Afternoon Darlington 13
Evening Bebington 6
Afternoon Shrewsbury 24", header = TRUE, as.is = TRUE) #no factors
#result
ifelse(data$City %in% targetcitylist, data$City, "Other")
#output
[1] "Other" "Other" "London" "Other" "Other" "Other"
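To store the result as the new Cities column that the question asks for, the same membership test can be wrapped in a small function. This is only a sketch: `recode_cities` and its `other` argument are hypothetical names chosen for illustration, and the sample data mirrors the example above. It uses logical indexing instead of ifelse, and no apply-style loop is needed because `%in%` is already vectorised over the whole column. Note that lapply over targetcitylist returns a list with one element per city, not one value per row, so it does not fit this task.

```r
# Hypothetical helper: keep cities found in `targets`, label the rest `other`.
recode_cities <- function(city, targets, other = "Other") {
  city <- as.character(city)        # guard against factor columns
  out <- rep(other, length(city))   # start with every row labelled `other`
  keep <- city %in% targets         # vectorised membership test over all rows
  out[keep] <- city[keep]           # restore the names of the target cities
  out
}

targetcitylist <- c("London", "Hong Kong", "Dubai", "Paris")

# Sample data mirroring the example above (only London is in the target list)
data <- data.frame(
  Period = c("Night", "Afternoon", "Afternoon", "Afternoon", "Evening", "Afternoon"),
  City = c("Cardiff", "Unknown", "London", "Darlington", "Bebington", "Shrewsbury"),
  usercount = c(35, 12, 111, 13, 6, 24),
  stringsAsFactors = FALSE
)

# Add the new column in a single vectorised step
data$Cities <- recode_cities(data$City, targetcitylist)
```

A missing City value (NA) is not in the target list, so it is labelled Other as well; change the helper if NA should be preserved instead.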