Consider the following vector `tels`, which contains phone numbers from the "KANSAS", "TEXAS" and "NEW YORK" regions.
tels <- c("510-548-2238", "707-231-2440", "650-752-1300", "510-674-3482", "510-853-5695", "510-882-9898", "650-555-6311", "707-885-6351", "650-231-1234", "650-096-0023", "707-691-6763")
If the number starts with 510, the phone number is from "KANSAS"; if it starts with 707, it is from "NEW YORK"; and if it starts with 650, it is from "TEXAS".
Use R concepts to obtain the following data frame as output.
Expected Output:
PhoneNumbers State
1 5105482238 KANSAS
2 7072312440 NEW YORK
3 6507521300 TEXAS
4 5106743482 KANSAS
5 5108535695 KANSAS
6 5108829898 KANSAS
7 6505556311 TEXAS
8 7078856351 NEW YORK
9 6502311234 TEXAS
10 6500960023 TEXAS
11 7076916763 NEW YORK
This is my code:
z<-substr(tels,1,3)
dirt<-data.frame(tels,z)
dirt
for(i in z){
if(i==510){
sta<-"ddfdd"
}if(i==707){
sta<-"NEW YORK"
}
if((i==650)){
sta<-"TEXAS"
}
}
das<-data.frame(tels,sta)
das
but I'm getting this output:
tels sta
1 510-548-2238 NEW YORK
2 707-231-2440 NEW YORK
3 650-752-1300 NEW YORK
4 510-674-3482 NEW YORK
5 510-853-5695 NEW YORK
6 510-882-9898 NEW YORK
7 650-555-6311 NEW YORK
8 707-885-6351 NEW YORK
9 650-231-1234 NEW YORK
10 650-096-0023 NEW YORK
11 707-691-6763 NEW YORK
Answer 0 (score: 2)
You can use factor, with the first 3 digits as the levels and the states as the labels.
data.frame(tels,
state = factor(substr(tels, 1, 3), levels = c('510','650','707'), labels = c('KANSAS','TEXAS','NEW YORK')))
tels state
1 510-548-2238 KANSAS
2 707-231-2440 NEW YORK
3 650-752-1300 TEXAS
4 510-674-3482 KANSAS
5 510-853-5695 KANSAS
6 510-882-9898 KANSAS
7 650-555-6311 TEXAS
8 707-885-6351 NEW YORK
9 650-231-1234 TEXAS
10 650-096-0023 TEXAS
11 707-691-6763 NEW YORK
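The expected output in the question shows the numbers without dashes and uses the column names PhoneNumbers and State. A minimal sketch of how the factor approach could be extended to match that, assuming the dashes are the only characters to strip (the variable names `state` and `out` are just illustrative):

```r
tels <- c("510-548-2238", "707-231-2440", "650-752-1300", "510-674-3482",
          "510-853-5695", "510-882-9898", "650-555-6311", "707-885-6351",
          "650-231-1234", "650-096-0023", "707-691-6763")

# Map the 3-digit prefix to a state through factor levels/labels
state <- factor(substr(tels, 1, 3),
                levels = c("510", "650", "707"),
                labels = c("KANSAS", "TEXAS", "NEW YORK"))

# Remove the dashes so the numbers look like those in the expected output
out <- data.frame(PhoneNumbers = gsub("-", "", tels, fixed = TRUE),
                  State = state,
                  stringsAsFactors = FALSE)
out
```

Note that State is a factor here; wrap it in as.character() if a plain character column is preferred.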
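Answer 1 (score: 1)
We take the substr of `tels`, then create a named vector whose names match the substr values, and replace each prefix with the corresponding value from the named vector.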
data.frame(PhoneNumbers = tels, state = setNames(c("KANSAS", "NEW YORK", "TEXAS"),
c('510', '707', '650'))[substr(tels, 1, 3)])
# PhoneNumbers state
#1 510-548-2238 KANSAS
#2 707-231-2440 NEW YORK
#3 650-752-1300 TEXAS
#4 510-674-3482 KANSAS
#5 510-853-5695 KANSAS
#6 510-882-9898 KANSAS
#7 650-555-6311 TEXAS
#8 707-885-6351 NEW YORK
#9 650-231-1234 TEXAS
#10 650-096-0023 TEXAS
#11 707-691-6763 NEW YORK
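The named-vector lookup can also be written out step by step, which makes it easier to see what happens to a prefix that is not in the table. This is only a sketch: `lookup`, `codes` and the "999" prefix are hypothetical names and values used for illustration.

```r
tels <- c("510-548-2238", "707-231-2440", "650-752-1300", "510-674-3482",
          "510-853-5695", "510-882-9898", "650-555-6311", "707-885-6351",
          "650-231-1234", "650-096-0023", "707-691-6763")

# Named vector: names are the prefixes, values are the states
lookup <- c("510" = "KANSAS", "707" = "NEW YORK", "650" = "TEXAS")

# Indexing by name returns one state per phone number
codes <- substr(tels, 1, 3)
state <- unname(lookup[codes])

# A prefix that is missing from the table (hypothetical "999") yields NA
unknown <- unname(lookup["999"])

data.frame(PhoneNumbers = tels, State = state)
```

Indexing a named vector by a character vector is vectorized, so no loop is needed.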
Answer 2 (score: 1)
You can use ^510, ^650 and ^707 to match the leading pattern. To add the new column easily, I used the dplyr package.
library(tidyverse) # has dplyr and stringr
# data set -------------------------------
(dirt <- tibble(PhoneNumbers = c("510-548-2238", "707-231-2440", "650-752-1300", "510-674-3482", "510-853-5695", "510-882-9898", "650-555-6311", "707-885-6351", "650-231-1234", "650-096-0023", "707-691-6763")))
#> # A tibble: 11 x 1
#> PhoneNumbers
#> <chr>
#> 1 510-548-2238
#> 2 707-231-2440
#> 3 650-752-1300
#> 4 510-674-3482
#> 5 510-853-5695
#> 6 510-882-9898
#> 7 650-555-6311
#> 8 707-885-6351
#> 9 650-231-1234
#> 10 650-096-0023
#> 11 707-691-6763
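You can write a function that finds each region by detecting each pattern with stringr::str_detect().
You can use sapply() to do this in one pass. If you apply str_detect to c("^510", "^650", "^707"), you get a matrix in which each column corresponds to one pattern. Each value says whether the number matches that pattern (TRUE or FALSE), so the matrix is 11 x 3.
By construction, each row contains exactly one TRUE. You can find its index and use it to subset c("KANSAS", "TEXAS", "NEW YORK").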
find_region <- function(x) {
sta <- c("^510", "^650", "^707")
stt <- sapply(sta, function(p) {
str_detect(x, pattern = p)
}) %>% # produce matrix 11x3 of TRUE and FALSE, each column = 510, 650, 707, TRUE if x contains the pattern
apply(1, which) # get the index
c("KANSAS", "TEXAS", "NEW YORK")[stt]
}
With this function, you can add the new column using dplyr::mutate():
dirt %>%
mutate(State = find_region(PhoneNumbers))
#> # A tibble: 11 x 2
#> PhoneNumbers State
#> <chr> <chr>
#> 1 510-548-2238 KANSAS
#> 2 707-231-2440 NEW YORK
#> 3 650-752-1300 TEXAS
#> 4 510-674-3482 KANSAS
#> 5 510-853-5695 KANSAS
#> 6 510-882-9898 KANSAS
#> 7 650-555-6311 TEXAS
#> 8 707-885-6351 NEW YORK
#> 9 650-231-1234 TEXAS
#> 10 650-096-0023 TEXAS
#> 11 707-691-6763 NEW YORK
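To inspect the intermediate 11 x 3 logical matrix without loading any packages, the same idea can be sketched in base R. This swaps str_detect() for startsWith() (available since R 3.3.0), so it is a variation on the approach above rather than the exact function; `prefixes`, `hits`, `idx` and `states` are illustrative names.

```r
tels <- c("510-548-2238", "707-231-2440", "650-752-1300", "510-674-3482",
          "510-853-5695", "510-882-9898", "650-555-6311", "707-885-6351",
          "650-231-1234", "650-096-0023", "707-691-6763")

prefixes <- c("510", "650", "707")

# One column per prefix, one row per phone number
hits <- sapply(prefixes, function(p) startsWith(tels, p))
dim(hits)  # 11 3

# Guard: the index step below assumes exactly one match per row
stopifnot(rowSums(hits) == 1)

# Column index of the TRUE in each row, then subset the state names
idx <- apply(hits, 1, which)
states <- c("KANSAS", "TEXAS", "NEW YORK")[idx]

data.frame(PhoneNumbers = tels, State = states)
```

The guard matters because apply(hits, 1, which) returns a list rather than a vector if some row has zero or several matches.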