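I am learning R, and I can't work out how to loop efficiently in it. I can do string parsing with for loops in a very convoluted way, but I'm confused about how to write string-parsing code in a vectorized way.
For example: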
#Social security numbers in the United States are represented by
# numbers conforming to the following format:
#
# a leading 0 followed by two digits
# followed by a dash
# followed by two digits
# followed by a dash
# finally followed by four digits
#
# For example 023-45-7890 would be a valid value,
# but 05-09-1995 and 059-2-27 would not be.
#
# Implement the body of the function 'extractSecuNum' below so that it
# returns a numeric vector whose elements are Social Security numbers
# extracted from a text, i.e., a vector of strings representing the text lines,
# passed to the function as its 'text' argument.
# (You can assume that each string in 'text' contains
# either zero or one Social Security numbers.)
extractSecuNum = function(text){
  # Write your code here!
  x = 1:length(text)
  list_of_input = rep(0, length(text))
  for (ind in x){
    list_of_input[ind] = sub(' .*', '', sub('^[^0-9]*', '', text[ind]))
  }
  temp = c()
  for (ind in x){
    if(list_of_input[ind] != ''){
      temp = c(temp, list_of_input[ind])
    }
  }
  temp2 = c()
  for (ind in 1:length(temp)){
    temp3 = strsplit(temp[ind], '-')
    temp2 = c(temp2, temp3)
  }
  final = c()
  for(ind in 1:length(temp2)){
    if (sub('0[0-9][0-9]', '', temp2[[ind]][1]) == ''){
      if (sub('[0-9][0-9]', '', temp2[[ind]][2]) == ''){
        if (sub('[0-9]{4}', '', temp2[[ind]][3]) == '')
        { final = c(final, paste(temp2[[ind]][1], temp2[[ind]][2], temp2[[ind]][3], sep='-')) }
      }
    }
  }
  return(final)
}
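Here are other exercises with a similar problem. If you look through them, you will see that the second one is very complicated and inelegant: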
https://gist.github.com/anonymous/c1c68121323af19c766c
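I think the problem is that an atomic variable in R is a vector (array), so I can't access the individual characters of a string.
Any advice would be appreciated.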
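On the point about reaching individual characters: a minimal sketch, assuming only base R, of how `substr()`, `strsplit()` and `nchar()` work on a whole character vector at once. The vector `x` is made up for illustration.

```r
# Illustrative input; not taken from the exercise data
x <- c("023-45-7890", "05-09-1995")

# substr() is vectorized: first character of every element
first_chars <- substr(x, 1, 1)

# strsplit() with an empty separator gives one character vector per string
chars <- strsplit(x, "")
fourth_of_first <- chars[[1]][4]

# nchar() returns the length of each string
lengths <- nchar(x)
```

No explicit loop is needed here, since each call processes every element of `x`.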
Answer 0 (score: 1)
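extractSecuNum = function(text){
  # Note: the middle group is three digits (\\d{3}) to match the sample data below;
  # the format described in the question has two digits there, i.e. \\d{2}
  pattern <- "0\\d{2}-\\d{3}-\\d{4}"
  # gregexpr() locates every match in each string; regmatches() extracts them
  unlist(regmatches(text,gregexpr(pattern,text)))
}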
text <- paste0("fdkmsal ",
"0",sample(10:99,10),"-",
sample(100:999,10),"-",
sample(1000:9999,10), " vaklra")
result <- extractSecuNum(text)
head(text)
# [1] "fdkmsal 034-965-3362 vaklra" "fdkmsal 029-190-2488 vaklra"
# [3] "fdkmsal 055-785-3898 vaklra" "fdkmsal 033-950-5589 vaklra"
# [5] "fdkmsal 025-833-9312 vaklra" "fdkmsal 054-375-5596 vaklra"
result
# [1] "034-965-3362" "029-190-2488" "055-785-3898" "033-950-5589" "025-833-9312"
# [6] "054-375-5596" "057-680-3317" "020-951-1417" "031-996-4757" "068-402-8678"
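The pattern above uses three digits in the middle group, while the format in the question asks for two. A possible variant that follows the question's format is sketched below. The function name `extractSecuNumStrict`, the `\\b` word-boundary anchors and the sample `lines` vector are assumptions added for illustration, not part of the original answer.

```r
# Hypothetical stricter variant; name and word boundaries are assumptions
extractSecuNumStrict <- function(text) {
  # leading 0 + two digits, dash, two digits, dash, four digits
  pattern <- "\\b0[0-9]{2}-[0-9]{2}-[0-9]{4}\\b"
  # regexpr() finds at most one match per string (the exercise guarantees
  # zero or one); regmatches() keeps only the strings that matched
  regmatches(text, regexpr(pattern, text, perl = TRUE))
}

# Made-up test lines based on the examples in the exercise text
lines <- c("id 023-45-7890 on file", "born 05-09-1995",
           "code 059-2-27", "no number here")
strict_result <- extractSecuNumStrict(lines)
strict_result
# [1] "023-45-7890"
```

Because each string holds at most one number, `regexpr()` is enough here and the result is already a plain character vector, so `unlist()` is not needed.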