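I am relatively new to R and to matrix-based scripting languages. I have written this function to return the index of every row whose content is similar to that of any other row. It is a primitive form of spam reduction that I am developing.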
if (!require("RecordLinkage")) install.packages("RecordLinkage")
library("RecordLinkage")
# Takes a column of strings, returns a vector of indices
check_similarity <- function(x) {
  threshold <- 0.8
  values <- NULL
  for (i in seq_along(x)) {
    values <- c(values, which(jarowinkler(x[i], x[-i]) > threshold))
  }
  return(values)
}
Is there a way to write this that avoids the for loop entirely?
Answer 0 (score: 1)
We can use sapply to simplify the code slightly.
# some test data
x = c('hello', 'hollow', 'cat', 'turtle', 'bottle', 'xxx')
# same threshold as in the question's function
threshold = 0.8
# create an x by x matrix specifying which strings are alike
m = sapply(x, jarowinkler, x) > threshold
# set diagonal to FALSE: we're not interested in strings being identical to themselves
diag(m) = FALSE
# And find index positions of all strings that are similar to at least one other string
which(rowSums(m) > 0)
# [1] 1 2 4 5
That is, this returns the index positions of 'hello', 'hollow', 'turtle' and 'bottle' as being similar to another string.
If you prefer, you can use colSums instead of rowSums to get a named vector, but this could get messy if the strings are long:
which(colSums(m) > 0)
#  hello hollow turtle bottle
#      1      2      4      5
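For reference, the sapply idea from the answer can be wrapped into a function shaped like the one in the question. The following is only a minimal sketch under stated assumptions: the names `similar_indices` and `lev_sim` are hypothetical, and `lev_sim` is a base-R stand-in (a normalised Levenshtein similarity built on `utils::adist`), not Jaro-Winkler. It is used here only so that the snippet runs without extra packages. With RecordLinkage loaded, `jarowinkler` could be passed as `sim` instead.

```r
# Hypothetical stand-in similarity (assumption, NOT Jaro-Winkler):
# 1 - Levenshtein distance / length of the longer string.
# `a` is a single string, `b` a character vector of non-empty strings.
lev_sim <- function(a, b) {
  1 - as.vector(utils::adist(a, b)) / pmax(nchar(a), nchar(b))
}

# Loop-free variant of check_similarity (hypothetical name).
# `sim` is any function taking (single string, character vector) and
# returning one similarity score per element of the vector.
similar_indices <- function(x, sim, threshold = 0.8) {
  n <- length(x)
  # Column j holds the similarity of x[j] to every element of x
  scores <- vapply(x, function(s) sim(s, x), numeric(n), USE.NAMES = FALSE)
  # Force an n-by-n logical matrix, even when n == 1
  m <- matrix(scores > threshold, nrow = n)
  # A string always matches itself, so ignore the diagonal
  diag(m) <- FALSE
  # Positions in x that are similar to at least one other string
  which(rowSums(m) > 0)
}

msgs <- c("spam offer", "spam offer!", "hello", "cat")
similar_indices(msgs, lev_sim)
# [1] 1 2
```

The returned indices refer to positions in `x` itself, whereas `which()` applied to `x[-i]`, as in the question's loop, gives positions within the shortened vector. This approach builds a full n-by-n matrix, so memory use grows quadratically with the number of strings.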