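Can I vectorize this function further?

Time: 2017-02-14 20:09:34

Tags: r for-loop vectorization

I'm relatively new to R and to matrix-based scripting languages. I wrote this function to return the indices of rows whose content is similar to that of any other row. It's a primitive form of spam reduction that I'm working on.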

if (!require("RecordLinkage")) install.packages("RecordLinkage")

library("RecordLinkage")

# Takes a column of strings, returns a vector of indices
check_similarity <- function(x) {
  threshold <- 0.8
  values <- NULL
  for (i in seq_along(x)) {
    values <- c(values, which(jarowinkler(x[i], x[-i]) > threshold))
  }
  return(values)
}

Is there a way to write this that avoids the for loop entirely?
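One detail in the loop above, separate from vectorization: `which()` is applied to the comparison against `x[-i]`, so the positions it returns index into that shortened vector, not into `x`, and every match that sits after position `i` comes back one too low. The sketch below remaps them through `seq_along(x)[-i]`. So that it runs without RecordLinkage, it uses a hypothetical helper `similar()` built on base R's `utils::adist()` (normalized Levenshtein similarity, not Jaro-Winkler) and an assumed cutoff of 0.4 chosen for that metric and this toy data, not a recommended value.

```r
# Toy input, the same strings used in the answer.
x <- c("hello", "hollow", "cat", "turtle", "bottle", "xxx")

# Hypothetical stand-in for RecordLinkage::jarowinkler: normalized
# Levenshtein similarity, 1 - distance / length of the longer string.
similar <- function(a, b) {
  1 - utils::adist(a, b) / outer(nchar(a), nchar(b), pmax)
}

# Same loop shape as the question, but which()'s positions are mapped back to x.
check_similarity_loop <- function(x, threshold = 0.4) {
  values <- integer(0)
  for (i in seq_along(x)) {
    others <- seq_along(x)[-i]                    # positions in x of the elements of x[-i]
    hits <- which(similar(x[i], x[-i]) > threshold)
    values <- c(values, others[hits])             # positions in x, not in x[-i]
  }
  values
}

print(check_similarity_loop(x))
```

On this input the remapped loop returns `2 1 5 4` (each string's similar partner); without the remapping the same loop would return `1 1 4 4`.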

1 answer:

Answer 0 (score: 1)

We can simplify the code slightly with sapply:

# some test data
x = c('hello', 'hollow', 'cat', 'turtle', 'bottle', 'xxx')
# create an x by x matrix specifying which strings are alike
m = sapply(x, jarowinkler, x) > threshold
# set diagonal to FALSE: we're not interested in strings being identical to themselves
diag(m) = FALSE
# And find index positions of all strings that are similar to at least one other string
which(rowSums(m) > 0)
# [1] 1 2 4 5

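That is, this returns the index positions of 'hello', 'hollow', 'turtle' and 'bottle' as being similar to another string.

If you prefer, you can use colSums instead of rowSums to get a named vector, but this can get unwieldy if the strings are long:

which(colSums(m) > 0)
# hello hollow turtle bottle 
#     1      2      4      5 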
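Because RecordLinkage may not be installed everywhere, here is a self-contained sketch of the same matrix approach in base R. It swaps in a different metric: normalized Levenshtein similarity from `utils::adist()`, wrapped in a hypothetical helper `similar()`. This is not Jaro-Winkler, so the cutoff of 0.4 is an assumption picked for this metric and this toy data. The sketch also shows one way to list which strings matched which: `which(..., arr.ind = TRUE)` over the upper triangle of `m`, so each pair appears once.

```r
# Toy input from the answer; threshold 0.4 is an assumed cutoff for this metric.
x <- c("hello", "hollow", "cat", "turtle", "bottle", "xxx")
threshold <- 0.4

# Hypothetical stand-in for RecordLinkage::jarowinkler: normalized
# Levenshtein similarity, 1 - distance / length of the longer string.
similar <- function(a, b) {
  1 - utils::adist(a, b) / outer(nchar(a), nchar(b), pmax)
}

# adist() compares every pair in one call, so no explicit R-level loop is needed.
m <- similar(x, x) > threshold
diag(m) <- FALSE                          # drop each string's match with itself

# Strings similar to at least one other string.
idx <- which(rowSums(m) > 0)

# Each similar pair listed once: keep only the upper triangle of m.
pairs <- which(m & upper.tri(m), arr.ind = TRUE)
pair_df <- data.frame(a = x[pairs[, 1]], b = x[pairs[, 2]],
                      stringsAsFactors = FALSE)

print(idx)
print(pair_df)
```

With this input, `idx` holds positions 1, 2, 4 and 5, and the pairs are hello/hollow and turtle/bottle.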
