Assign a new variable the value 0 or 1 based on the distance between two words in another variable

Time: 2017-02-14 10:25:08

Tags: r

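I am new to R. In my data set I have a variable named Reason. I want to create a new column named Price, which is set to 1 if any of the following conditions is met:

  • The words "Price" and "High" both appear in Reason, fewer than 6 words apart
  • The words "Price" and "expensive" both appear in Reason, fewer than 6 words apart
  • The words "Price" and "increase" both appear in Reason, fewer than 6 words apart

Otherwise, Price = 0.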

I found the following user-defined function that returns the distance between two words:

distance <- function(string, term1, term2) {
  words <- strsplit(string, "\\s")[[1]]
  indices <- 1:length(words)
  names(indices) <- words
  abs(indices[term1] - indices[term2])
}

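But I don't know how to apply it to the whole column to get the expected result. I tried the following code, but it only gives me "logical(0)" as the result.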

for (j in seq(Survey$Reason))
{
  Survey$Price[[j]]<- distance(Survey$Reason[[j]], " price ", " high ") <=6

} 

Any help is much appreciated. Thanks.

1 answer:

Answer 0: (score: 2)

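Starting from your sample data:

survey <- structure(list(Reason = c("Their price are extremely high.", "Because my price was increased so much, I wouldn't want anyone else to have to deal with that.", "Just because the intial workings were fine, but after we realised it would affect our contract, it left a sour taste in our mouth.", "Problems with the repair", "They did not handle my complaint as well I would have liked.", "Bad service overall.")), .Names = "Reason", row.names = c(NA, 6L), class = "data.frame")

First, I updated your function to remove punctuation and to return the result of your position test directly:

distanceOK <- function(string, term1, term2, n = 6) {
  # Strip punctuation, then split the string into words on whitespace
  words <- strsplit(gsub("[[:punct:]]", "", string), "\\s")[[1]]
  # Name each position by its word so a term can be looked up by name
  indices <- seq_along(words)
  names(indices) <- words
  # NA if either term is absent; otherwise the gap between word positions
  dist <- abs(indices[term1] - indices[term2])
  # 1 if both terms are present and at most n positions apart, else 0
  ifelse(is.na(dist) | dist > n, 0, 1)
}

Then we apply it: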
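
Applying the function to every row, for each of the three word pairs in the question, could look roughly like the sketch below. It is only an illustration under stated assumptions, not necessarily the code the answer went on to use: the `partners` vector, the `vapply` calls and the `tolower()` step are additions made here, and the data frame is a shortened copy of `survey` so that the block runs on its own.

```r
# Shortened copy of the sample data (assumption: four of the six rows).
survey <- data.frame(
  Reason = c(
    "Their price are extremely high.",
    "Because my price was increased so much, I wouldn't want anyone else to have to deal with that.",
    "Problems with the repair",
    "Bad service overall."
  ),
  stringsAsFactors = FALSE
)

# Same logic as distanceOK in the answer, repeated so the block is self-contained.
distanceOK <- function(string, term1, term2, n = 6) {
  words <- strsplit(gsub("[[:punct:]]", "", string), "\\s")[[1]]
  indices <- seq_along(words)
  names(indices) <- words
  dist <- abs(indices[term1] - indices[term2])
  ifelse(is.na(dist) | dist > n, 0, 1)
}

# Hypothetical helper vector: the words that must appear near "price".
partners <- c("high", "expensive", "increase")

# For each Reason, test every partner word and keep 1 if any pair is close enough.
survey$Price <- vapply(
  survey$Reason,
  function(reason) {
    hits <- vapply(
      partners,
      # tolower() makes the lookup case-insensitive; as.numeric() drops names
      function(term) as.numeric(distanceOK(tolower(reason), "price", term)),
      numeric(1)
    )
    as.numeric(any(hits == 1))
  },
  numeric(1),
  USE.NAMES = FALSE
)

print(survey$Price)
```

Two caveats follow from how the lookup works. Terms are matched as whole words, so "increase" does not match "increased" in the second row and that row gets 0; adding "increased" to `partners` would change that. Looking a word up by name also returns only its first occurrence, so a sentence that repeats "price" is judged by the first one.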