Question

I'm new to R. In my dataset I have a variable called Reason, and I want to create a new column called Price. If any of the following conditions is met:

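- the words "Price" and "High" are both mentioned in Reason, less than 6 words apart
- the words "Price" and "expensive" are both mentioned in Reason, less than 6 words apart
- the words "Price" and "increase" are both mentioned in Reason, less than 6 words apart

then Price = 1. Otherwise, Price = 0.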

I found the following user-defined function to get the distance between two words:

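distance <- function(string, term1, term2) {
  # Split the string into words on whitespace
  words <- strsplit(string, "\\s")[[1]]
  # Number each word by its position and use the words themselves as names
  indices <- 1:length(words)
  names(indices) <- words
  # Absolute difference between the positions of the two terms
  abs(indices[term1] - indices[term2])
}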
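To see what distance returns, here is a small check on one sentence (the example strings are made up for illustration). The result is a named number, and the name lookup only succeeds when each term matches a whole word exactly, so trailing punctuation or padding spaces such as " price " make it return NA.

```r
distance <- function(string, term1, term2) {
  words <- strsplit(string, "\\s")[[1]]
  indices <- 1:length(words)
  names(indices) <- words
  abs(indices[term1] - indices[term2])
}

# Exact whole-word matches: "price" is word 2, "high" is word 5
d <- distance("Their price is extremely high", "price", "high")
print(unname(d))  # 3

# "high." keeps its full stop, so the name "high" is not found: returns NA
print(distance("Their price is extremely high.", "price", "high"))

# Padded terms never match a name produced by strsplit: returns NA
print(distance("Their price is extremely high", " price ", " high "))
```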

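But I don't know how to apply it to the whole column to get the expected result. I tried the following code, but it only gives me logical(0) as the result: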

for (j in seq(Survey$Reason))
{
  Survey$Price[[j]]<- distance(Survey$Reason[[j]], " price ", " high ") <=6

}
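For comparison, a minimal sketch of the same loop with the lookup problems removed. It assumes Reason may be stored as a factor (hence as.character), uses lowercase terms without padding spaces, and lowercases and strips punctuation before calling distance; those preprocessing steps and the two-row Survey data frame are assumptions made for illustration, not part of the original function.

```r
distance <- function(string, term1, term2) {
  words <- strsplit(string, "\\s")[[1]]
  indices <- 1:length(words)
  names(indices) <- words
  abs(indices[term1] - indices[term2])
}

# Hypothetical example data standing in for the real survey
Survey <- data.frame(
  Reason = c("Their price is extremely high.", "Bad service overall."),
  stringsAsFactors = TRUE
)

Survey$Price <- 0
for (j in seq_len(nrow(Survey))) {
  # Factor -> character, drop punctuation, lowercase so words match exactly
  text <- tolower(gsub("[[:punct:]]", "", as.character(Survey$Reason[j])))
  d <- distance(text, "price", "high")
  # A missing word gives NA; treat that as "condition not met"
  Survey$Price[j] <- as.integer(!is.na(d) && d <= 6)
}
print(Survey$Price)  # 1 0
```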

Any help is much appreciated. Thanks!

Answer 1

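Starting from your sample data:

survey <- structure(list(Reason = c("Their price are extremely high.", "Because my price was increased so much, I wouldn't want anyone else to have to deal with that.", "Just because the intial workings were fine, but after we realised it would affect our contract, it left a sour taste in our mouth.", "Problems with the repair", "They did not handle my complaint as well I would have liked.", "Bad service overall.")), .Names = "Reason", row.names = c(NA, 6L), class = "data.frame")

First, I updated your function to remove punctuation and return your distance test directly: 1 if the two words are at most n words apart, and 0 otherwise (including when either word is missing).

distanceOK <- function(string, term1, term2, n = 6) {
  # Remove punctuation so "high." matches "high", then split on whitespace
  words <- strsplit(gsub("[[:punct:]]", "", string), "\\s")[[1]]
  # Position of each word, named by the word itself
  indices <- seq_along(words)
  names(indices) <- words
  # Name lookup returns the first occurrence, or NA if the word is absent
  dist <- abs(indices[term1] - indices[term2])
  # 0 when a word is missing or too far away, 1 otherwise
  ifelse(is.na(dist) | dist > n, 0, 1)
}

Then we apply it: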
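A minimal sketch of applying distanceOK across the whole column, assuming base R's vapply and setting a row to 1 when any of the question's word pairs is within n = 6 words. Because distanceOK compares whole words exactly (and case-sensitively), "increase" does not match "increased" in the second sample row; adding "increased" to the term list is an assumption made here, not part of the original answer.

```r
distanceOK <- function(string, term1, term2, n = 6) {
  words <- strsplit(gsub("[[:punct:]]", "", string), "\\s")[[1]]
  indices <- seq_along(words)
  names(indices) <- words
  dist <- abs(indices[term1] - indices[term2])
  ifelse(is.na(dist) | dist > n, 0, 1)
}

survey <- data.frame(Reason = c(
  "Their price are extremely high.",
  "Because my price was increased so much, I wouldn't want anyone else to have to deal with that.",
  "Just because the intial workings were fine, but after we realised it would affect our contract, it left a sour taste in our mouth.",
  "Problems with the repair",
  "They did not handle my complaint as well I would have liked.",
  "Bad service overall."
), stringsAsFactors = FALSE)

# Words paired with "price"; "increased" is an added assumption,
# since distanceOK only matches exact whole words
terms <- c("high", "expensive", "increase", "increased")

survey$Price <- vapply(survey$Reason, function(r) {
  # 1 if any pair is within n words of each other, else 0
  hits <- vapply(terms, function(term) distanceOK(r, "price", term) == 1,
                 logical(1))
  as.numeric(any(hits))
}, numeric(1), USE.NAMES = FALSE)

print(survey$Price)  # 1 1 0 0 0 0
```

If Reason can contain "Price" with a capital P, wrapping the string in tolower() before calling distanceOK would be one way to make the match case-insensitive.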
