Question

I'm trying to read a CSV file in R, find a specific pattern in one of its columns, and count how many times it occurs. Here is the code:

dataframe <- read.csv("path-analysis-2003-a.csv", header = TRUE, stringsAsFactors=FALSE)

for(i in 1:nrow(dataframe))
{
  counter <- gregexpr("-",dataframe$Path[i], fixed = TRUE, useBytes = TRUE)
  print(length(counter))

}

But the output shows a length of 1 for every row. When I debugged the code, I found this output:

[[1]]
 [1] 10 19 28 41 43 44 45 46 50 60 67
attr(,"match.length")
 [1] 1 1 1 1 1 1 1 1 1 1 1
attr(,"useBytes")
[1] TRUE
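The debug output hints at why the loop prints 1: `gregexpr()` returns a list with one element per input string, so `length(counter)` measures that list, not the matches inside it. A minimal sketch, using a made-up path string as a stand-in for `dataframe$Path[i]`:

```r
# Hypothetical stand-in for one value of dataframe$Path
path <- "a-b-c-d"
m <- gregexpr("-", path, fixed = TRUE)

# gregexpr() returns a list with one element per input string
length(m)        # 1: the list holds a single element
# The match positions live inside that element
length(m[[1]])   # 3: one entry per "-" found
```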

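The first line of the output (the positions) is useful, because I can count the occurrences from it. The problem is that I don't know how to get rid of the rest of the output. Any suggestions?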
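One way to keep only the positions, sketched on a hypothetical two-element vector: `as.vector()` strips the `match.length` and `useBytes` attributes from an integer vector, and counting only positive positions avoids treating the `-1` that `gregexpr()` returns for "no match" as one occurrence.

```r
# Hypothetical input: one string with matches, one without
x <- c("a-b-c", "abc")
m <- gregexpr("-", x, fixed = TRUE)

# as.vector() drops all attributes from an atomic vector,
# leaving plain integer positions
pos <- as.vector(m[[1]])
pos      # 2 4

# A string with no match yields the single position -1,
# so count positive positions instead of using length()
counts <- sapply(m, function(p) sum(p > 0))
counts   # 2 0
```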

Answer 1

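Here is an example you can follow. I added comments to the code to make it self-explanatory. The example searches for the word "stop" in a data frame containing four sentences.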

# some data for the demo
text <- c("Because I could not stop for Death -",
          "He kindly stopped for me -",
          "The Carriage held but just Ourselves -",
          "and Immortality")
# populate the sample data frame (stringsAsFactors = FALSE keeps the text as character in R < 4.0)
df_sample <- data.frame(id = 1:4, sentence = text, stringsAsFactors = FALSE)
# apply gregexpr; the function is vectorized, so no loop is needed
result <- gregexpr("stop", df_sample$sentence)
# unlist the result to obtain the match positions; -1 marks a sentence with no match
final <- unlist(result)
# print results
final
# [1] 21 11 -1 -1
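The example above returns positions rather than counts, and `unlist()` keeps a `-1` for each sentence without a match. A minimal sketch of per-row counts on the same demo data, using base R's `regmatches()` and `lengths()` (the latter needs R 3.2.0 or later); the column name `stop_count` is hypothetical:

```r
text <- c("Because I could not stop for Death -",
          "He kindly stopped for me -",
          "The Carriage held but just Ourselves -",
          "and Immortality")
df_sample <- data.frame(id = 1:4, sentence = text, stringsAsFactors = FALSE)

# regmatches() extracts the matched substrings; sentences with no
# match give character(0), so lengths() reports 0 for them.
# "stop_count" is a hypothetical column name chosen for this sketch.
df_sample$stop_count <- lengths(regmatches(df_sample$sentence,
                                           gregexpr("stop", df_sample$sentence)))
df_sample$stop_count  # 1 1 0 0
```

Applied to the question's data, the same pattern would be `lengths(regmatches(dataframe$Path, gregexpr("-", dataframe$Path, fixed = TRUE)))`, giving one count per row without a loop.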

Removing the extra information from gregexpr in R
