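How do I write a for loop that searches a character vector for names from a data.frame?

Time: 2016-04-25 15:07:59

Tags: r for-loop grep match apply

I have a data.frame containing the names of football players, for example: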

names <- data.frame(id=c(1,2,3,4,5,6,7), 
             year=c('Maradona', 'Cruyff', 'Messi', 'Ronaldo', 'Pele', 'Van Basten', 'Diego'))

> names
  id       year
1  1   Maradona
2  2     Cruyff
3  3      Messi
4  4    Ronaldo
5  5       Pele
6  6 Van Basten
7  7      Diego

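I also have 6,000 scraped text files containing stories about these football players. The stories are stored as 6,000 elements of a large vector called stories.

Is it possible to write a loop (or an apply function) that searches each story for every player's name? If one or more matches occur, I would like to record the element number and the names of the players that were found.

For example, consider the following text in stories[1]: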

Diego Armando Maradona (born 30 October 1960) is a retired Argentine 
professional footballer. He has served as a manager and coach at other
clubs as well as the national team of Argentina. Many in the sport,
including football writers, former players, current players and 
football fans, regard Maradona as the greatest football player of all
time. He was joint FIFA Player of the 20th Century
with Pele.

The ideal data.frame would have the following structure:

> outcome
  element    name1 name2
1       1 Maradona  Pele

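Does anyone know how to write code that produces such a data.frame with this information for all the football players?

2 Answers:

Answer 0: (score: 0)

I would just do it with loops, but maybe you could use the apply functions instead.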

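# Make sure you include stringsAsFactors = FALSE or my code won't work
football_names <- data.frame(id = 1:7,
                year = c('Maradona', 'Cruyff', 'Messi', 'Ronaldo', 'Pele', 'Van Basten', 'Diego'),
                stringsAsFactors = FALSE)

# One row per story (not per player)
outcome <- data.frame(element = seq_along(stories))

for (i in seq_along(stories)) {
  # Split on non-letters so trailing punctuation ("Pele.") does not block a match.
  # Word-level matching still misses multi-word names such as "Van Basten".
  words <- unlist(strsplit(stories[i], split = "[^[:alpha:]]+"))
  names_in_story <- football_names$year[football_names$year %in% words]

  # seq_along() skips the inner loop when a story contains no names
  for (j in seq_along(names_in_story)) {
    outcome[i, j + 1] <- names_in_story[j]
  }
}

names(outcome) <- c("element", paste0("name", seq_len(ncol(outcome) - 1)))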
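
The apply-function route mentioned above could look roughly like the following. This is only a sketch under assumptions: `stories` here is a two-element stand-in for the real 6,000-element vector, and `player_names`, `find_names`, `matches`, and `padded` are hypothetical names that do not appear in the question. It uses literal substring matching (`grepl(fixed = TRUE)`), so multi-word names such as "Van Basten" are found, at the cost of also matching a name that happens to sit inside a longer word.

```r
# Stand-in data: the real `stories` vector would hold 6,000 elements
stories <- c(
  "Diego Armando Maradona was joint FIFA Player of the 20th Century with Pele.",
  "Van Basten scored a famous volley in 1988."
)
player_names <- c("Maradona", "Cruyff", "Messi", "Ronaldo",
                  "Pele", "Van Basten", "Diego")

# For one story, return the candidate names that occur in it as literal substrings
find_names <- function(story, candidates) {
  hits <- vapply(candidates, grepl, logical(1), x = story, fixed = TRUE)
  candidates[hits]
}

# One character vector of matched names per story
matches <- lapply(stories, find_names, candidates = player_names)

# Pad every result with NA to a common length so the rows line up
max_hits <- max(lengths(matches), 1)
padded <- lapply(matches, function(m) {
  c(m, rep(NA_character_, max_hits - length(m)))
})
name_cols <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(name_cols) <- paste0("name", seq_len(max_hits))

outcome <- data.frame(element = seq_along(stories), name_cols,
                      stringsAsFactors = FALSE)

# Keep only the stories with at least one match
outcome <- outcome[lengths(matches) > 0, ]
print(outcome)
```

Names are reported in the order they appear in `player_names`, not in the order they occur in the story. Stories with fewer matches than the maximum get NA in the unused name columns.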

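Answer 1: (score: 0)

I don't fully understand your question, but you could try string matching with stringr functions and lapply. I assume your data stories is a list. The function below looks for all the names you pass in as a vector and counts their occurrences. The output is again a list.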

library(stringr)  # provides str_match_all()
foo <- function(x, y) table(unlist(str_match_all(x, paste0(y, collapse = "|"))))

Result:

res <- lapply(stories, foo, names$year)

Then you can merge and summarise the data (with rowSums()), for example:

Reduce(function(...) merge(..., all=T, by="Var1"), res)
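
To make the count-and-summarise idea concrete, here is a minimal base R sketch. It is deliberately not the stringr code above: it counts literal occurrences of each name per story straight into a matrix, so rowSums() and colSums() can be applied without the merge step. The three-element `stories` vector, `player_names`, and `count_name` are hypothetical stand-ins, not part of the original answer.

```r
# Stand-in data for illustration only
stories <- c(
  "Maradona and Pele. Many regard Maradona as the greatest.",
  "Messi met Cruyff.",
  "No players here."
)
player_names <- c("Maradona", "Cruyff", "Messi", "Ronaldo",
                  "Pele", "Van Basten", "Diego")

# Count literal (fixed = TRUE) occurrences of one name in every story
count_name <- function(name, texts) {
  hits <- gregexpr(name, texts, fixed = TRUE)
  # gregexpr() reports -1 when nothing matches, so count only positive positions
  vapply(hits, function(h) sum(h > 0), integer(1))
}

# Matrix with one row per story and one column per name
counts <- vapply(player_names, count_name, integer(length(stories)),
                 texts = stories)

rowSums(counts)  # total name mentions per story
colSums(counts)  # total mentions per player across all stories
```

A matrix keeps stories with zero matches as rows of zeros, which sidesteps the question of how empty tables behave in the merge.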