From a list

Date: 2017-08-21 13:05:27

Tags: r dataframe match

Here is a data frame/tibble and a character vector (this vector is a column of another tibble).

df1 <- structure(list(Twitter_name = c("CHESHIREKlD", "JellyComons", 
"kirmiziburunlu", "erkekdeyimleri", "herosFrance", "IkishanShah"
), Declared_followers = c(60500L, 43100L, 31617L, 27852L, 26312L, 
16021L), Real_followers = c(60241, 43054, 31073, 27853, 25736, 
15856), Twitter_Id = c("783866366", "1424086592", "2367932244", 
"3352977681", "2580703352", "521094407")), .Names = c("Twitter_name", 
"Declared_followers", "Real_followers", "Twitter_Id"), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

myId <- c("867211097882804224", "868806957133688832", "549124465", "822580282452754432", 
"109344546", "482666188", "61716107", "3642392237", "595318933", 
"833365943044628480", "1045015087", "859830740669800448", "860562940059045888", 
"2854457294", "871784135983067136", "866922354554814464", "4839343547", 
"849451474572759040", "872084673526214656", "794841530053853184")

N.B. df1 has been shortened here; it actually has 128 observations.
I want to test every element of df1$Twitter_Id to see whether it is in myId. I can run this:

 > match(myId[1], df1$Twitter_Id)

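But:

  1. It stops at the first occurrence.
  2. I would need to apply match() to all elements of myId.
  3. Using lapply() or other functions from the dplyr/tidyverse packages, I can't find a clean and simple way to do it.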

    Thanks for your help.

    EDIT: I need to be more explicit about the whole case.

    myTw <- structure(list(id_str = c("893445199661330433", "893116842558050304", 
    "892739336466305024", "892401780105019393", "892401594272296963", 
    "892365572486430720", "891964139756818432")), .Names = "id_str", row.names = c(NA, 
    -7L), class = c("tbl_df", "tbl", "data.frame"))
    

    These are tweet IDs. What I want is to find out which Twitter users retweeted these tweets. For this, I use the retweeters() function from the twitteR package.

    library(twitteR)
    MyRtw <- retweeters(myTw[1])
    
    MyRtw <- c("889135428028084224", "867211097882804224", "868806957133688832", 
    "549124465", "822580282452754432", "109344546", "482666188", 
    "61716107", "3642392237", "595318933", "833365943044628480", 
    "1045015087", "859830740669800448", "860562940059045888", "2854457294", 
    "871784135983067136", "866922354554814464", "4839343547", "849451474572759040", 
    "872084673526214656")
    

    This is a list of Twitter user IDs. In the end, I want to see which users in df1$Twitter_Id retweeted myTw[1].

1 Answer:

Answer 0: (score: 1)

You can use the '%in%' operator.
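
As a minimal sketch of what that looks like: `%in%` is vectorized, so it returns one TRUE/FALSE per element of its left-hand side without an explicit loop. The short vectors below are stand-ins, not the full data from the question; in particular, "549124465" is placed in `twitter_id` only to show what a hit looks like (none of the six IDs shown for df1 appear in myId). The names `twitter_id`, `my_id`, `is_match` and `matched_ids` are illustrative.

```r
# Stand-in data: two IDs copied from df1$Twitter_Id plus one
# hypothetical entry ("549124465") that also occurs in my_id.
twitter_id <- c("783866366", "1424086592", "549124465")
my_id <- c("867211097882804224", "549124465", "109344546")

# One logical value per element of twitter_id
is_match <- twitter_id %in% my_id

# Keep only the IDs that were found in my_id
matched_ids <- twitter_id[is_match]
```

With the question's objects, the same idea would read `df1$Twitter_Id %in% myId`, and `df1[df1$Twitter_Id %in% myId, ]` would keep the matching rows.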

EDIT: Maybe this is what you want. Here I used the data from the original post (before the edit).

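# For each Twitter_Id in df1, count how many times it occurs in myId
matchVector <- NULL
for (id in df1$Twitter_Id) {
  matchCounter <- sum(myId %in% id)  # number of elements of myId equal to id
  matchVector <- c(matchVector, matchCounter)
}

# Store the counts as a new column (0 means the ID is not in myId)
df1$numberOfMatches <- matchVector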
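
The same per-ID count can probably be written without growing a vector inside a loop. The sketch below uses `vapply()` from base R and should give the same counts as the loop above. The data are stand-ins: `my_id` is hypothetical and repeats one ID twice so that a count greater than 1 shows up, and the names `twitter_id`, `my_id` and `number_of_matches` are illustrative.

```r
# Stand-in data: three IDs copied from df1$Twitter_Id, and a
# hypothetical my_id in which "521094407" appears twice.
twitter_id <- c("783866366", "1424086592", "521094407")
my_id <- c("521094407", "109344546", "521094407")

# For each ID, count how many elements of my_id are equal to it.
# sum() of a logical vector is an integer, hence integer(1).
number_of_matches <- vapply(
  twitter_id,
  function(id) sum(my_id %in% id),
  FUN.VALUE = integer(1),
  USE.NAMES = FALSE
)
```

Applied to the question's objects, this would be `df1$numberOfMatches <- vapply(df1$Twitter_Id, function(id) sum(myId %in% id), integer(1), USE.NAMES = FALSE)`.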