Question

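I retrieved a number of tweets from Twitter using the R package twitteR.

With that done, my goal is to build edges for a network analysis from the mentions in those tweets. To get the Twitter usernames mentioned in each tweet, I use the following code: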

library(stringr)  # provides str_extract_all()

tweets <- read.csv(file="tweets.csv")

tweets$mentions <- str_extract_all(tweets$text, "@\\w+")

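Some tweets mention several usernames, e.g. "usernameA, usernameB and usernameC", but these all sit in a single row. I now want to duplicate every tweet that mentions several usernames, once per username it mentions, so that in the end each row shows only one username. Let me illustrate what I mean with the example already used: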

Currently I have one row with two columns (text, mentions):

"text of the tweet"; "usernameA, usernameB, usernameC"

In this case I would like to have three rows:

"text of the tweet"; "usernameA"
"text of the tweet"; "usernameB"
"text of the tweet"; "usernameC"

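My questions are:

How do I make R check for entries in a given column that contain a list (c("usernameA", "usernameB", ...))?
How do I tell R to repeat that particular entry x-1 times (x = number of mentions)?
How do I make R leave only one username in each row?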

Answer 1

You can solve this with plyr by splitting the data frame of tweets on the text column:

plyr::ddply(tweets, c("text"), function(x){
    mention <- unlist(stringr::str_extract_all(x$text, "@\\w+"))
    # some tweets do not contain mentions, making this necessary:
    if (length(mention) > 0){
        return(data.frame(mention = mention))
    } else {
        return(data.frame(mention = NA))    
    }
})

Example

tweets <- data.frame(text = c("A tweet with text and @user1 and @user2.",
                              "Another tweet @user3 and @user4 should hear about."))

Running the function above returns:

                                                text mention
1           A tweet with text and @user1 and @user2.  @user1
2           A tweet with text and @user1 and @user2.  @user2
3 Another tweet @user3 and @user4 should hear about.  @user3
4 Another tweet @user3 and @user4 should hear about.  @user4
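
The three sub-questions (find the list entries, repeat each row, keep one username per row) can also be sketched in base R without plyr. This is only an illustrative alternative, not the approach used above: it assumes R 3.2 or later for lengths(), and the third example tweet and the name edges are made up for the illustration.

```r
# Base-R sketch: one row per mention, no extra packages needed.
tweets <- data.frame(
  text = c("A tweet with text and @user1 and @user2.",
           "Another tweet @user3 and @user4 should hear about.",
           "A tweet without any mention."),   # hypothetical extra tweet
  stringsAsFactors = FALSE
)

# gregexpr() + regmatches() give a list with one character vector per tweet.
mentions <- regmatches(tweets$text, gregexpr("@\\w+", tweets$text))

# Tweets without mentions get a single NA so they are not dropped.
mentions[lengths(mentions) == 0] <- list(NA_character_)

# Repeat each tweet once per mention, then flatten the list of mentions.
edges <- data.frame(
  text    = rep(tweets$text, times = lengths(mentions)),
  mention = unlist(mentions),
  stringsAsFactors = FALSE
)
print(edges)
```

Here lengths(mentions) plays the role of x in the question, and rep() does the row duplication.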

Answer 2

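I tried your code with a different example and it works fine. The problem I don't know how to deal with appears when I take the list of tweets from a data.frame, i.e. when I build tweets like this: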

tweets<-data.frame(text=(table$variable))

instead of

tweets <- data.frame(text = c("A tweet with text and @user1 and @user2.",
                              "Another tweet @user3 and @user4 should hear about."))

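Apparently the format does not change, yet after running your code I only get numbers (which actually stand for the '@' handles inside the text) instead of the handles themselves.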

Answer 3

If you add stringsAsFactors=FALSE, Dave's answer returns the handles instead of numbers for a generic data frame:

plyr::ddply(mydata, c("text"), function(x){
  mention <- unlist(stringr::str_extract_all(x$text, "@\\w+"))
  # some tweets do not contain mentions, making this necessary:
  if (length(mention) > 0){
    return(data.frame(mention = mention,stringsAsFactors=FALSE))
  } else {
    return(data.frame(mention = NA))    
  }
})
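
A likely reason for the numbers, assuming an R version before 4.0.0 (where data.frame() defaulted to stringsAsFactors = TRUE), is that the mentions were stored as factors, and a factor that gets coerced shows its integer level codes. The small sketch below only illustrates that coercion; the vector handles is made up for the example.

```r
# Hypothetical handles stored as a factor, as data.frame() did by default before R 4.0.0.
handles <- factor(c("@user2", "@user1"))

# Coercing a factor to integer yields the level codes, not the text.
codes <- as.integer(handles)

# Converting to character first recovers the handles.
labels <- as.character(handles)

print(codes)
print(labels)
```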

Create edges (rows) for multiple mentions in one tweet
