How do I extract the user ID for each element?

Date: 2017-07-26 21:22:18

Tags: r twitter

How can I extract the user_id values from the retweets collected with this code?

## get only first 8 words from each tweet
x <- lapply(strsplit(dat$text, " "), "[", 1:8)
x <- lapply(x, na.omit)
x <- vapply(x, paste, collapse = " ", character(1))
## get rid of hyperlinks
x <- gsub("http[\\S]{1,}", "", x, perl = TRUE)
## encode for search query (handles the non ascii chars)
x <- sapply(x, URLencode, USE.NAMES = FALSE)
## get up to first 100 retweets for each tweet
data <- lapply(x, search_tweets, verbose = FALSE)

I have 12 elements, each containing a list of user IDs. How do I extract only the user IDs?

Here is the full code:

library(rtweet)
library(plyr)    ## load plyr before dplyr so dplyr's verbs are not masked
library(dplyr)
library(reshape2)

## search for day of rage tweets, try to exclude rt here
dor <- search_tweets("#Newsnight -filter:retweets", n = 10000)

## merge tweets data with unique (non duplicated) users data
## exclude retweets
## select status_id, retweet count, followers count, and text columns
dat <- dor %>%
  users_data() %>%
  unique() %>%
  right_join(dor) %>%
  filter(!is_retweet) %>%
  dplyr::select(user_id, screen_name, retweet_count, followers_count, text) %>%
  filter(retweet_count >= 50, retweet_count < 100,
         followers_count > 500, followers_count < 10000)
dat

## get only first 8 words from each tweet
x <- lapply(strsplit(dat$text, " "), "[", 1:8)
x <- lapply(x, na.omit)
x <- vapply(x, paste, collapse = " ", character(1))
## get rid of hyperlinks
x <- gsub("http[\\S]{1,}", "", x, perl = TRUE)
## encode for search query (handles the non ascii chars)
x <- sapply(x, URLencode, USE.NAMES = FALSE)
## get up to first 100 retweets for each tweet
data <- lapply(x, search_tweets, verbose = FALSE)

(output truncated) There are 11 more elements like this, 12 elements in total.
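The cleanup steps above can be checked on a small toy vector without calling the Twitter API (the final `search_tweets` call, which needs authentication, is skipped; the example text is made up for illustration):

```r
## toy tweets standing in for dat$text
text <- c(
  "one two three four five six seven eight nine ten",
  "short tweet http://t.co/abc123"
)

## keep only the first 8 words of each tweet
x <- lapply(strsplit(text, " "), "[", 1:8)
x <- lapply(x, na.omit)   # drop the NAs produced for tweets shorter than 8 words
x <- vapply(x, paste, character(1), collapse = " ")

## strip hyperlinks, then percent-encode for use as a search query
x <- gsub("http\\S+", "", x, perl = TRUE)
x <- sapply(x, URLencode, USE.NAMES = FALSE)
x
```

Each element of `x` is then a percent-encoded query string (spaces become `%20`), ready to be passed to `search_tweets`.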

1 answer:

Answer 0 (score: 0):

OK, so you have a list of 12 data frames, each of which has a column named user_id. The following will work if the list is named; if it is unnamed, take out the df_name = names(data)[x], part.

lapply(1:12, function(x) {
  df <- data[[x]]
  data.frame(user_id = df$user_id,
             # df_name = names(data)[x],
             df_number = x,
             stringsAsFactors = FALSE)
}) %>%
  dplyr::bind_rows()

This should give you a single new data frame with all the user IDs, plus a column recording which of the original data frames each one came from.
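A more compact equivalent uses the `.id` argument of `dplyr::bind_rows()`, which adds a column recording the list index (or element name, if the list is named) each row came from. A minimal sketch with toy data frames standing in for the real 12-element `search_tweets` result:

```r
library(dplyr)

## toy stand-in for the list of data frames returned by search_tweets
data <- list(
  data.frame(user_id = c("1001", "1002"), stringsAsFactors = FALSE),
  data.frame(user_id = "1003",            stringsAsFactors = FALSE)
)

## keep only the user_id column of each element, then row-bind;
## .id = "df_number" records which list element each row came from
ids <- bind_rows(lapply(data, function(df) df["user_id"]), .id = "df_number")
ids
```

For an unnamed list, `df_number` holds the list position as a character ("1", "2", ...); for a named list it holds the element names, so no separate `df_name` bookkeeping is needed.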