How do I extract user_id from the retweets collected with this code?
## get only first 8 words from each tweet
x <- lapply(strsplit(dat$text, " "), "[", 1:8)
x <- lapply(x, na.omit)
x <- vapply(x, paste, collapse = " ", character(1))
## get rid of hyperlinks
x <- gsub("http[\\S]{1,}", "", x, perl = TRUE)
## encode for search query (handles the non ascii chars)
x <- sapply(x, URLencode, USE.NAMES = FALSE)
## get up to first 100 retweets for each tweet
data <- lapply(x, search_tweets, verbose = FALSE)
The result is a list of 12 elements, each containing a list of user IDs. How can I extract just the user IDs?
Here is the full code:
library(rtweet)
## load plyr before dplyr so that plyr does not mask dplyr verbs
library(plyr)
library(dplyr)
library(reshape2)
## search for #Newsnight tweets, trying to exclude retweets in the query
dor <- search_tweets("#Newsnight -filter:retweets", n = 10000)
## merge tweets data with unique (non duplicated) users data
## exclude retweets
## select user_id, screen_name, retweet count, followers count, and text columns
## keep tweets with 50-99 retweets from users with 501-9999 followers
dat <- dor %>%
  users_data() %>%
  unique() %>%
  right_join(dor) %>%
  filter(!is_retweet) %>%
  dplyr::select(user_id, screen_name, retweet_count, followers_count, text) %>%
  filter(retweet_count >= 50 & retweet_count < 100 &
           followers_count < 10000 & followers_count > 500)
dat
## get only first 8 words from each tweet
x <- lapply(strsplit(dat$text, " "), "[", 1:8)
x <- lapply(x, na.omit)
x <- vapply(x, paste, collapse = " ", character(1))
## get rid of hyperlinks
x <- gsub("http[\\S]{1,}", "", x, perl = TRUE)
## encode for search query (handles the non ascii chars)
x <- sapply(x, URLencode, USE.NAMES = FALSE)
## get up to first 100 retweets for each tweet
data <- lapply(x, search_tweets, verbose = FALSE)
Answer 0 (score: 0)
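OK, so you have a list of 12 data frames, each with a column named user_id. If the list is named, you can uncomment the df_name = names(data)[x], line to record which named element each ID came from; if it is unnamed, leave that part out (it is commented out below).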
## seq_along(data) avoids hard-coding the number of list elements
lapply(seq_along(data), function(x) {
  df <- data[[x]]
  data.frame(user_id = df$user_id,
             # df_name = names(data)[x],
             df_number = x,
             stringsAsFactors = FALSE)
}) %>%
  dplyr::bind_rows()
This should give you a new data frame containing all the user IDs, along with the index of the original data frame each one came from.
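To try the idea without calling the Twitter API, here is a minimal base R sketch. The list toy and its user IDs are made up for illustration and only stand in for the list returned by lapply(x, search_tweets, ...); the rep() call is an added assumption to cope with searches that return no rows.

```r
## Toy stand-in for the list of search results; IDs and text are hypothetical.
toy <- list(
  data.frame(user_id = c("101", "102"), text = c("a", "b"),
             stringsAsFactors = FALSE),
  data.frame(user_id = "103", text = "c", stringsAsFactors = FALSE),
  data.frame(user_id = character(0), text = character(0),
             stringsAsFactors = FALSE)
)

## Build one small data frame per list element, then stack them.
## rep() keeps the column lengths equal even for an empty result.
ids <- do.call(rbind, lapply(seq_along(toy), function(i) {
  data.frame(user_id = toy[[i]]$user_id,
             df_number = rep(i, nrow(toy[[i]])),
             stringsAsFactors = FALSE)
}))

## Just the IDs as a plain character vector, without the source index.
all_ids <- unlist(lapply(toy, `[[`, "user_id"))
```

If dplyr is already loaded, dplyr::bind_rows(data, .id = "df_number") is probably a shorter route to a similar result, though it keeps every column rather than only user_id.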