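I am working on a sentiment analysis project in R. I am trying to collect tweets that use some of the most popular emojis. How can I collect tweets by emoji?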
#devtools::install_github("dill/emoGG")
library(emoGG) # source of the "emoji_search" function
library(twitteR) # source of the "searchTwitter" and "twListToDF" functions
emoji_search("BALLOON")
emoji <- searchTwitter("BALLOON")
emoji
emojidf <- twListToDF(emoji)
Answer 0 (score: 0)
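After some googling and experimenting, I learned that emojis are encoded in tweets in a confusing way (at least to me).
One shortcut is to look the emoji up in an emoji dictionary such as Kate Lyons'. She has also written some more background about how she compiled it.
This gives us a more direct way to search for tweets containing a given emoji. For example, the dictionary shows that we can find the "balloon" emoji by searching for the following string: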
<ed><a0><bc><ed><be><88>
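I am more familiar with rtweet, so here is what a search for the balloon emoji looks like:
[Edit: I am not sure this works properly. These all look like non-English tweets, and they may not contain the balloon emoji at all...] :-(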
> rtweet::search_tweets("<ed><a0><bc><ed><be><88>")
# A tibble: 16 x 90
user_id status_id created_at screen_name text source display_text_wi… reply_to_status… reply_to_user_id
<chr> <chr> <dttm> <chr> <chr> <chr> <dbl> <chr> <chr>
1 111373… 11429734… 2019-06-24 01:51:30 SPR1NGD4Y_ "? 엠… Twitt… 154 NA NA
2 100224… 11428523… 2019-06-23 17:50:11 quark_kim "탐라에… Twitt… 140 NA NA
3 109648… 11428194… 2019-06-23 15:39:14 _4CC1D3N7_… "부장 … Twitt… 127 114281934863914… 109648624150199…
4 113448… 11428090… 2019-06-23 14:58:01 MAX_commu "자캐앤… Twitt… 140 NA NA
5 819116… 11428062… 2019-06-23 14:46:46 jinimwoo "자캐앤… Twitt… 140 NA NA
6 103612… 11428013… 2019-06-23 14:27:27 00gY0 "자캐앤… Twitt… 140 NA NA
7 107972… 11428003… 2019-06-23 14:23:32 YN_DGY "탐라에… Twitt… 140 NA NA
8 111199… 11427952… 2019-06-23 14:03:19 coffee_101… "탐라에… Twitt… 140 NA NA
9 967054… 11427941… 2019-06-23 13:58:57 mphp0001 "탐라에… Twitt… 140 NA NA
10 928447… 11426751… 2019-06-23 06:06:06 yangE___ "탐라에… Twitt… 140 NA NA
11 836222… 11426745… 2019-06-23 06:03:32 sunseul_ma… "탐라에… Twitt… 140 NA NA
12 110802… 11426637… 2019-06-23 05:20:51 4th_month__ "탐라에… Twitt… 140 NA NA
13 113990… 11413476… 2019-06-19 14:10:47 Dream_Merr… "공지 … Twitt… 62 NA NA
14 777381… 11409418… 2019-06-18 11:18:24 mi_se2 "@Me… Twitt… 140 NA NA
15 330242… 11408761… 2019-06-18 06:57:35 lip_ran "@Me… Twitt… 140 NA NA
16 113519… 11408687… 2019-06-18 06:27:56 barruwach "@Me… Twitt… 140 NA NA
# … with 81 more variables: reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>, favorite_count <int>,
# retweet_count <int>, quote_count <int>, reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
# urls_t.co <list>, urls_expanded_url <list>, media_url <list>, media_t.co <list>, media_expanded_url <list>,
# media_type <list>, ext_media_url <list>, ext_media_t.co <list>, ext_media_expanded_url <list>, ext_media_type <chr>,
# mentions_user_id <list>, mentions_screen_name <list>, lang <chr>, quoted_status_id <chr>, quoted_text <chr>,
# quoted_created_at <dttm>, quoted_source <chr>, quoted_favorite_count <int>, quoted_retweet_count <int>,
# quoted_user_id <chr>, quoted_screen_name <chr>, quoted_name <chr>, quoted_followers_count <int>,
# quoted_friends_count <int>, quoted_statuses_count <int>, quoted_location <chr>, quoted_description <chr>,
# quoted_verified <lgl>, retweet_status_id <chr>, retweet_text <chr>, retweet_created_at <dttm>, retweet_source <chr>,
# retweet_favorite_count <int>, retweet_retweet_count <int>, retweet_user_id <chr>, retweet_screen_name <chr>,
# retweet_name <chr>, retweet_followers_count <int>, retweet_friends_count <int>, retweet_statuses_count <int>,
# retweet_location <chr>, retweet_description <chr>, retweet_verified <lgl>, place_url <chr>, place_name <chr>,
# place_full_name <chr>, place_type <chr>, country <chr>, country_code <chr>, geo_coords <list>, coords_coords <list>,
# bbox_coords <list>, status_url <chr>, name <chr>, location <chr>, description <chr>, url <chr>, protected <lgl>,
# followers_count <int>, friends_count <int>, listed_count <int>, statuses_count <int>, favourites_count <int>,
# account_created_at <dttm>, verified <lgl>, profile_url <chr>, profile_expanded_url <chr>, account_lang <chr>,
# profile_banner_url <chr>, profile_background_url <chr>, profile_image_url <chr>
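The dictionary string used in the search above is easier to reason about once it is decoded. As far as I can tell, `<ed><a0><bc><ed><be><88>` is the UTF-16 surrogate pair D83C DF88 with each half written out as three UTF-8-style bytes, which corresponds to U+1F388 (BALLOON). Here is a minimal base-R sketch of that decoding. `decode3` is a helper name made up for this example, and the commented-out search at the end is an untested assumption that passing the literal character to `rtweet::search_tweets` matches the emoji better than the byte string does.

```r
# Bytes taken from the dictionary entry "<ed><a0><bc><ed><be><88>"
bytes <- as.integer(c(0xED, 0xA0, 0xBC, 0xED, 0xBE, 0x88))

# Hypothetical helper: decode one 3-byte UTF-8-style sequence
# (1110xxxx 10xxxxxx 10xxxxxx) into a 16-bit code unit
decode3 <- function(b) {
  bitwAnd(b[1], 0x0FL) * 4096L +
    bitwAnd(b[2], 0x3FL) * 64L +
    bitwAnd(b[3], 0x3FL)
}

hi <- decode3(bytes[1:3])  # high surrogate, 0xD83C
lo <- decode3(bytes[4:6])  # low surrogate, 0xDF88

# Combine the surrogate pair into a single Unicode code point
code_point <- 0x10000L + (hi - 0xD800L) * 1024L + (lo - 0xDC00L)
sprintf("U+%X", code_point)  # "U+1F388"

# The emoji as an ordinary UTF-8 string
balloon <- intToUtf8(code_point)

# Assumption, not verified here: searching for the literal character.
# Needs the rtweet package and Twitter API credentials.
# tweets <- rtweet::search_tweets(balloon, n = 100)
```

If the search does return tweets, checking `grepl(balloon, tweets$text, fixed = TRUE)` would be one way to confirm that the balloon actually appears in the text.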