How to find tweets with popular emojis

Time: 2019-06-26 01:17:48

Tags: r

I'm working on a sentiment analysis project in R. I'm trying to collect tweets that use some of the most popular emojis. How can I collect tweets by emoji?

#devtools::install_github("dill/emoGG")
library(emoGG)   # source of the "emoji_search" function
library(twitteR) # source of the "searchTwitter" and "twListToDF" functions

emoji_search("BALLOON")

emoji <- searchTwitter("BALLOON")
emoji
emojidf <- twListToDF(emoji)

1 answer:

Answer 0: (score: 0)

After some googling and experimenting, I learned that emojis are encoded in tweets in a confusing way (at least to me).

One shortcut is to look up the emoji in an emoji dictionary, such as the one Kate Lyons compiled. She has also written some more background about how she compiled it.

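This gives us a more direct way to search for tweets that contain an emoji. For example, the dictionary shows that we can look for the "balloon" emoji by searching for the following string: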

<ed><a0><bc><ed><be><88>
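
To make that string less mysterious, here is a minimal sketch of where it appears to come from. It assumes the balloon emoji is Unicode code point U+1F388 and that the dictionary entry is the emoji's UTF-16 surrogate pair with each half written out as three UTF-8-style bytes. The helper name `surrogate_byte_string` is hypothetical and is not part of emoGG, twitteR, or rtweet.

```r
# Build the "<ed><a0><bc>..." style string for a supplementary-plane code point
# by encoding each half of its UTF-16 surrogate pair as three UTF-8-style bytes.
# (Hypothetical helper, written only to illustrate the encoding.)
surrogate_byte_string <- function(code_point) {
  stopifnot(code_point >= 0x10000, code_point <= 0x10FFFF)
  offset <- code_point - 0x10000
  high <- 0xD800 + offset %/% 0x400   # high (lead) surrogate
  low  <- 0xDC00 + offset %% 0x400    # low (trail) surrogate

  # Three-byte UTF-8 layout: 1110xxxx 10xxxxxx 10xxxxxx
  three_bytes <- function(u) {
    c(0xE0 + u %/% 0x1000,
      0x80 + (u %/% 0x40) %% 0x40,
      0x80 + u %% 0x40)
  }

  bytes <- as.integer(c(three_bytes(high), three_bytes(low)))
  paste0("<", sprintf("%02x", bytes), ">", collapse = "")
}

balloon <- surrogate_byte_string(0x1F388)  # U+1F388 BALLOON (assumed code point)
print(balloon)
```

Under those assumptions the result matches the dictionary string above, which suggests the dictionary stores a byte-level rendering of the emoji rather than the emoji character itself.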

I'm more familiar with rtweet, and this is what a search for the balloon emoji looks like:

[Edit: I'm not sure this is working correctly. These all appear to be non-English tweets, and they may not contain the balloon emoji at all...] :-(

> rtweet::search_tweets("<ed><a0><bc><ed><be><88>")
# A tibble: 16 x 90
   user_id status_id created_at          screen_name text  source display_text_wi… reply_to_status… reply_to_user_id
   <chr>   <chr>     <dttm>              <chr>       <chr> <chr>             <dbl> <chr>            <chr>           
 1 111373… 11429734… 2019-06-24 01:51:30 SPR1NGD4Y_  "? 엠… Twitt…              154 NA               NA              
 2 100224… 11428523… 2019-06-23 17:50:11 quark_kim   "탐라에… Twitt…              140 NA               NA              
 3 109648… 11428194… 2019-06-23 15:39:14 _4CC1D3N7_… "부장 … Twitt…              127 114281934863914… 109648624150199…
 4 113448… 11428090… 2019-06-23 14:58:01 MAX_commu   "자캐앤… Twitt…              140 NA               NA              
 5 819116… 11428062… 2019-06-23 14:46:46 jinimwoo    "자캐앤… Twitt…              140 NA               NA              
 6 103612… 11428013… 2019-06-23 14:27:27 00gY0       "자캐앤… Twitt…              140 NA               NA              
 7 107972… 11428003… 2019-06-23 14:23:32 YN_DGY      "탐라에… Twitt…              140 NA               NA              
 8 111199… 11427952… 2019-06-23 14:03:19 coffee_101… "탐라에… Twitt…              140 NA               NA              
 9 967054… 11427941… 2019-06-23 13:58:57 mphp0001    "탐라에… Twitt…              140 NA               NA              
10 928447… 11426751… 2019-06-23 06:06:06 yangE___    "탐라에… Twitt…              140 NA               NA              
11 836222… 11426745… 2019-06-23 06:03:32 sunseul_ma… "탐라에… Twitt…              140 NA               NA              
12 110802… 11426637… 2019-06-23 05:20:51 4th_month__ "탐라에… Twitt…              140 NA               NA              
13 113990… 11413476… 2019-06-19 14:10:47 Dream_Merr… "공지 … Twitt…               62 NA               NA              
14 777381… 11409418… 2019-06-18 11:18:24 mi_se2      "@Me… Twitt…              140 NA               NA              
15 330242… 11408761… 2019-06-18 06:57:35 lip_ran     "@Me… Twitt…              140 NA               NA              
16 113519… 11408687… 2019-06-18 06:27:56 barruwach   "@Me… Twitt…              140 NA               NA              
# … with 81 more variables: reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>, favorite_count <int>,
#   retweet_count <int>, quote_count <int>, reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
#   urls_t.co <list>, urls_expanded_url <list>, media_url <list>, media_t.co <list>, media_expanded_url <list>,
#   media_type <list>, ext_media_url <list>, ext_media_t.co <list>, ext_media_expanded_url <list>, ext_media_type <chr>,
#   mentions_user_id <list>, mentions_screen_name <list>, lang <chr>, quoted_status_id <chr>, quoted_text <chr>,
#   quoted_created_at <dttm>, quoted_source <chr>, quoted_favorite_count <int>, quoted_retweet_count <int>,
#   quoted_user_id <chr>, quoted_screen_name <chr>, quoted_name <chr>, quoted_followers_count <int>,
#   quoted_friends_count <int>, quoted_statuses_count <int>, quoted_location <chr>, quoted_description <chr>,
#   quoted_verified <lgl>, retweet_status_id <chr>, retweet_text <chr>, retweet_created_at <dttm>, retweet_source <chr>,
#   retweet_favorite_count <int>, retweet_retweet_count <int>, retweet_user_id <chr>, retweet_screen_name <chr>,
#   retweet_name <chr>, retweet_followers_count <int>, retweet_friends_count <int>, retweet_statuses_count <int>,
#   retweet_location <chr>, retweet_description <chr>, retweet_verified <lgl>, place_url <chr>, place_name <chr>,
#   place_full_name <chr>, place_type <chr>, country <chr>, country_code <chr>, geo_coords <list>, coords_coords <list>,
#   bbox_coords <list>, status_url <chr>, name <chr>, location <chr>, description <chr>, url <chr>, protected <lgl>,
#   followers_count <int>, friends_count <int>, listed_count <int>, statuses_count <int>, favourites_count <int>,
#   account_created_at <dttm>, verified <lgl>, profile_url <chr>, profile_expanded_url <chr>, account_lang <chr>,
#   profile_banner_url <chr>, profile_background_url <chr>, profile_image_url <chr>
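
One way to check whether the returned tweets really contain the balloon emoji is to filter the text locally. The sketch below is only an illustration: `tweet_text` is a hypothetical stand-in for the `text` column of a search result, and it assumes the text is stored as ordinary UTF-8 with the emoji as a real character (U+1F388) rather than in the `<ed><a0>...` form.

```r
# U+1F388 BALLOON as an actual character (a UTF-8 string)
balloon <- intToUtf8(0x1F388)

# Hypothetical stand-in for the `text` column of a search result
tweet_text <- c(
  paste("Happy birthday", balloon),
  "No emoji here",
  paste0(balloon, balloon, " party time")
)

# fixed = TRUE treats the emoji as a literal string, not a regular expression
has_balloon <- grepl(balloon, tweet_text, fixed = TRUE)
print(has_balloon)
```

With a real result stored in a data frame called, say, `tweets`, the same test could be applied as `tweets[grepl(balloon, tweets$text, fixed = TRUE), ]`. If no rows survive, the search string probably did not match the emoji. It may also be worth passing `balloon` itself as the query to `rtweet::search_tweets()`, although I have not verified how the Twitter search API treats emoji queries.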