我一直在使用Twitter REST API中的searchTwitter函数来检索一定数量的推文,并将其转储到TXT文件中。 该TXT文件的结构是:
//bad example (IMO)
Stream<String> allFoos = list.stream()
.map(o -> o != null && o.nested != null && o.nested.inner != null ?
o.nested.inner.foo : "Missing");
我希望将其作为JSON结构,即
"text" "favorited" "favoriteCount" "replyToSN" "created" "truncated" "replyToSID" "id" "replyToUID" "statusSource" "screenName" "retweetCount" "isRetweet" "retweeted" "longitude" "latitude"
"1" "RT @kobebryant: Last night was the final chapter to an incredible story. I walk away at peace knowing my love for the game & this city will…" FALSE 0 NA 2016-04-14 23:59:59 FALSE NA "720763566027096066" NA "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>" "JtLONGWAY" 204125 TRUE FALSE NA NA
"2" "RT @kobebryant: Last night was the final chapter to an incredible story. I walk away at peace knowing my love for the game & this city will…" FALSE 0 NA 2016-04-14 23:59:59 FALSE NA "720763566014332928" NA "<a href=\"http://twitter.com/download/android\" rel=\"nofollow\">Twitter for Android</a>" "Mr_Wizrd" 204125 TRUE FALSE NA NA
"3" "RT @MagicJohnson: I got a chance to get to know @kobebryant away from the court at the @Dodgers game! #ThankYouKobe #KB20 https://twitter.com/sVsW…" FALSE 0 NA 2016-04-14 23:59:59 FALSE NA "720763563783110661" NA "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>" "TynashKobe" 777 TRUE FALSE NA NA
我一直在尝试使用{"created_at":"Wed Apr 13 22:06:02 +0000 2016","id":720372500065071104,"id_str":"720372500065071104","text":"RT @STAPLESCenter: This is where @kobebryant will hold is final press conference tonight. #ThankYouKobe https:\/\/t.co\/1rTiq5eAS9","source":"\u003ca href=\"http:\/\/tweetlogix.com\" rel=\"nofollow\"\u003eTweetlogix\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":149681225,"id_str":"149681225","name":"SP","screen_name":"Mr_LayedBak","location":"West side of Detroit","url":null,"description":"Unfollow me if you're easily offended","protected":false,"verified":false,"followers_count":4326,"friends_count":597,"listed_count":105,"favourites_count":371,"statuses_count":227845,"created_at":"Sat May 29 23:21:29 +0000 2010","utc_offset":-14400,"time_zone":"Eastern Time (US & Canada)","geo_enabled":true,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"131516","profile_background_image_url":"http:\/\/pbs.twimg.com\/profile_background_images\/736248613\/7d89d45f16e6c4e508a883aded1aac64.jpeg","profile_background_image_url_https":"https:\/\/pbs.twimg.com\/profile_background_images\/736248613\/7d89d45f16e6c4e508a883aded1aac64.jpeg","profile_background_tile":true,"profile_link_color":"141313","profile_sidebar_border_color":"000000","profile_sidebar_fill_color":"000000","profile_text_color":"660A0A","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/719706881736974341\/XT8R51s8_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/719706881736974341\/XT8R51s8_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/149681225\/1452265608","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweeted_status":{"created_at":"Wed Apr 13 21:37:36 +0000 2016","id":720365343500144640,"id_str":"720365343500144640","text":"This is where @kobebryant will hold is final press conference tonight. #ThankYouKobe https:\/\/t.co\/1rTiq5eAS9","source":"\u003ca href=\"http:\/\/twitter.com\/download\/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":28725783,"id_str":"28725783","name":"STAPLES Center","screen_name":"STAPLESCenter","location":"Los Angeles","url":"http:\/\/www.staplescenter.com","description":"Sports and Entertainment Center of the World located in downtown Los Angeles @LALIVE since 1999. Instagram: @staplescenterla","protected":false,"verified":true,"followers_count":82891,"friends_count":10907,"listed_count":862,"favourites_count":1905,"statuses_count":11024,"created_at":"Sat Apr 04 03:04:17 +0000 2009","utc_offset":-25200,"time_zone":"Pacific Time (US & Canada)","geo_enabled":true,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"131516","profile_background_image_url":"http:\/\/pbs.twimg.com\/profile_background_images\/553367185700036609\/q6Kh8Ru8.jpeg","profile_background_image_url_https":"https:\/\/pbs.twimg.com\/profile_background_images\/553367185700036609\/q6Kh8Ru8.jpeg","profile_background_tile":true,"profile_link_color":"009999","profile_sidebar_border_color":"FFFFFF","profile_sidebar_fill_color":"EFEFEF","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/2394735481\/7rom2fzqu1vwrq94yzll_normal.jpeg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/2394735481\/7rom2fzqu1vwrq94yzll_normal.jpeg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/28725783\/1416251684","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"retweet_count":264,"favorite_count":439,"entities":{"hashtags":[{"text":"ThankYouKobe","indices":[71,84]}],"urls":[],"user_mentions":[{"screen_name":"kobebryant","name":"Kobe Bryant","id":1059194370,"id_str":"1059194370","indices":[14,25]}],"symbols":[],"media":[{"id":720365333593260032,"id_str":"720365333593260032","indices":[85,108],"media_url":"http:\/\/pbs.twimg.com\/media\/Cf9AdElVAAA7BqM.jpg","media_url_https":"https:\/\/pbs.twimg.com\/media\/Cf9AdElVAAA7BqM.jpg","url":"https:\/\/t.co\/1rTiq5eAS9","display_url":"pic.twitter.com\/1rTiq5eAS9","expanded_url":"http:\/\/twitter.com\/STAPLESCenter\/status\/720365343500144640\/photo\/1","type":"photo","sizes":{"small":{"w":340,"h":425,"resize":"fit"},"thumb":{"w":150,"h":150,"resize":"crop"},"large":{"w":1024,"h":1280,"resize":"fit"},"medium":{"w":600,"h":750,"resize":"fit"}}}]},"extended_entities":{"media":[{"id":720365333593260032,"id_str":"720365333593260032","indices":[85,108],"media_url":"http:\/\/pbs.twimg.com\/media\/Cf9AdElVAAA7BqM.jpg","media_url_https":"https:\/\/pbs.twimg.com\/media\/Cf9AdElVAAA7BqM.jpg","url":"https:\/\/t.co\/1rTiq5eAS9","display_url":"pic.twitter.com\/1rTiq5eAS9","expanded_url":"http:\/\/twitter.com\/STAPLESCenter\/status\/720365343500144640\/photo\/1","type":"photo","sizes":{"small":{"w":340,"h":425,"resize":"fit"},"thumb":{"w":150,"h":150,"resize":"crop"},"large":{"w":1024,"h":1280,"resize":"fit"},"medium":{"w":600,"h":750,"resize":"fit"}}}]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en"},"is_quote_status":false,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[{"text":"ThankYouKobe","indices":[90,103]}],"urls":[],"user_mentions":[{"screen_name":"STAPLESCenter","name":"STAPLES Center","id":28725783,"id_str":"28725783","indices":[3,17]},{"screen_name":"kobebryant","name":"Kobe Bryant","id":1059194370,"id_str":"1059194370","indices":[33,44]}],"symbols":[],"media":[{"id":720365333593260032,"id_str":"720365333593260032","indices":[104,127],"media_url":"http:\/\/pbs.twimg.com\/media\/Cf9AdElVAAA7BqM.jpg","media_url_https":"https:\/\/pbs.twimg.com\/media\/Cf9AdElVAAA7BqM.jpg","url":"https:\/\/t.co\/1rTiq5eAS9","display_url":"pic.twitter.com\/1rTiq5eAS9","expanded_url":"http:\/\/twitter.com\/STAPLESCenter\/status\/720365343500144640\/photo\/1","type":"photo","sizes":{"small":{"w":340,"h":425,"resize":"fit"},"thumb":{"w":150,"h":150,"resize":"crop"},"large":{"w":1024,"h":1280,"resize":"fit"},"medium":{"w":600,"h":750,"resize":"fit"}},"source_status_id":720365343500144640,"source_status_id_str":"720365343500144640","source_user_id":28725783,"source_user_id_str":"28725783"}]},"extended_entities":{"media":[{"id":720365333593260032,"id_str":"720365333593260032","indices":[104,127],"media_url":"http:\/\/pbs.twimg.com\/media\/Cf9AdElVAAA7BqM.jpg","media_url_https":"https:\/\/pbs.twimg.com\/media\/Cf9AdElVAAA7BqM.jpg","url":"https:\/\/twitter.com\/1rTiq5eAS9","display_url":"pic.twitter.com\/1rTiq5eAS9","expanded_url":"http:\/\/twitter.com\/STAPLESCenter\/status\/720365343500144640\/photo\/1","type":"photo","sizes":{"small":{"w":340,"h":425,"resize":"fit"},"thumb":{"w":150,"h":150,"resize":"crop"},"large":{"w":1024,"h":1280,"resize":"fit"},"medium":{"w":600,"h":750,"resize":"fit"}},"source_status_id":720365343500144640,"source_status_id_str":"720365343500144640","source_user_id":28725783,"source_user_id_str":"28725783"}]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en","timestamp_ms":"1460585162546"}
加载TXT文件,我发现的第一个问题是,由于TXT是以空格作为标题的分隔符而形成的,因此我收到错误说行中的列数多于标题中的列数(当然我正在尝试处理推文中的文本)。
如果我没有指定分隔符(即read.csv(file, header = TRUE, sep ="")
)并且我将内容转储到数据框中,那么我只获得1列。
任何提示?
答案 0 :(得分:0)
您可以执行类似
的操作txt <- readLines("myfile.txt")
df <- read.table(text=sub("\\d+-\\d+-\\d+ \\d+:\\d+:\\d+", '"\\1"', txt), header=T)
library(jsonlite)
toJSON(df)
# [{"text":"RT @kobebryant: Last night was the final chapter to an incredible story. I walk ...
出现问题,因为日期时间列created
未包含在引号中。因此,日期和时间分开 - 并且标题字段的数量不再匹配。 (如果text
列中存在类似的模式,这种简单方法可能会中断。)