
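We are using Apache Flume to extract Twitter data. My configuration file contains the required properties for the source, channel, and sink: the source is the Twitter source, the channel is a memory channel, and the sink is an HDFS sink. The extraction itself works, but the data arrives in HDFS as binary files (for example, FLUMEDATA.12334555678) that I cannot open directly. The file contents look highly irregular and I cannot make sense of them, as shown below.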

Please help me understand how to process this file and how to view the tweet data in HDFS. Any answer would help.

The data in the file looks like this:

hdfs dfs -cat /twitterdata/FlumeData.1543741485655

{"type":"record","name":"Doc","doc":"adoc","fields":[{"name":"id","type":"string"},{"name":"user_friends_count","type":["int","null"]},{"name":"user_location","type":["string","null"]},{"name":"user_description","type":["string","null"]},{"name":"user_statuses_count","type":["int","null"]},{"name":"user_followers_count","type":["int","null"]},{"name":"user_name","type":["string","null"]},{"name":"user_screen_name","type":["string","null"]},{"name":"created_at","type":["string","null"]},{"name":"text","type":["string","null"]},{"name":"retweet_count","type":["long","null"]},{"name":"retweeted","type":["boolean","null"]},{"name":"in_reply_to_user_id","type":["long","null"]},{"name":"source","type":["string","null"]},{"name":"in_reply_to_status_id","type":["long","null"]},{"name":"media_url_https","type":["string","null"]},{"name":"expanded_url","type":["string","null"]}]}ˋˋrpex && 1069155373561475073''.$ MakeHouseDeepAgainbrad_k1(2018-12-02T14:34:39Zj @ _raeluv22 Me in this weather Twitter for iPhone https://twitter.com/brad_k1/status/1069155373561475073/photo/1ˋ�rpex I......
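The readable JSON at the start of the dump above looks like an Avro record schema, which suggests (but does not prove) that the file is an Avro object container file rather than plain text. One way to check is to parse only the file header after copying the file out of HDFS. The following is a minimal sketch using just the Python standard library. It assumes the standard Avro container layout (the magic bytes `Obj\x01`, a metadata map, then a 16-byte sync marker), and the helper names `read_long`, `read_bytes`, and `read_avro_header` are hypothetical, chosen here for illustration.

```python
import io
import json


def read_long(stream):
    """Decode one Avro long (zigzag-encoded, variable-length integer)."""
    shift = 0
    value = 0
    while True:
        raw = stream.read(1)
        if not raw:
            raise EOFError("unexpected end of data while reading a long")
        byte = raw[0]
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            break
        shift += 7
    return (value >> 1) ^ -(value & 1)  # undo zigzag encoding


def read_bytes(stream):
    """Read a length-prefixed byte string (Avro 'bytes' / 'string')."""
    length = read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("unexpected end of data while reading bytes")
    return data


def read_avro_header(stream):
    """Return (schema, codec) from an Avro container header.

    Raises ValueError if the stream does not start with the Avro magic.
    """
    if stream.read(4) != b"Obj\x01":
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:  # a zero count terminates the metadata map
            break
        if count < 0:  # negative count is followed by the block size
            read_long(stream)
            count = -count
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            meta[key] = read_bytes(stream)
    stream.read(16)  # sync marker; data blocks follow and are not parsed here
    schema = json.loads(meta["avro.schema"].decode("utf-8"))
    codec = meta.get("avro.codec", b"null").decode("utf-8")
    return schema, codec


# Hypothetical usage, after e.g. `hdfs dfs -get /twitterdata/FlumeData.1543741485655 .`:
#   with open("FlumeData.1543741485655", "rb") as f:
#       schema, codec = read_avro_header(f)
#       print(codec, [field["name"] for field in schema["fields"]])
```

If the header parses, the records themselves are best decoded with a full Avro implementation rather than by hand, for instance the `tojson` command of the Apache Avro `avro-tools` jar. If the magic bytes do not match, the file is not a standard Avro container and this approach does not apply.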

How do I process Twitter data after extracting it from Twitter with Apache Flume?

0 answers: