
I have pulled log data from Twitter using Apache Flume. The data ends up in files with names like FLUMEDATA.12334555678. The contents of a file look like this:

{"type":"record","name":"Doc","doc":"adoc","fields":[{"name":"id","type":"string"},{"name":"user_friends_count","type":["int","null"]},{"name":"user_location","type":["string","null"]},{"name":"user_description","type":["string","null"]},{"name":"user_statuses_count","type":["int","null"]},{"name":"user_followers_count","type":["int","null"]},{"name":"user_name","type":["string","null"]},{"name":"user_screen_name","type":["string","null"]},{"name":"created_at","type":["string","null"]},{"name":"text","type":["string","null"]},{"name":"retweet_count","type":["long","null"]},{"name":"retweeted","type":["boolean","null"]},{"name":"in_reply_to_user_id","type":["long","null"]},{"name":"source","type":["string","null"]},{"name":"in_reply_to_status_id","type":["long","null"]},{"name":"media_url_https","type":["string","null"]},{"name":"expanded_url","type":["string","null"]}]}
ˋˋrpex && 1069155373561475073''.$ MakeHouseDeepAgainbrad_k1(2018-12-02T14:34:39Zj @_raeluv22 I'm in this weather Twitter iPhone��vhttps://pbs.twimg.com/tweet_video_thumb/DtZnIhkU0AA7c9B.jpg | https://twitter.com/brad_k1/status/1069155373561475073/photo/1ˋ�rpex I......

The data is stored as Avro objects. How can I read and clean it with a DataFrame in PySpark? Or is there some other way to process this data and get insights from it?
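
Since the dump above is a JSON schema followed by binary records, the files look like Avro object container files. Below is a minimal sketch, assuming that layout, which uses only the Python standard library to check the magic bytes and pull out the embedded schema before deciding how to load the data. The helper names (`read_avro_header`, `read_avro_header_from_bytes`, `schema_field_names`) are made up for illustration and are not part of Flume, Avro, or Spark.

```python
import io
import json

AVRO_MAGIC = b"Obj\x01"  # first four bytes of an Avro object container file


def read_long(stream):
    """Decode one Avro long (variable-length, zigzag encoded)."""
    shift = 0
    accum = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("unexpected end of data while reading a long")
        byte = chunk[0]
        accum |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of this number
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)  # undo zigzag


def read_bytes(stream):
    """Decode one Avro bytes/string value: a long length, then that many bytes."""
    length = read_long(stream)
    if length < 0:
        raise ValueError("negative length in Avro header")
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("unexpected end of data while reading bytes")
    return data


def read_avro_header(stream):
    """Return (metadata dict, 16-byte sync marker) from a binary stream."""
    if stream.read(4) != AVRO_MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:  # a zero count ends the metadata map
            break
        if count < 0:  # negative count: a block byte size follows
            count = -count
            read_long(stream)  # byte size is not needed here
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            meta[key] = read_bytes(stream)
    sync = stream.read(16)
    return meta, sync


def read_avro_header_from_bytes(data):
    """Convenience wrapper for data that is already in memory."""
    return read_avro_header(io.BytesIO(data))


def schema_field_names(meta):
    """List the field names of the record schema stored under 'avro.schema'."""
    schema = json.loads(meta["avro.schema"].decode("utf-8"))
    return [field["name"] for field in schema.get("fields", [])]
```

To try it on one of the Flume files, open the file in binary mode (`open("FLUMEDATA.12334555678", "rb")`), pass the file object to `read_avro_header`, and then call `schema_field_names` on the returned metadata. If the header parses and lists fields such as `id` and `text`, the file can probably be loaded as a DataFrame through Spark's external Avro data source, e.g. `spark.read.format("avro").load(path)`, provided a matching `spark-avro` package is on the classpath. That last part is an assumption about your Spark setup, not something verified here.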

How do I process unstructured log data sent from Twitter via Flume?

0 answers: