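I have collected tweets into a file in JSON format. Now I am trying to print the number of tweets per user using Scala code. I could not find anything relevant online. My understanding is that I would parse the JSON records with something like json4j or gson. Part of my code is: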
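// Load the tweets; Spark infers the schema from the JSON records
val tweet = sqlContext.load("iphone6s.json", "json")
tweet.printSchema()
println(tweet.select("quoted_status.user.id_str").rdd)
// extractUserId is my regex-based helper (not shown here)
val userMap = tweet.map(x => (extractUserId(x), 1))
// Sum the 1s per user ID to get the tweet count per user
val numTweetsByUser = userMap.reduceByKey((a, b) => a + b)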
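In the code above, I use a regex to extract the user ID. But I would like to work with a key-value mapping of all the fields, so that I can run other queries as well.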
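To make the goal concrete, here is a minimal sketch of the kind of query I am after, using the DataFrame columns directly instead of a regex. It assumes Spark 2.2 or later (`SparkSession` rather than the older `sqlContext.load`), and it assumes every tweet record carries a top-level `user.id_str` field. The object name `TweetsPerUser`, the function `countByUser`, and the in-memory sample records are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, desc}

object TweetsPerUser {
  // Counts tweets per user ID.
  // Assumption: each tweet has a top-level user struct with an id_str field.
  // Records without a user (for example stream "limit" notices) are skipped.
  def countByUser(tweets: DataFrame): DataFrame =
    tweets
      .filter(col("user.id_str").isNotNull)
      .groupBy(col("user.id_str").as("user_id"))
      .count()
      .orderBy(desc("count"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("TweetsPerUser")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for iphone6s.json, one JSON record per line.
    // For the real file this would be spark.read.json("iphone6s.json").
    val sample = Seq(
      """{"id_str":"1","user":{"id_str":"u1"}}""",
      """{"id_str":"2","user":{"id_str":"u2"}}""",
      """{"id_str":"3","user":{"id_str":"u1"}}""",
      """{"limit":{"track":5,"timestamp_ms":"0"}}"""
    ).toDS()

    val tweets = spark.read.json(sample)
    countByUser(tweets).show()
    spark.stop()
  }
}
```

Because the DataFrame already exposes every field by name, the same pattern should carry over to other queries by swapping the column passed to `groupBy`.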
The tweet schema is as follows:
root
|-- contributors: string (nullable = true)
|-- coordinates: struct (nullable = true)
| |-- coordinates: array (nullable = true)
| | |-- element: double (containsNull = true)
| |-- type: string (nullable = true)
|-- created_at: string (nullable = true)
|-- entities: struct (nullable = true)
| |-- hashtags: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- indices: array (nullable = true)
| | | | |-- element: long (containsNull = true)
| | | |-- text: string (nullable = true)
| |-- media: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- display_url: string (nullable = true)
| | | |-- expanded_url: string (nullable = true)
| | | |-- id: long (nullable = true)
| | | |-- id_str: string (nullable = true)
| | | |-- indices: array (nullable = true)
| | | | |-- element: long (containsNull = true)
| | | |-- media_url: string (nullable = true)
| | | |-- media_url_https: string (nullable = true)
| | | |-- sizes: struct (nullable = true)
| | | | |-- large: struct (nullable = true)
| | | | | |-- h: long (nullable = true)
| | | | | |-- resize: string (nullable = true)
| | | | | |-- w: long (nullable = true)
| | | | |-- medium: struct (nullable = true)
| | | | | |-- h: long (nullable = true)
| | | | | |-- resize: string (nullable = true)
| | | | | |-- w: long (nullable = true)
| | | | |-- small: struct (nullable = true)
| | | | | |-- h: long (nullable = true)
| | | | | |-- resize: string (nullable = true)
| | | | | |-- w: long (nullable = true)
| | | | |-- thumb: struct (nullable = true)
| | | | | |-- h: long (nullable = true)
| | | | | |-- resize: string (nullable = true)
| | | | | |-- w: long (nullable = true)
| | | |-- source_status_id: long (nullable = true)
| | | |-- source_status_id_str: string (nullable = true)
| | | |-- source_user_id: long (nullable = true)
| | | |-- source_user_id_str: string (nullable = true)
| | | |-- type: string (nullable = true)
| | | |-- url: string (nullable = true)
| |-- symbols: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- indices: array (nullable = true)
| | | | |-- element: long (containsNull = true)
| | | |-- text: string (nullable = true)
| |-- urls: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- display_url: string (nullable = true)
| | | |-- expanded_url: string (nullable = true)
| | | |-- indices: array (nullable = true)
| | | | |-- element: long (containsNull = true)
| | | |-- url: string (nullable = true)
| |-- user_mentions: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- id: long (nullable = true)
| | | |-- id_str: string (nullable = true)
| | | |-- indices: array (nullable = true)
| | | | |-- element: long (containsNull = true)
| | | |-- name: string (nullable = true)
| | | |-- screen_name: string (nullable = true)
|-- extended_entities: struct (nullable = true)
| |-- media: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- display_url: string (nullable = true)
| | | |-- expanded_url: string (nullable = true)
| | | |-- id: long (nullable = true)
| | | |-- id_str: string (nullable = true)
| | | |-- indices: array (nullable = true)
| | | | |-- element: long (containsNull = true)
| | | |-- media_url: string (nullable = true)
| | | |-- media_url_https: string (nullable = true)
| | | |-- sizes: struct (nullable = true)
| | | | |-- large: struct (nullable = true)
| | | | | |-- h: long (nullable = true)
| | | | | |-- resize: string (nullable = true)
| | | | | |-- w: long (nullable = true)
| | | | |-- medium: struct (nullable = true)
| | | | | |-- h: long (nullable = true)
| | | | | |-- resize: string (nullable = true)
| | | | | |-- w: long (nullable = true)
| | | | |-- small: struct (nullable = true)
| | | | | |-- h: long (nullable = true)
| | | | | |-- resize: string (nullable = true)
| | | | | |-- w: long (nullable = true)
| | | | |-- thumb: struct (nullable = true)
| | | | | |-- h: long (nullable = true)
| | | | | |-- resize: string (nullable = true)
| | | | | |-- w: long (nullable = true)
| | | |-- source_status_id: long (nullable = true)
| | | |-- source_status_id_str: string (nullable = true)
| | | |-- source_user_id: long (nullable = true)
| | | |-- source_user_id_str: string (nullable = true)
| | | |-- type: string (nullable = true)
| | | |-- url: string (nullable = true)
| | | |-- video_info: struct (nullable = true)
| | | | |-- aspect_ratio: array (nullable = true)
| | | | | |-- element: long (containsNull = true)
| | | | |-- duration_millis: long (nullable = true)
| | | | |-- variants: array (nullable = true)
| | | | | |-- element: struct (containsNull = true)
| | | | | | |-- bitrate: long (nullable = true)
| | | | | | |-- content_type: string (nullable = true)
| | | | | | |-- url: string (nullable = true)
|-- favorite_count: long (nullable = true)
|-- favorited: boolean (nullable = true)
|-- filter_level: string (nullable = true)
|-- geo: struct (nullable = true)
| |-- coordinates: array (nullable = true)
| | |-- element: double (containsNull = true)
| |-- type: string (nullable = true)
|-- id: long (nullable = true)
|-- id_str: string (nullable = true)
|-- in_reply_to_screen_name: string (nullable = true)
|-- in_reply_to_status_id: long (nullable = true)
|-- in_reply_to_status_id_str: string (nullable = true)
|-- in_reply_to_user_id: long (nullable = true)
|-- in_reply_to_user_id_str: string (nullable = true)
|-- is_quote_status: boolean (nullable = true)
|-- lang: string (nullable = true)
|-- limit: struct (nullable = true)
| |-- timestamp_ms: string (nullable = true)
| |-- track: long (nullable = true)
|-- place: struct (nullable = true)
| |-- bounding_box: struct (nullable = true)
| | |-- coordinates: array (nullable = true)
| | | |-- element: array (containsNull = true)
| | | | |-- element: array (containsNull = true)
| | | | | |-- element: double (containsNull = true)
| | |-- type: string (nullable = true)
| |-- country: string (nullable = true)
| |-- country_code: string (nullable = true)
| |-- full_name: string (nullable = true)
| |-- id: string (nullable = true)
| |-- name: string (nullable = true)
| |-- place_type: string (nullable = true)
| |-- url: string (nullable = true)
|-- possibly_sensitive: boolean (nullable = true)
|-- quoted_status: struct (nullable = true)
| |-- contributors: string (nullable = true)
| |-- coordinates: struct (nullable = true)
| | |-- coordinates: array (nullable = true)
| | | |-- element: double (containsNull = true)
| | |-- type: string (nullable = true)
| |-- created_at: string (nullable = true)
| |-- entities: struct (nullable = true)
| | |-- hashtags: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- indices: array (nullable = true)
| | | | | |-- element: long (containsNull = true)
| | | | |-- text: string (nullable = true)
| | |-- media: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- display_url: string (nullable = true)
| | | | |-- expanded_url: string (nullable = true)
| | | | |-- id: long (nullable = true)
| | | | |-- id_str: string (nullable = true)
| | | | |-- indices: array (nullable = true)
| | | | | |-- element: long (containsNull = true)
| | | | |-- media_url: string (nullable = true)
| | | | |-- media_url_https: string (nullable = true)
| | | | |-- sizes: struct (nullable = true)
| | | | | |-- large: struct (nullable = true)
| | | | | | |-- h: long (nullable = true)
| | | | | | |-- resize: string (nullable = true)
| | | | | | |-- w: long (nullable = true)
| | | | | |-- medium: struct (nullable = true)
| | | | | | |-- h: long (nullable = true)
| | | | | | |-- resize: string (nullable = true)
| | | | | | |-- w: long (nullable = true)
| | | | | |-- small: struct (nullable = true)
| | | | | | |-- h: long (nullable = true)
| | | | | | |-- resize: string (nullable = true)
| | | | | | |-- w: long (nullable = true)
| | | | | |-- thumb: struct (nullable = true)
| | | | | | |-- h: long (nullable = true)
| | | | | | |-- resize: string (nullable = true)
| | | | | | |-- w: long (nullable = true)
| | | | |-- source_status_id: long (nullable = true)
| | | | |-- source_status_id_str: string (nullable = true)
| | | | |-- source_user_id: long (nullable = true)
| | | | |-- source_user_id_str: string (nullable = true)
| | | | |-- type: string (nullable = true)
| | | | |-- url: string (nullable = true)
| | |-- symbols: array (nullable = true)
| | | |-- element: string (containsNull = true)
| | |-- urls: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- display_url: string (nullable = true)
| | | | |-- expanded_url: string (nullable = true)
| | | | |-- indices: array (nullable = true)
| | | | | |-- element: long (containsNull = true)
| | | | |-- url: string (nullable = true)
| | |-- user_mentions: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- id: long (nullable = true)
| | | | |-- id_str: string (nullable = true)
| | | | |-- indices: array (nullable = true)
| | | | | |-- element: long (containsNull = true)
| | | | |-- name: string (nullable = true)
| | | | |-- screen_name: string (nullable = true)
| |-- extended_entities: struct (nullable = true)
| | |-- media: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- display_url: string (nullable = true)
| | | | |-- expanded_url: string (nullable = true)
| | | | |-- id: long (nullable = true)
| | | | |-- id_str: string (nullable = true)
| | | | |-- indices: array (nullable = true)
| | | | | |-- element: long (containsNull = true)
| | | | |-- media_url: string (nullable = true)
| | | | |-- media_url_https: string (nullable = true)
| | | | |-- sizes: struct (nullable = true)
| | | | | |-- large: struct (nullable = true)
| | | | | | |-- h: long (nullable = true)
| | | | | | |-- resize: string (nullable = true)
| | | | | | |-- w: long (nullable = true)
| | | | | |-- medium: struct (nullable = true)
| | | | | | |-- h: long (nullable = true)
| | | | | | |-- resize: string (nullable = true)
| | | | | | |-- w: long (nullable = true)
| | | | | |-- small: struct (nullable = true)
| | | | | | |-- h: long (nullable = true)
| | | | | | |-- resize: string (nullable = true)
| | | | | | |-- w: long (nullable = true)
| | | | | |-- thumb: struct (nullable = true)
| | | | | | |-- h: long (nullable = true)
| | | | | | |-- resize: string (nullable = true)
| | | | | | |-- w: long (nullable = true)
| | | | |-- source_status_id: long (nullable = true)
| | | | |-- source_status_id_str: string (nullable = true)
| | | | |-- source_user_id: long (nullable = true)
| | | | |-- source_user_id_str: string (nullable = true)
| | | | |-- type: string (nullable = true)
| | | | |-- url: string (nullable = true)
| | | | |-- video_info: struct (nullable = true)
| | | | | |-- aspect_ratio: array (nullable = true)
| | | | | | |-- element: long (containsNull = true)
| | | | | |-- duration_millis: long (nullable = true)
| | | | | |-- variants: array (nullable = true)
| | | | | | |-- element: struct (containsNull = true)
| | | | | | | |-- bitrate: long (nullable = true)
| | | | | | | |-- content_type: string (nullable = true)
| | | | | | | |-- url: string (nullable = true)
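As an example of the "other queries" I have in mind, the nested fields in the schema above can be addressed by dotted path. The following is only a sketch, again assuming Spark 2.2 or later: it counts hashtags by exploding `entities.hashtags`, which the schema lists as an array of structs with a `text` field. The names `HashtagCounts` and `countHashtags` and the sample records are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, lower}

object HashtagCounts {
  // Counts how often each hashtag appears, ignoring case.
  // entities.hashtags.text resolves to an array of strings per tweet;
  // explode emits one row per element and drops tweets with no hashtags.
  def countHashtags(tweets: DataFrame): DataFrame =
    tweets
      .select(explode(col("entities.hashtags.text")).as("hashtag"))
      .groupBy(lower(col("hashtag")).as("hashtag"))
      .count()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HashtagCounts")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample records shaped like the schema above.
    val sample = Seq(
      """{"entities":{"hashtags":[{"text":"iPhone","indices":[0,7]},{"text":"Apple","indices":[8,14]}]}}""",
      """{"entities":{"hashtags":[{"text":"iphone","indices":[0,7]}]}}""",
      """{"entities":{"hashtags":[]}}"""
    ).toDS()

    countHashtags(spark.read.json(sample)).show()
    spark.stop()
  }
}
```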
Any suggestions would be greatly appreciated. Thanks.