I am pushing (streaming) data from Twitter in the following format:
Map(UserLang -> hi,
UserName -> CarterWyatt,
UserScreenName -> CarterWyatt1,
HashTags -> ,
UserVerification -> false,
Spam -> true,
UserFollowersCount -> 121,
UserLocation -> null,
UserStatusCount -> 146405,
UserCreated -> 2013-03-04T16:44:27.000+0530,
UserDescription -> null,
TextLength -> 113,
Text -> abcd.,
UserFollowersRatio -> 121.0,
UserFavouritesCount -> 0,
UserFriendsCount -> 1,
StatusCreatedAt -> 2016-07-14T20:52:52.000+0530,
UserID -> 1241101146)
I want to use a case class like this:
case class Foo(UserLang: String, UserName: String, UserScreenName: String, HashTags: String,
  UserVerification: String, Spam: String, UserFollowersCount: String,
  UserLocation: String, UserStatusCount: String, UserCreated: String, UserDescription: String,
  TextLength: String, Text: String, UserFollowersRatio: String, UserFavouritesCount: String,
  UserFriendsCount: String, StatusCreatedAt: String, UserID: String)
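Now I want to use the case class fields as the column names of a Spark SQL table and fill those columns with the values from the Map. In short, I want to populate the table from the streamed values.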
I'm not sure exactly how to do this; any pointers would be appreciated.
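One possible first step is to turn each streamed Map into an instance of the case class, since Spark can then derive the table columns from the case class fields. The sketch below is plain Scala with no Spark dependency. It assumes each record arrives as a Map[String, Any] keyed by the names shown above; the object MapToFoo and its method fromMap are hypothetical names for illustration. Missing keys and null values become null, and every other value is stored via toString to match the all-String fields.

```scala
// Same fields as the case class in the question; the field names will
// become the column names once the data is converted to a DataFrame.
case class Foo(UserLang: String, UserName: String, UserScreenName: String, HashTags: String,
  UserVerification: String, Spam: String, UserFollowersCount: String,
  UserLocation: String, UserStatusCount: String, UserCreated: String, UserDescription: String,
  TextLength: String, Text: String, UserFollowersRatio: String, UserFavouritesCount: String,
  UserFriendsCount: String, StatusCreatedAt: String, UserID: String)

// Hypothetical helper (not part of any library): Map record -> Foo.
object MapToFoo {
  // Assumption: records are Map[String, Any] keyed by the names in the question.
  def fromMap(m: Map[String, Any]): Foo = {
    // Missing key or null value -> null; anything else -> its toString.
    def s(key: String): String = m.get(key).flatMap(v => Option(v)).map(_.toString).orNull

    Foo(
      UserLang = s("UserLang"),
      UserName = s("UserName"),
      UserScreenName = s("UserScreenName"),
      HashTags = s("HashTags"),
      UserVerification = s("UserVerification"),
      Spam = s("Spam"),
      UserFollowersCount = s("UserFollowersCount"),
      UserLocation = s("UserLocation"),
      UserStatusCount = s("UserStatusCount"),
      UserCreated = s("UserCreated"),
      UserDescription = s("UserDescription"),
      TextLength = s("TextLength"),
      Text = s("Text"),
      UserFollowersRatio = s("UserFollowersRatio"),
      UserFavouritesCount = s("UserFavouritesCount"),
      UserFriendsCount = s("UserFriendsCount"),
      StatusCreatedAt = s("StatusCreatedAt"),
      UserID = s("UserID")
    )
  }

  def main(args: Array[String]): Unit = {
    // A partial record built from the sample above; absent keys become null.
    val record: Map[String, Any] = Map(
      "UserLang" -> "hi",
      "UserName" -> "CarterWyatt",
      "UserFollowersCount" -> 121,
      "UserLocation" -> null,
      "UserID" -> 1241101146L)
    println(fromMap(record))
  }
}
```

From there, a Spark job could map each record with MapToFoo.fromMap and, after import spark.implicits._, call toDF() on the resulting RDD[Foo] (for a DStream, inside foreachRDD). The DataFrame's columns then take the case class field names, and registering it with createOrReplaceTempView (registerTempTable on Spark 1.x) makes it queryable through spark.sql. Keeping every field as String mirrors the question; typed fields (Int, Boolean, timestamps) would need explicit parsing in fromMap.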