TypeError: DoubleType is not JSON serializable when using a Spark SQL udf

Time: 2017-10-12 07:20:50

Tags: python apache-spark apache-spark-sql spark-dataframe

I have three columns in my DataFrame, and I want to merge two of them into a single dictionary column. But when I try to do this with a udf, I get an error:

TypeError: DoubleType is not JSON serializable

Here is a sample of my data:

[{'category': u'201',
  'news': [Row(_1=u'2369958', _2=0.43526815665082813),
   Row(_1=u'2417417', _2=0.5187076730540034),
   Row(_1=u'2561942', _2=0.5510633389155071),
   Row(_1=u'2321520', _2=0.5689017013790841)],
  'uuid': u'014368001839849'},
 {'category': u'447',
  'news': [Row(_1=u'2347850', _2=0.577052903708968),
   Row(_1=u'2535530', _2=0.7521969572004206),
   Row(_1=u'2539985', _2=0.6226747897366118),
   Row(_1=u'2365460', _2=0.7629144934549972),
   Row(_1=u'2349407', _2=0.6882745185469593),
   Row(_1=u'2453644', _2=0.7620190706668273),
   Row(_1=u'2467028', _2=0.462592224667512),
   Row(_1=u'2453394', _2=0.4146801427301294),
   Row(_1=u'2534209', _2=0.5128321439309336),
   Row(_1=u'2557643', _2=0.5425593620604838),
   Row(_1=u'2347847', _2=0.4081333702297975),
   Row(_1=u'2493692', _2=0.42726102421404183),
   Row(_1=u'2476064', _2=0.9851494825421524),
   Row(_1=u'2491799', _2=0.7521969572004206),
   Row(_1=u'2486584', _2=0.7521969572004206),
   Row(_1=u'2380484', _2=0.4016447505752669),
   Row(_1=u'2393509', _2=0.757253994310938)],
  'uuid': u'014368001840284'},
 {'category': u'207',
  'news': [Row(_1=u'2495947', _2=0.42279520434424095),
   Row(_1=u'2426154', _2=0.42279520434424095),
   Row(_1=u'2399794', _2=0.42279520434424095),
   Row(_1=u'2498287', _2=0.42279520434424095),
   Row(_1=u'2498619', _2=0.4882018500804606),
   Row(_1=u'2401019', _2=0.42279520434424095),
   Row(_1=u'2393635', _2=0.42279520434424095),
   Row(_1=u'2356242', _2=0.47940469980239864)],
  'uuid': u'014368002272404'},
 {'category': u'450',
  'news': [Row(_1=u'2557924', _2=0.4314939181009928),
   Row(_1=u'2557921', _2=0.8075963202440836),
   Row(_1=u'2455662', _2=0.4314939181009928),
   Row(_1=u'2320505', _2=0.43236218074760446),
   Row(_1=u'2337753', _2=0.6326301347394172),
   Row(_1=u'2399415', _2=0.4314939181009928)],
  'uuid': u'014368003987885'},
 {'category': u'453',
  'news': [Row(_1=u'2488166', _2=0.41558734245334084)],
  'uuid': u'014379000051658'}]

How can I fix this? Is there an easy way to merge the two columns?
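Below is a minimal sketch of the kind of setup described here, assuming Spark 2.x and the column names uuid, category, and news taken from the sample above; the udf shows one possible way to merge the two columns and is an illustration, not the original code:

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import MapType, StringType, DoubleType

spark = SparkSession.builder.appName("merge-columns-sketch").getOrCreate()

# Rebuild a tiny DataFrame shaped like the sample above: each news entry is an
# (article id, score) pair, which Spark infers as a struct with fields _1 and _2.
rows = [
    ("014368001839849", "201", [("2369958", 0.4352), ("2417417", 0.5187)]),
    ("014379000051658", "453", [("2488166", 0.4155)]),
]
df = spark.createDataFrame(rows, ["uuid", "category", "news"])

# One way to merge the id and score parts of news into a single {id: score} map.
# Note the instantiated types in the return schema (DoubleType(), not DoubleType).
to_dict = udf(lambda pairs: {p._1: p._2 for p in pairs},
              MapType(StringType(), DoubleType()))

df.withColumn("news_dict", to_dict(df["news"])).show(truncate=False)

With the return type built from instantiated StringType()/DoubleType() objects the map column serializes cleanly; the error message above suggests a DoubleType object itself, rather than a float value, is reaching json.dumps somewhere in the schema handling.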

0 Answers:

No answers yet.