Group keys when their values are equal in Spark

Time: 2016-09-14 13:23:13

Tags: apache-spark pyspark

I am trying to group people by their preferences in Spark (Python). After several transformations of the raw data, my final RDD is:

[h1,h3]->[1,4]

Now my transformed RDD should look like:

   [((u'90249', u'79727', u'49495'), [u'Collecting Sports Cards (Baseball', u' Basketball', u' Football', u' Hockey)']), 
    ((u'79727', u'12512', u'71917'), [u'Collecting Sports Cards (Baseball', u' Basketball', u' Football', u' Hockey)']), 
    ((u'12512', u'27195', u'49495'), [u'Collecting Sports Cards (Baseball', u' Basketball', u' Football', u' Hockey)']), 
    ((u'90249', u'76176', u'49495'), [u'Collecting Sports Cards (Baseball', u' Basketball', u' Football', u' Hockey)']), 
    ((u'79727', u'27195', u'76176'), [u'Collecting Sports Cards (Baseball', u' Basketball', u' Football', u' Hockey)'])]

That is, persons h1 and h3 have the same interests.

How can I achieve this?

My intermediate data is as follows:

export function createHTTP(url:string, headers?:Headers){
  let injector = ReflectiveInjector.resolveAndCreate([
    myHttp,
    {provide:'defaultUrl', useValue:url},
    {provide:'defaultHeaders', useValue:headers || new Headers()},
    ...HTTP_Providers
  ])
  return injector.get(myHttp)
}

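The above is the RDD produced by an intermediate transformation. I was surprised by the result, and I am now having trouble combining it into the final RDD.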

  

TypeError: unhashable type: 'list'
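
A likely cause of the error above is that a Python list is being used as a grouping key: lists are mutable and therefore not hashable, while tuples are. The following is a minimal plain-Python sketch of grouping keys by equal values, not a tested Spark job. The sample data, the helper name `group_keys_by_value`, and the choice to sort each value list (so that `[1, 4]` and `[4, 1]` count as equal) are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: person id -> list of interest ids.
interests = [("h1", [1, 4]), ("h2", [2, 3]), ("h3", [4, 1])]


def group_keys_by_value(pairs):
    """Group keys whose value lists contain the same items."""
    groups = defaultdict(list)
    for key, values in pairs:
        # A list cannot be a dict key (TypeError: unhashable type: 'list'),
        # so convert it to a sorted tuple: hashable and order-independent.
        groups[tuple(sorted(values))].append(key)
    # Flip back to the (keys) -> values shape asked for in the question.
    return {tuple(keys): list(vals) for vals, keys in groups.items()}


result = group_keys_by_value(interests)

# Assumed PySpark counterpart of the same idea (untested sketch):
#   rdd.map(lambda kv: (tuple(sorted(kv[1])), kv[0])).groupByKey().mapValues(list)
```

Here `result` maps `('h1', 'h3')` to `[1, 4]` and `('h2',)` to `[2, 3]`. If the order of items within a value list matters, drop the `sorted` call and use `tuple(values)` instead.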

0 answers:

No answers