Unable to convert my JSON variable to a Spark DataFrame

Time: 2019-04-08 07:13:41

Tags: json apache-spark pyspark youtube-api

I have a JSON variable, "details", that I get from an API.
I follow these steps to convert it to a DataFrame:
1. Create an RDD from this JSON variable.
2. Apply sqlContext.read.json() to that RDD.

However, the resulting DataFrame is wrong. It is reported as anotherrdd:pyspark.sql.dataframe.DataFrame with a single column, _corrupt_record:string

My JSON variable looks like this:

{'content': {'caption': 'false',  
             'definition': 'hd',  
             'dimension': '2d',  
             'duration': 'PT12M10S',  
             'licensedContent': True,  
             'projection': 'rectangular'},  
 'descr': 'Samsung Galaxy S10+ unboxing and overview including camera sample '  
          'the Galaxy S10+ comes with a 6.4" SAMOLED screen, it\'s powered by '  
          'the Exynos 9820 SOC comes with 8GB RAM / 128 or 512 GB storage has '  
          'reverse wireless charging triple rear camera and a dual front '
          'facing camera.\n'  
          '\n'
          'Check out Mivi Bluetooth Speakers: /* https://mivi.shop/geekproducts */ \n'  
          'Discount Coupon code: GEEKYRANJIT\n'  
          '\n'
          'Samsung Galaxy S10 / S10+ are sold online in India via Amazon '  
          '/* https://amzn.to/2E8UOEa */',  
 'stats': {'commentCount': '1131',  
           'dislikeCount': '360',  
           'favoriteCount': '0',  
           'likeCount': '8027',  
           'viewCount': '388777'},   
 'tags': ['galaxy s10+',  
          'samsung galaxy s10+ unboxing',  
          'galaxy s10 plus unboxing',  
          'samsung galaxy s10+',  
          'samsung galaxy S10 plus',  
          'samsung S10+ india',  
          'samsung S10+',  
          'geekyranjit',  
          'samsung S10+ camera',  
          'samsung S10+ pictures'],  
 'title': 'Samsung Galaxy S10 + Unboxing & Overview (Indian Unit)'}    

My code:

rdd = sc.parallelize(details)  
anotherrdd = sqlContext.read.json(rdd)  
anotherrdd.show()

I suspect the data is not in the format that read.json() expects, and the DataFrame I get contains the following values:

+---------------+  
|_corrupt_record|  
+---------------+  
|          descr|  
|          title|  
|           tags|  
|          stats|  
|        content|  
+---------------+
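
The rows in this output match the top-level keys of "details". A likely explanation is that `sc.parallelize()` distributes the elements of the collection it receives, and iterating over a Python dict yields only its keys. The sketch below reproduces that behavior in plain Python, without Spark; the trimmed `details` dict is a hypothetical stand-in for the full value shown above.

```python
# Hypothetical, shortened stand-in for the `details` dict in the question.
details = {
    "content": {"definition": "hd", "licensedContent": True},
    "stats": {"viewCount": "388777"},
    "title": "Samsung Galaxy S10 + Unboxing & Overview (Indian Unit)",
}

# Iterating over a dict yields its keys, not its values. This is what a
# call such as sc.parallelize(details) would end up distributing.
records = list(details)

for record in records:
    # Each record is a bare string such as "content". That is not a JSON
    # object, so a JSON reader would flag it as a corrupt record.
    print(repr(record))
```

Under this reading, each key becomes one row, and none of them parses as a JSON object, which would explain why everything lands in `_corrupt_record`.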

I think the JSON I get from the API is not in the correct format.
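
Judging by the single quotes and the capitalized `True`, the value printed above appears to be a Python dict rather than JSON text, so the API response itself may be fine. A minimal sketch of one possible approach, assuming `details` is a dict: serialize it with `json.dumps()` and hand Spark a one-element list of JSON strings. The helper name `to_dataframe` and the trimmed `details` value are hypothetical; `sc` and `sql_context` stand for the existing SparkContext and SQLContext from the question.

```python
import json

# Hypothetical, shortened stand-in for the `details` dict in the question.
details = {
    "content": {"definition": "hd", "licensedContent": True},
    "stats": {"viewCount": "388777"},
    "tags": ["galaxy s10+", "geekyranjit"],
    "title": "Samsung Galaxy S10 + Unboxing & Overview (Indian Unit)",
}

# str() gives the Python repr (single quotes, True), which is not valid JSON.
python_repr = str(details)

# json.dumps() gives real JSON text (double quotes, true).
json_text = json.dumps(details)


def to_dataframe(sc, sql_context, record):
    """Hypothetical helper: turn one dict into a one-row DataFrame.

    Assumes `sc` is a SparkContext and `sql_context` is a SQLContext.
    """
    # Wrap the JSON string in a list so the RDD holds exactly one element,
    # the whole document, instead of the dict's keys.
    rdd = sc.parallelize([json.dumps(record)])
    return sql_context.read.json(rdd)
```

With a running Spark session, usage would look like `to_dataframe(sc, sqlContext, details).show()`. Spark then infers the schema from the JSON string, so nested fields such as `content` and `stats` should come out as struct columns.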

0 answers:

No answers yet