Question

I have a JSON variable "details" that I get from an API.
I follow these steps to convert it to a DataFrame:
1. Create an RDD from this JSON variable.
2. Apply sqlContext.read.json() to that RDD.

But in this case I get the wrong DataFrame, for example anotherrdd: pyspark.sql.dataframe.DataFrame with _corrupt_record: string

My JSON variable looks like this:

{'content': {'caption': 'false',  
             'definition': 'hd',  
             'dimension': '2d',  
             'duration': 'PT12M10S',  
             'licensedContent': True,  
             'projection': 'rectangular'},  
 'descr': 'Samsung Galaxy S10+ unboxing and overview including camera sample '  
          'the Galaxy S10+ comes with a 6.4" SAMOLED screen, it\'s powered by '  
          'the Exynos 9820 SOC comes with 8GB RAM / 128 or 512 GB storage has '  
          'reverse wireless charging triple rear camera and a dual front '
          'facing camera.\n'  
          '\n'
          'Check out Mivi Bluetooth Speakers: /* https://mivi.shop/geekproducts */ \n'  
          'Discount Coupon code: GEEKYRANJIT\n'  
          '\n'
          'Samsung Galaxy S10 / S10+ are sold online in India via Amazon '  
          '/* https://amzn.to/2E8UOEa */',  
 'stats': {'commentCount': '1131',  
           'dislikeCount': '360',  
           'favoriteCount': '0',  
           'likeCount': '8027',  
           'viewCount': '388777'},   
 'tags': ['galaxy s10+',  
          'samsung galaxy s10+ unboxing',  
          'galaxy s10 plus unboxing',  
          'samsung galaxy s10+',  
          'samsung galaxy S10 plus',  
          'samsung S10+ india',  
          'samsung S10+',  
          'geekyranjit',  
          'samsung S10+ camera',  
          'samsung S10+ pictures'],  
 'title': 'Samsung Galaxy S10 + Unboxing & Overview (Indian Unit)'}

My code:

rdd = sc.parallelize(details)  
anotherrdd = sqlContext.read.json(rdd)  
anotherrdd.show()

I think the data is not in the format that JSON should have, and the DataFrame I get back contains values like this:

+---------------+  
|_corrupt_record|  
+---------------+  
|          descr|  
|          title|  
|           tags|  
|          stats|  
|        content|  
+---------------+

I think the JSON I get from the API is not formatted correctly.

I am unable to convert my JSON variable into a Spark DataFrame.
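A possible explanation, offered as a sketch and not a confirmed diagnosis: the printed value uses single quotes and True, which suggests "details" is already a Python dict and not a JSON string. Passing a dict to sc.parallelize() distributes only its keys, which would match the _corrupt_record rows shown above. The snippet below assumes that reading. The shortened dict and the helper name to_dataframe are hypothetical, and sc and sqlContext are assumed to be the objects already used in the question.

```python
import json

# Hypothetical, shortened stand-in for the `details` dict from the question.
details = {
    "content": {"definition": "hd", "licensedContent": True},
    "stats": {"viewCount": "388777"},
    "tags": ["galaxy s10+", "geekyranjit"],
    "title": "Samsung Galaxy S10 + Unboxing & Overview (Indian Unit)",
}

# Iterating over a dict yields only its keys, so a dict handed to
# sc.parallelize() would become one bare key string per record.
records_seen_by_spark = list(details)

# json.dumps produces one valid JSON document (double quotes, true/false).
json_line = json.dumps(details)


def to_dataframe(spark_context, sql_context, obj):
    """Serialize obj to JSON and read it as a single-record DataFrame.

    Assumption: spark_context and sql_context are the `sc` and
    `sqlContext` objects from the question.
    """
    # Wrap the string in a list so the RDD holds exactly one JSON record.
    rdd = spark_context.parallelize([json.dumps(obj)])
    return sql_context.read.json(rdd)
```

With a running Spark session this could be used as df = to_dataframe(sc, sqlContext, details) followed by df.show(). If the API actually returns raw JSON text, that string can go into the one-element list directly, without json.dumps.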

0 answers: