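I have a JSON variable "details" that I get from an API.
I am following the steps below to convert it to a DataFrame:
1. Create an RDD from this JSON variable.
2. Apply sqlContext.read.json() to that RDD.
But in this case I get the wrong DataFrame, with a single corrupt-record column: anotherrdd:pyspark.sql.dataframe.DataFrame _corrupt_record:string
My JSON variable looks like this: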
{'content': {'caption': 'false',
             'definition': 'hd',
             'dimension': '2d',
             'duration': 'PT12M10S',
             'licensedContent': True,
             'projection': 'rectangular'},
 'descr': 'Samsung Galaxy S10+ unboxing and overview including camera sample '
          'the Galaxy S10+ comes with a 6.4" SAMOLED screen, it\'s powered by '
          'the Exynos 9820 SOC comes with 8GB RAM / 128 or 512 GB storage has '
          'reverse wireless charging triple rear camera and a dual front '
          'facing camera.\n'
          '\n'
          'Check out Mivi Bluetooth Speakers: /* https://mivi.shop/geekproducts */ \n'
          'Discount Coupon code: GEEKYRANJIT\n'
          '\n'
          'Samsung Galaxy S10 / S10+ are sold online in India via Amazon '
          '/* https://amzn.to/2E8UOEa */',
 'stats': {'commentCount': '1131',
           'dislikeCount': '360',
           'favoriteCount': '0',
           'likeCount': '8027',
           'viewCount': '388777'},
 'tags': ['galaxy s10+',
          'samsung galaxy s10+ unboxing',
          'galaxy s10 plus unboxing',
          'samsung galaxy s10+',
          'samsung galaxy S10 plus',
          'samsung S10+ india',
          'samsung S10+',
          'geekyranjit',
          'samsung S10+ camera',
          'samsung S10+ pictures'],
 'title': 'Samsung Galaxy S10 + Unboxing & Overview (Indian Unit)'}
My code:
rdd = sc.parallelize(details)
anotherrdd = sqlContext.read.json(rdd)
anotherrdd.show()
I think the format is not right and it should be proper JSON. I also get a DataFrame with the following values:
+---------------+
|_corrupt_record|
+---------------+
| descr|
| title|
| tags|
| stats|
| content|
+---------------+
I think the JSON I get from the API is not in the correct format.
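One possible explanation can be checked without Spark. The rows in the output above are exactly the top-level keys of the dict, and iterating over a Python dict yields its keys, so the RDD may contain only those key strings rather than one JSON document. The sketch below uses a shortened, hypothetical stand-in for "details" and a helper name (is_valid_json) that I made up for illustration. It compares the records produced by iterating the dict with a single record serialized by json.dumps().

```python
import json

# Hypothetical, shortened stand-in for the `details` dict returned by the API.
details = {
    "content": {"definition": "hd", "licensedContent": True},
    "stats": {"viewCount": "388777"},
    "tags": ["galaxy s10+", "geekyranjit"],
    "title": "Samsung Galaxy S10 + Unboxing & Overview (Indian Unit)",
}

# Iterating over a dict yields only its keys, so a collection built
# directly from `details` holds key strings, not a JSON document.
records_from_dict = list(details)

# Serializing the whole dict and wrapping it in a list gives exactly
# one record, and that record is a valid JSON string.
records_from_json = [json.dumps(details)]


def is_valid_json(text):
    """Return True if `text` parses as JSON (illustrative helper)."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False


print(records_from_dict)
print([is_valid_json(r) for r in records_from_dict])
print([is_valid_json(r) for r in records_from_json])
```

If this is the cause, passing sc.parallelize([json.dumps(details)]) to sqlContext.read.json() might give a single row with the expected columns, although I have not confirmed that against this exact data.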