I have a JSON file to parse. The JSON format is like this:
{"cv_id":"001","cv_parse": { "educations": [{"major": "English", "degree": "Bachelor" },{"major": "English", "degree": "Master "}],"basic_info": { "birthyear": "1984", "location": {"state": "New York"}}}}
I have to get every field in the file. How do I get "major" from the array, and do I have to get "state" with df.select("cv_parse.basic_info.location.state")?
This is the result I want:
cv_id major degree birthyear state
001 English Bachelor 1984 New York
001 English Master 1984 New York
Answer 0 (score: 0)
This might not be the best way to do it, but you can give it a try.
// import the SQL functions and the implicit conversions (needed for the $"..." column syntax)
import org.apache.spark.sql.functions._
import sqlContext.implicits._
// read the JSON file (by default each line must hold one complete JSON object)
val jsonDf = sqlContext.read.json("sample-data/sample.json")
jsonDf.printSchema
Your schema will be:
root
 |-- cv_id: string (nullable = true)
 |-- cv_parse: struct (nullable = true)
 |    |-- basic_info: struct (nullable = true)
 |    |    |-- birthyear: string (nullable = true)
 |    |    |-- location: struct (nullable = true)
 |    |    |    |-- state: string (nullable = true)
 |    |-- educations: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- degree: string (nullable = true)
 |    |    |    |-- major: string (nullable = true)
Now you need to explode the educations column:
val explodedResult = jsonDf.select($"cv_id", explode($"cv_parse.educations"),
  $"cv_parse.basic_info.birthyear", $"cv_parse.basic_info.location.state")
explodedResult.printSchema
Now your schema will be:
root
 |-- cv_id: string (nullable = true)
 |-- col: struct (nullable = true)
 |    |-- degree: string (nullable = true)
 |    |-- major: string (nullable = true)
 |-- birthyear: string (nullable = true)
 |-- state: string (nullable = true)
Now you can select the columns.
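The final column selection is not spelled out in the answer, so here is a minimal sketch of how it might look. It assumes Spark 2.2 or later (it uses SparkSession and reads the sample record from an in-memory Dataset[String] rather than the sqlContext and file path used in the answer). The names FlattenCv and flatten are hypothetical, and the trim on degree is an addition of mine, included only because the sample data contains "Master " with a trailing space while the desired output shows "Master".

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, trim}

// Hypothetical helper object; not part of the original answer.
object FlattenCv {

  // Turns one row per CV into one row per education entry.
  def flatten(jsonDf: DataFrame): DataFrame = {
    // explode() yields one row per array element; without an alias
    // the generated struct column is named "col", as in the schema shown earlier.
    val exploded = jsonDf.select(
      col("cv_id"),
      explode(col("cv_parse.educations")),
      col("cv_parse.basic_info.birthyear"),
      col("cv_parse.basic_info.location.state"))

    // Pull the struct fields up to top-level columns.
    exploded.select(
      col("cv_id"),
      col("col.major"),
      trim(col("col.degree")).as("degree"), // assumption: trailing spaces are unwanted
      col("birthyear"),
      col("state"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("flatten-cv")
      .master("local[*]") // assumption: local run, for illustration only
      .getOrCreate()
    import spark.implicits._

    // The sample record from the question, as a single-line JSON string.
    val sample = Seq(
      """{"cv_id":"001","cv_parse":{"educations":[{"major":"English","degree":"Bachelor"},{"major":"English","degree":"Master "}],"basic_info":{"birthyear":"1984","location":{"state":"New York"}}}}"""
    ).toDS()

    val jsonDf = spark.read.json(sample)
    flatten(jsonDf).show(false)

    spark.stop()
  }
}
```

With the sqlContext-based code from the answer, the same selection could presumably be written directly as explodedResult.select($"cv_id", $"col.major", $"col.degree", $"birthyear", $"state").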