I have a JSON dataset loaded with PySpark.
I am trying to access the "x" field of every struct inside it.
Here is the output of df.select("level_instance_json.player").printSchema():
root
|-- player: struct (nullable = true)
| |-- 0: struct (nullable = true)
| | |-- head_pitch: long (nullable = true)
| | |-- head_roll: long (nullable = true)
| | |-- head_yaw: long (nullable = true)
| | |-- r: long (nullable = true)
| | |-- x: long (nullable = true)
| | |-- y: long (nullable = true)
| |-- 1: struct (nullable = true)
| | |-- head_pitch: long (nullable = true)
| | |-- head_roll: long (nullable = true)
| | |-- head_yaw: long (nullable = true)
| | |-- r: long (nullable = true)
| | |-- x: long (nullable = true)
| | |-- y: long (nullable = true)
...
I tried using the "*" selector to select all of them, but it does not work:
df.select("level_instance_json.player.*.x").show(10)
It fails with this error:
'No such struct field * in 0, 1, 10, 100, 1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 101, 1010, 1011, 1012, 1013, 1014, 1015, 1016, 1017, 1018, 1019, 102,...
Answer 0 (score: 0)
You can do it like this:
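# Collect the names of the sub-structs under `player` ("0", "1", ...)
list_player_numbers = [el.name for el in df.select("level_instance_json.player").schema['player'].dataType]
# Build the full dotted path to the `x` field of each sub-struct
list_fields = ['.'.join(['level_instance_json', 'player', player_number, 'x']) for player_number in list_player_numbers]
# Select all of those nested fields in one call
output = df.select(list_fields)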
That should work.
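One thing to watch for with the snippet above: every selected column ends up named `x`, so the result has many columns with the same name. A minimal sketch of one way around that is to build SQL expression strings with an alias per player and pass them to `selectExpr`. The helper name `build_x_exprs` and the `x_<n>` alias pattern are made up for illustration, and the numeric sort assumes every sub-struct name is a string of digits, as in the schema shown in the question.

```python
def build_x_exprs(player_ids, prefix="level_instance_json.player", field="x"):
    """Return one SQL expression string per player sub-struct.

    Assumes every id is a string of digits (hypothetical helper, not a Spark API).
    """
    # The schema lists names lexicographically (0, 1, 10, 100, ...);
    # sort by integer value so the output columns come out in numeric order.
    ordered = sorted(player_ids, key=int)
    # Backticks quote the numeric field name inside a SQL expression;
    # the alias gives each output column a distinct name such as x_0.
    return [f"{prefix}.`{p}`.{field} AS {field}_{p}" for p in ordered]


exprs = build_x_exprs(["0", "1", "10", "2"])
print(exprs)

# With the DataFrame `df` from the question (not created here), usage would be:
# player_ids = [el.name for el in
#               df.select("level_instance_json.player").schema["player"].dataType]
# output = df.selectExpr(*build_x_exprs(player_ids))
```

With thousands of player sub-structs this still produces a very wide result, so it is a sketch of the naming fix rather than a recommendation on layout.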
Xavier