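I have converted an XML file to JSON using xml2json.
A small portion of it is shown below. I want to convert it to CSV, and I am using csvkit's in2csv.
Using the basic syntax simply produces an error: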
C:\Users\Renshaw\Documents\Sayth\XML>in2csv test2.json > test2.csv
When converting a JSON document with a top-level dictionary element, a key must
be specified.
So I added a key. Now I get no error, but no output either:
C:\Users\Renshaw\Documents\Sayth\XML>in2csv test2.json -k "//Meeting/Races" > test2.csv
'//Meeting/Races'
C:\Users\Renshaw\Documents\Sayth\XML>in2csv test2.json -k "//Meeting/Races/RaceEntries/RaceEntry" > test2.csv
'//Meeting/Races/RaceEntries/RaceEntry'
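I have now tried all sorts of keys. None of them gives an error, but none gives any output either. Any ideas on how to get the output into a CSV?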
{
"Meeting": {
"NumOfRaces": {
"#tail": "\n ",
"#text": "9"
},
"WeightsPublishing": {
"#tail": "\n ",
"#text": "2014-09-30T00:00:00+10:00"
},
"NominationsClose": {
"#tail": "\n ",
"#text": "2014-09-29T12:00:00+10:00"
},
"CodeType": {
"#tail": "\n ",
"#text": "GALLOPS"
},
"Track": {
"Rainfall": {
"#tail": "\n ",
"#text": "Nil last 24hrs, 4.2mm last 7 days"
},
"Irrigation": {
"#tail": "\n ",
"#text": "Nil last 24hrs, 25mm last 7 days"
},
"RailPosition": {
"#tail": "\n ",
"#text": "+9m Entire Circuit"
},
"#tail": "\n ",
"TrackSurface": {
"#tail": "\n ",
"#text": "Turf"
},
"Comments": {
"#tail": "\n ",
"#text": "Finalised 4\/10 - 7:45am Late Scratching Race 3 No. 4"
},
"Weather": {
"#tail": "\n ",
"#text": "Fine"
},
"Penetrometer": {
"#tail": "\n ",
"#text": "4.83"
},
"RailPositionLastMeeting": {
"#tail": "\n ",
"#text": "True Position Entire Circuit"
},
"TrackInfo": {
"#tail": "\n ",
"#text": "Penetrometer: Inside 4.85, Outside 4.85"
},
"TrackRating": {
"#tail": "\n ",
"#text": "Good"
},
"#text": "\n ",
"RacingDirection": {
"#tail": "\n ",
"#text": "AntiClockwise"
}
},
"MeetingStage": {
"#tail": "\n ",
"#text": "Acceptances"
},
"Races": {
"#tail": "\n",
"#text": "\n ",
"Race": [
{
"Comments": {
"#tail": "\n "
},
"NominationsDivisor": {
"#tail": "\n ",
"#text": "0"
},
"Starters": {
"#tail": "\n ",
"#text": "11"
},
"TrackRecords": {
"#tail": "\n ",
"TrackRecord": {
"TrackRecordHorse": {
"#tail": "\n "
},
"#text": "\n ",
"#tail": "\n ",
"DistanceRace": {
"#tail": "\n ",
"#text": "1000"
},
"Time": {
"#tail": "\n ",
"#text": "00:00:55.420"
},
"RaceNumber": {
"#tail": "\n ",
"#text": "7"
},
"RaceDate": {
"#tail": "\n ",
"#text": "2013-02-16"
}
},
"#text": "\n "
},
"RaceDistance": {
"#tail": "\n ",
"#text": "1000"
},
"NominationsRaceNumber": {
"#tail": "\n ",
"#text": "1"
},
"ApprenticeCanClaim": {
"#tail": "\n ",
"#text": "false"
},
"SizeField": {
"#tail": "\n ",
"#text": "16"
},
"NameRaceForm": {
"#tail": "\n ",
"#text": "MARIBYRNONG TRL"
},
"RaceType": {
"#tail": "\n ",
"#text": "Flat"
},
"SizeEmergency": {
"#tail": "\n ",
"#text": "4"
},
"DistanceApprox": {
"#tail": "\n ",
"#text": "false"
},
"#text": "\n ",
"BallotedOutEntries": {
"#tail": "\n "
},
"Logos": {
"#tail": "\n ",
"Logo": {
"#tail": "\n "
},
"#text": "\n "
},
"#tail": "\n ",
"TrackCircumference": {
"#tail": "\n ",
"#text": "2313"
},
"NameRaceNews": {
"#tail": "\n ",
"#text": "Maribyrnong Trial Stakes"
},
"WeightChange": {
"#tail": "\n ",
"#text": "0.00"
},
"Accepters": {
"#tail": "\n ",
"#text": "12"
},
"RaceEntries": {
"RaceEntry": [
{
"Trainer": {
"Location": {
"#tail": "\n ",
"#text": "Cranbourne"
},
"#text": "\n ",
"Surname": {
"#tail": "\n ",
"#text": "Laing"
Answer 0 (score: 3)
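There are two problems with what you are doing.
First, you are specifying the key incorrectly: you are using XML/XPath-style paths with slashes, but you are working with JSON. You should supply only the name of the element (e.g. Meeting).
The main problem, however, is the kind of JSON you are using. It consists of multiple nested dictionaries, which in2csv can't really handle (with several levels, how would it know which columns to use?). You need to flatten your data in some way so that the fields can be clearly identified.
You could look at this question for ideas on how to convert JSON to CSV, as I don't think in2csv will cut it in your case.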
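One way to do the flattening described above is sketched here in Python. This is only an illustration under some assumptions: the helper names `flatten` and `races_to_csv` are made up for this example, leaf values are assumed to sit under `"#text"` with `"#tail"` holding only whitespace left over from the XML conversion (as in the question's sample), and one CSV row is produced per entry under `Meeting` / `Races` / `Race`. Lists nested inside a race (such as `RaceEntry`) are skipped, because they would need a table of their own.

```python
import csv
import io


def flatten(node, prefix=""):
    """Collapse nested dicts into one flat dict keyed by dotted path."""
    flat = {}
    for key, value in node.items():
        if key == "#tail":
            continue  # assumed to be whitespace residue from the XML
        if key == "#text":
            text = value.strip()
            if text and prefix:
                flat[prefix] = text
        elif isinstance(value, dict):
            name = f"{prefix}.{key}" if prefix else key
            flat.update(flatten(value, name))
        # nested lists are deliberately skipped in this sketch
    return flat


def races_to_csv(document):
    """Return CSV text with one row per race (hypothetical helper)."""
    races = document["Meeting"]["Races"]["Race"]
    if isinstance(races, dict):  # guard: a lone race might not be a list
        races = [races]
    rows = [flatten(race) for race in races]
    # Use the union of all keys so races with missing fields still fit.
    fieldnames = sorted({name for row in rows for name in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

With the parsed JSON in a variable `data`, `races_to_csv(data)` would return the CSV text, which can then be written to a file. Columns get names such as `TrackRecords.TrackRecord.Time`, so each field can be identified unambiguously.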
Answer 1 (score: 1)
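If what you are looking for is to turn each XML path into a path expression, use that as column 1 of the CSV, and put the lowest-level value in column 2, the following code may solve your problem: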
import json

json_input = """{
  "Meeting": {
    "NominationsClose": {
      "#tail": "\\n ",
      "#text": "2014-09-29T12:00:00+10:00"
    },
    "CodeType": {
      "#tail": "\\n ",
      "#text": "GALLOPS"
    },
    "Track": {
      "Rainfall": {
        "#tail": "\\n ",
        "#text": "Nil last 24hrs, 4.2mm last 7 days"
      },
      "Irrigation": {
        "#tail": "\\n ",
        "#text": "Nil last 24hrs, 25mm last 7 days"
      }
    }
  }
}"""


def print_csv_depth_first(tree, path=""):
    # Walk the tree depth first, extending the path at every level.
    if isinstance(tree, dict):
        for key in tree:
            print_csv_depth_first(tree[key], "{}/{}".format(path, key))
    elif isinstance(tree, list):
        for i, item in enumerate(tree):
            print_csv_depth_first(item, "{}/{}".format(path, i))
    elif isinstance(tree, str):
        # Leaf value: print one "path,value" line.
        print('{},{}'.format(path, repr(tree)))


# Use a name other than "json" so the module is not shadowed.
data = json.loads(json_input)
print_csv_depth_first(data)
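I have included a small portion of your sample JSON data.
At the lowest level, your data also contains the beginning of a list, "RaceEntry": [, but it is incomplete, so I had to extrapolate.
The above code produces the following output: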
/Meeting/NominationsClose/#tail,'\n '
/Meeting/NominationsClose/#text,'2014-09-29T12:00:00+10:00'
/Meeting/CodeType/#tail,'\n '
/Meeting/CodeType/#text,'GALLOPS'
/Meeting/Track/Rainfall/#tail,'\n '
/Meeting/Track/Rainfall/#text,'Nil last 24hrs, 4.2mm last 7 days'
/Meeting/Track/Irrigation/#tail,'\n '
/Meeting/Track/Irrigation/#text,'Nil last 24hrs, 25mm last 7 days'
You will have to adjust the line containing the print call to suit your needs.
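One possible adjustment, sketched under a few assumptions: the printed lines above are not strictly valid CSV, because values such as 'Nil last 24hrs, 4.2mm last 7 days' contain commas and `repr` wraps them in single quotes. Python's standard `csv` module can do the quoting instead. The names `iter_leaves` and `leaves_to_csv`, and the choice to drop the `#tail` entries and strip whitespace, are illustrative only.

```python
import csv
import io


def iter_leaves(tree, path=""):
    """Yield (path, value) pairs for every string leaf, depth first."""
    if isinstance(tree, dict):
        for key, value in tree.items():
            yield from iter_leaves(value, f"{path}/{key}")
    elif isinstance(tree, list):
        for index, item in enumerate(tree):
            yield from iter_leaves(item, f"{path}/{index}")
    elif isinstance(tree, str):
        yield path, tree


def leaves_to_csv(tree, skip_tail=True):
    """Return CSV text with a 'path' column and a 'value' column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # quotes values containing commas
    writer.writerow(["path", "value"])
    for path, value in iter_leaves(tree):
        if skip_tail and path.endswith("/#tail"):
            continue  # assumption: "#tail" only holds whitespace
        writer.writerow([path, value.strip()])
    return buffer.getvalue()
```

Calling `leaves_to_csv(json.loads(json_input))` on the sample above would give one properly quoted row per `#text` value. When writing the result to a file, open it with `newline=""` so the line endings produced by the `csv` module are kept as they are.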