Question

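I have converted an XML file to JSON using xml2json.

A small portion of it is shown below. I want to convert it to CSV, and I am using csvkit's in2csv.

Running it with the basic syntax gives an error, which is fair enough.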

C:\Users\Renshaw\Documents\Sayth\XML>in2csv test2.json > test2.csv
When converting a JSON document with a top-level dictionary element, a key must
be specified.

So I added a key. Now I get no error, but no output either.

C:\Users\Renshaw\Documents\Sayth\XML>in2csv test2.json -k "//Meeting/Races" > test2.csv
'//Meeting/Races'

C:\Users\Renshaw\Documents\Sayth\XML>in2csv test2.json -k "//Meeting/Races/RaceEntries/RaceEntry" > test2.csv
'//Meeting/Races/RaceEntries/RaceEntry'

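I have now tried all sorts of keys. None of them gives an error, but none produces any output either. Any ideas on how to get output into the CSV?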

{
  "Meeting": {
    "NumOfRaces": {
      "#tail": "\n  ",
      "#text": "9"
    },
    "WeightsPublishing": {
      "#tail": "\n  ",
      "#text": "2014-09-30T00:00:00+10:00"
    },
    "NominationsClose": {
      "#tail": "\n  ",
      "#text": "2014-09-29T12:00:00+10:00"
    },
    "CodeType": {
      "#tail": "\n  ",
      "#text": "GALLOPS"
    },
    "Track": {
      "Rainfall": {
        "#tail": "\n    ",
        "#text": "Nil last 24hrs, 4.2mm last 7 days"
      },
      "Irrigation": {
        "#tail": "\n    ",
        "#text": "Nil last 24hrs, 25mm last 7 days"
      },
      "RailPosition": {
        "#tail": "\n    ",
        "#text": "+9m Entire Circuit"
      },
      "#tail": "\n  ",
      "TrackSurface": {
        "#tail": "\n    ",
        "#text": "Turf"
      },
      "Comments": {
        "#tail": "\n    ",
        "#text": "Finalised 4\/10 - 7:45am  Late Scratching Race 3 No. 4"
      },
      "Weather": {
        "#tail": "\n    ",
        "#text": "Fine"
      },
      "Penetrometer": {
        "#tail": "\n    ",
        "#text": "4.83"
      },
      "RailPositionLastMeeting": {
        "#tail": "\n    ",
        "#text": "True Position Entire Circuit"
      },
      "TrackInfo": {
        "#tail": "\n  ",
        "#text": "Penetrometer: Inside 4.85, Outside 4.85"
      },
      "TrackRating": {
        "#tail": "\n    ",
        "#text": "Good"
      },
      "#text": "\n    ",
      "RacingDirection": {
        "#tail": "\n    ",
        "#text": "AntiClockwise"
      }
    },
    "MeetingStage": {
      "#tail": "\n  ",
      "#text": "Acceptances"
    },
    "Races": {
      "#tail": "\n",
      "#text": "\n    ",
      "Race": [
        {
          "Comments": {
            "#tail": "\n    "
          },
          "NominationsDivisor": {
            "#tail": "\n      ",
            "#text": "0"
          },
          "Starters": {
            "#tail": "\n      ",
            "#text": "11"
          },
          "TrackRecords": {
            "#tail": "\n      ",
            "TrackRecord": {
              "TrackRecordHorse": {
                "#tail": "\n        "
              },
              "#text": "\n          ",
              "#tail": "\n      ",
              "DistanceRace": {
                "#tail": "\n          ",
                "#text": "1000"
              },
              "Time": {
                "#tail": "\n          ",
                "#text": "00:00:55.420"
              },
              "RaceNumber": {
                "#tail": "\n          ",
                "#text": "7"
              },
              "RaceDate": {
                "#tail": "\n          ",
                "#text": "2013-02-16"
              }
            },
            "#text": "\n        "
          },
          "RaceDistance": {
            "#tail": "\n      ",
            "#text": "1000"
          },
          "NominationsRaceNumber": {
            "#tail": "\n      ",
            "#text": "1"
          },
          "ApprenticeCanClaim": {
            "#tail": "\n      ",
            "#text": "false"
          },
          "SizeField": {
            "#tail": "\n      ",
            "#text": "16"
          },
          "NameRaceForm": {
            "#tail": "\n      ",
            "#text": "MARIBYRNONG TRL"
          },
          "RaceType": {
            "#tail": "\n      ",
            "#text": "Flat"
          },
          "SizeEmergency": {
            "#tail": "\n      ",
            "#text": "4"
          },
          "DistanceApprox": {
            "#tail": "\n      ",
            "#text": "false"
          },
          "#text": "\n      ",
          "BallotedOutEntries": {
            "#tail": "\n      "
          },
          "Logos": {
            "#tail": "\n      ",
            "Logo": {
              "#tail": "\n      "
            },
            "#text": "\n        "
          },
          "#tail": "\n    ",
          "TrackCircumference": {
            "#tail": "\n      ",
            "#text": "2313"
          },
          "NameRaceNews": {
            "#tail": "\n      ",
            "#text": "Maribyrnong Trial Stakes"
          },
          "WeightChange": {
            "#tail": "\n      ",
            "#text": "0.00"
          },
          "Accepters": {
            "#tail": "\n      ",
            "#text": "12"
          },
          "RaceEntries": {
            "RaceEntry": [
              {
                "Trainer": {
                  "Location": {
                    "#tail": "\n            ",
                    "#text": "Cranbourne"
                  },
                  "#text": "\n            ",
                  "Surname": {
                    "#tail": "\n            ",
                    "#text": "Laing"

Answer 1

There are two problems with what you are doing.

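First, you are specifying the key incorrectly: you are using XML/XPath style, with slashes, when you are dealing with JSON. You should provide just the name of the element (e.g. Meeting).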
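The difference between an XPath-style string and a JSON key can be illustrated in plain Python. This is only a sketch: the sample data is a trimmed, hypothetical stand-in for the file in the question, and `lookup` is a helper name made up for this example, not something csvkit provides.

```python
import json

# Trimmed, hypothetical stand-in for the file in the question.
sample = """{
  "Meeting": {
    "CodeType": {"#text": "GALLOPS"},
    "Races": {"Race": [{"Starters": {"#text": "11"}}, {"Starters": {"#text": "9"}}]}
  }
}"""

def lookup(tree, keys):
    """Follow a sequence of plain key names (not an XPath string) into nested dicts."""
    for key in keys:
        tree = tree[key]
    return tree

data = json.loads(sample)

# An XPath-style string is not a key of the top-level dictionary...
xpath_is_key = "//Meeting/Races" in data
# ...whereas the plain element name is.
name_is_key = "Meeting" in data

# Reaching the nested list takes one plain key per level.
races = lookup(data, ["Meeting", "Races", "Race"])
print(xpath_is_key, name_is_key, len(races))
```

Note that even with the right key style, only the innermost "Race" value here is a list of records; everything above it is a dictionary.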

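The main problem, however, is the kind of JSON you are using. It consists of multiple nested dictionaries, which in2csv can't really handle: with several levels, how would it know which columns to use? You need to flatten your data in some way so that the fields can be clearly identified.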
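One possible way to flatten the data is sketched below, assuming the goal is one CSV row per race. The sample data is a hypothetical, trimmed version of the structure in the question, `flatten` is a name introduced for this example, and the dotted column naming is just one convention among many.

```python
import csv
import io
import json

# Hypothetical, trimmed version of the structure in the question.
sample = """{
  "Meeting": {
    "Races": {
      "Race": [
        {"RaceDistance": {"#text": "1000"}, "Starters": {"#text": "11"}},
        {"RaceDistance": {"#text": "1200"}, "Starters": {"#text": "9"}}
      ]
    }
  }
}"""

def flatten(tree, prefix=""):
    """Collapse nested dicts into one flat dict keyed by dotted paths."""
    flat = {}
    for key, value in tree.items():
        if key == "#tail":
            continue  # whitespace left over from the XML conversion
        if key == "#text":
            if value.strip():  # ignore whitespace-only text nodes
                flat[prefix or key] = value
            continue
        name = prefix + "." + key if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value  # lists and scalars are kept as-is in this sketch
    return flat

data = json.loads(sample)
rows = [flatten(race) for race in data["Meeting"]["Races"]["Race"]]

# Union of all column names, in first-seen order.
fieldnames = []
for row in rows:
    for name in row:
        if name not in fieldnames:
            fieldnames.append(name)

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Nested lists such as "RaceEntry" are not expanded here; they would need their own rows or their own file, depending on what the CSV is for.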

You can take a look at this question for ideas on how to convert JSON to CSV, since I don't think in2csv will cut it in your case.

Answer 2

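If what you want is to turn each XML path into a path expression, use it as column 1 of the CSV, and put the lowest-level value in column 2, the following code may solve your problem: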

import json

json_input = """{
  "Meeting": {
    "NominationsClose": {
      "#tail": "\\n  ",
      "#text": "2014-09-29T12:00:00+10:00"
    },
    "CodeType": {
      "#tail": "\\n  ",
      "#text": "GALLOPS"
    },
    "Track": {
      "Rainfall": {
        "#tail": "\\n    ",
        "#text": "Nil last 24hrs, 4.2mm last 7 days"
      },
      "Irrigation": {
        "#tail": "\\n    ",
        "#text": "Nil last 24hrs, 25mm last 7 days"
      }
    }
  }
}"""

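def print_csv_depth_first(tree, path=""):
    """Print one 'path,value' line per string leaf, walking the tree depth first."""
    if isinstance(tree, dict):
        for key, value in tree.items():
            print_csv_depth_first(value, "{}/{}".format(path, key))
    elif isinstance(tree, list):
        # List items have no names, so their index becomes the path segment.
        for index, item in enumerate(tree):
            print_csv_depth_first(item, "{}/{}".format(path, index))
    elif isinstance(tree, str):
        # repr() keeps control characters such as "\n" visible on a single line.
        print('{},{}'.format(path, repr(tree)))

# Use a name other than "json" so the module is not shadowed.
data = json.loads(json_input)
print_csv_depth_first(data)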

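I have included a small portion of the sample JSON data. At the lowest level, your data also contains the start of a list, "RaceEntry": [, but it is incomplete, so I had to extrapolate. The code above produces the following output: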

/Meeting/NominationsClose/#tail,'\n  '
/Meeting/NominationsClose/#text,'2014-09-29T12:00:00+10:00'
/Meeting/CodeType/#tail,'\n  '
/Meeting/CodeType/#text,'GALLOPS'
/Meeting/Track/Rainfall/#tail,'\n    '
/Meeting/Track/Rainfall/#text,'Nil last 24hrs, 4.2mm last 7 days'
/Meeting/Track/Irrigation/#tail,'\n    '
/Meeting/Track/Irrigation/#text,'Nil last 24hrs, 25mm last 7 days'

You will have to adjust the line containing the print statement to suit your needs.
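As one example of such an adjustment, the sketch below swaps the print call for Python's csv module so that values containing commas are quoted properly, and it skips the "#tail" entries. The generator name `iter_leaves`, the header row, and the decision to drop "#tail" are assumptions made for this example, not part of the code above.

```python
import csv
import io
import json

# Small sample in the same shape as the data above.
sample = """{
  "Meeting": {
    "CodeType": {"#tail": "\\n  ", "#text": "GALLOPS"},
    "Track": {"Rainfall": {"#tail": "\\n    ", "#text": "Nil last 24hrs, 4.2mm last 7 days"}}
  }
}"""

def iter_leaves(tree, path=""):
    """Yield (path, value) pairs for every string leaf, depth first."""
    if isinstance(tree, dict):
        for key, value in tree.items():
            yield from iter_leaves(value, "{}/{}".format(path, key))
    elif isinstance(tree, list):
        for index, item in enumerate(tree):
            yield from iter_leaves(item, "{}/{}".format(path, index))
    elif isinstance(tree, str):
        yield path, tree

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["path", "value"])
for path, value in iter_leaves(json.loads(sample)):
    if path.endswith("/#tail"):
        continue  # skip whitespace-only tails from the XML conversion
    writer.writerow([path, value])
csv_text = buffer.getvalue()
print(csv_text)
```

To write a file instead of an in-memory buffer, open it with open("test2.csv", "w", newline="") and pass that handle to csv.writer.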

json -> csv using in2csv - specifying a key returns no value
