Question

I am having trouble converting a JSON string like the one below to CSV using Pandas.

Here is my sample string (it could also be read from a file):

{
   "count": 8, 
   "facets": [], 
   "results": [
      {
         "protocol": "DWC_ARCHIVE", 
         "taxonKey": 4332928, 
         "family": "Diaptomidae", 
         "institutionCode": "MNHN", 
         "lastInterpreted": "2017-05-17T13:20:23.744+0000", 
         "speciesKey": 4332928, 
         "gbifID": "694182141", 
         "identifiedBy": "Dussart B.", 
         "lastParsed": "2017-05-17T13:19:47.003+0000", 
         "phylum": "Arthropoda", 
         "orderKey": 679, 
         "facts": [], 
         "species": "Diaptomus kenitraensis", 
         "issues": [], 
         "occurrenceID": "http://coldb.mnhn.fr/catalognumber/mnhn/iu/2010-6707", 
         "countryCode": null, 
         "basisOfRecord": "PRESERVED_SPECIMEN", 
         "relations": [], 
         "classKey": 203, 
         "catalogNumber": "2010-6707", 
         "scientificName": "Diaptomus kenitraensis Kiefer, 1926", 
         "taxonRank": "SPECIES", 
         "familyKey": 9038, 
         "kingdom": "Animalia", 
         "publishingOrgKey": "2cd829bb-b713-433d-99cf-64bef11e5b3e", 
         "collectionCode": "IU", 
         "kingdomKey": 1, 
         "genusKey": 2114554, 
         "key": 694182141, 
         "phylumKey": 54, 
         "genericName": "Diaptomus", 
         "class": "Maxillopoda", 
         "crawlId": 116, 
         "individualCount": 1, 
         "publishingCountry": "FR", 
         "identifier": "http://coldb.mnhn.fr/catalognumber/mnhn/iu/2010-6707", 
         "lastCrawled": "2017-08-03T14:05:37.635+0000", 
         "license": "http://creativecommons.org/licenses/by/4.0/legalcode", 
         "datasetKey": "da6a07ed-9eee-460d-9448-910f542c1a7b", 
         "specificEpithet": "kenitraensis", 
         "identifiers": [], 
         "modified": "2015-06-19T19:23:01.000+0000", 
         "extensions": {}, 
         "genus": "Diaptomus", 
         "order": "Calanoida"
      }, 
      {
         "protocol": "DWC_ARCHIVE", 
         "taxonKey": 4332928, 
         "family": "Diaptomidae", 
         "institutionCode": "MNHN", 
         "lastInterpreted": "2017-05-17T13:19:51.210+0000", 
         "speciesKey": 4332928, 
         "gbifID": "440012453", 
         "identifiedBy": "Dussart B.", 
         "lastParsed": "2017-05-17T13:19:31.422+0000", 
         "phylum": "Arthropoda", 
         "orderKey": 679, 
         "facts": [], 
         "species": "Diaptomus kenitraensis", 
         "issues": [], 
         "occurrenceID": "http://coldb.mnhn.fr/catalognumber/mnhn/iu/2007-1537", 
         "countryCode": null, 
         "basisOfRecord": "PRESERVED_SPECIMEN", 
         "relations": [], 
         "classKey": 203, 
         "catalogNumber": "2007-1537", 
         "scientificName": "Diaptomus kenitraensis Kiefer, 1926", 
         "taxonRank": "SPECIES", 
         "familyKey": 9038, 
         "kingdom": "Animalia", 
         "publishingOrgKey": "2cd829bb-b713-433d-99cf-64bef11e5b3e", 
         "collectionCode": "IU", 
         "kingdomKey": 1, 
         "genusKey": 2114554, 
         "key": 440012453, 
         "phylumKey": 54, 
         "genericName": "Diaptomus", 
         "class": "Maxillopoda", 
         "crawlId": 116, 
         "individualCount": 8, 
         "publishingCountry": "FR", 
         "identifier": "http://coldb.mnhn.fr/catalognumber/mnhn/iu/2007-1537", 
         "lastCrawled": "2017-08-03T14:05:30.146+0000", 
         "license": "http://creativecommons.org/licenses/by/4.0/legalcode", 
         "datasetKey": "da6a07ed-9eee-460d-9448-910f542c1a7b", 
         "specificEpithet": "kenitraensis", 
         "identifiers": [], 
         "modified": "2015-06-19T19:23:00.000+0000", 
         "extensions": {}, 
         "genus": "Diaptomus", 
         "order": "Calanoida"
      }
   ], 
   "endOfRecords": false, 
   "limit": 2, 
   "offset": 0
}

The part I am interested in is "results".

With Pandas, I tried:

df = pd.read_json(json_string)
df.to_csv("output.csv", index=False, sep='\t', encoding="utf-8")

But I get the following error:

File "C:\Python27\lib\site-packages\pandas\io\json.py", line 281, in read_json
    date_unit).parse()
  File "C:\Python27\lib\site-packages\pandas\io\json.py", line 349, in parse
    self._parse_no_numpy()
  File "C:\Python27\lib\site-packages\pandas\io\json.py", line 566, in _parse_no_numpy
    loads(json, precise_float=self.precise_float), dtype=None)
TypeError: Expected String or Unicode

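I also tried most of the more detailed suggestions from "How can I convert JSON to CSV?", attempting to convert the JSON above directly to CSV (bypassing Pandas), but without success.

Can anyone give me a hint? Thanks in advance for any help.

Best regards,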

Answer 1

You can use json_normalize:

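import json
import pandas as pd

# Load the file into a Python dict first
with open('file.json') as data_file:
    data = json.load(data_file)

# Each element of the "results" list becomes one row.
# On pandas older than 1.0, use: from pandas.io.json import json_normalize
df = pd.json_normalize(data, 'results')
df.to_csv("output.csv", index=False, sep='\t', encoding="utf-8")  # write to csv file

print(df)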
        basisOfRecord catalogNumber        class  classKey collectionCode  \
0  PRESERVED_SPECIMEN     2010-6707  Maxillopoda       203             IU   
1  PRESERVED_SPECIMEN     2007-1537  Maxillopoda       203             IU   

  countryCode  crawlId                            datasetKey extensions facts  \
0        None      116  da6a07ed-9eee-460d-9448-910f542c1a7b         {}    []   
1        None      116  da6a07ed-9eee-460d-9448-910f542c1a7b         {}    []   

     ...         protocol  publishingCountry  \
0    ...      DWC_ARCHIVE                 FR   
1    ...      DWC_ARCHIVE                 FR   

                       publishingOrgKey relations  \
0  2cd829bb-b713-433d-99cf-64bef11e5b3e        []   
1  2cd829bb-b713-433d-99cf-64bef11e5b3e        []   

                        scientificName                 species speciesKey  \
0  Diaptomus kenitraensis Kiefer, 1926  Diaptomus kenitraensis    4332928   
1  Diaptomus kenitraensis Kiefer, 1926  Diaptomus kenitraensis    4332928   

  specificEpithet taxonKey  taxonRank  
0    kenitraensis  4332928    SPECIES  
1    kenitraensis  4332928    SPECIES  

[2 rows x 45 columns]
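
If the JSON is held in a string rather than a file, the same approach should work once the string is parsed with json.loads. Below is a minimal sketch under a few assumptions: the json_string is a shortened stand-in for the response in the question (only four fields are kept), pandas 1.0 or newer is installed so that pd.json_normalize is available, and the output goes to an in-memory buffer instead of output.csv. The traceback in the question suggests that read_json received something other than text, possibly an already-parsed dict, but that is only a guess from the error message.

```python
import io
import json

import pandas as pd

# Shortened stand-in for the response in the question (assumed sample data).
json_string = """
{
  "count": 8,
  "results": [
    {"key": 694182141, "species": "Diaptomus kenitraensis",
     "individualCount": 1, "countryCode": null},
    {"key": 440012453, "species": "Diaptomus kenitraensis",
     "individualCount": 8, "countryCode": null}
  ],
  "endOfRecords": false
}
"""

# Parse the text into a dict; json_normalize expects parsed objects, not raw text.
data = json.loads(json_string)

# record_path picks the list stored under "results"; each element becomes a row.
df = pd.json_normalize(data, record_path="results")

# Write tab-separated text to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False, sep="\t")
tsv_text = buffer.getvalue()
print(tsv_text)
```

Replacing the buffer with a file name such as "output.csv" writes the same content to disk. The top-level keys ("count", "endOfRecords") are dropped because only the "results" records are selected.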

Convert JSON to CSV using Pandas
