Converting CSV to JSON while dropping certain columns

Date: 2019-05-06 11:07:45

Tags: python json csv merge

I downloaded a dataset on income inequality from the OECD as a CSV file. I only want to keep these fields: LOCATION, TIME, VALUE.

Here is the top of the CSV file, including the header row:

"LOCATION","INDICATOR","SUBJECT","MEASURE","FREQUENCY","TIME","Value","Flag Codes"
"AUS","INCOMEINEQ","GINI","INEQ","A","2014",0.337,
"AUS","INCOMEINEQ","GINI","INEQ","A","2016",0.33,
"AUT","INCOMEINEQ","GINI","INEQ","A","2014",0.274,
"AUT","INCOMEINEQ","GINI","INEQ","A","2015",0.276,
"AUT","INCOMEINEQ","GINI","INEQ","A","2016",0.284,

This is my converter code so far:

#!/usr/bin/env python

"""Universal CSV to JSON converter with scalability options"""

__author__      = "Tim Verlaan 11669128"

import csv  
import json  

def convert():
    """Convert CSV file to JSON file"""

    # Open the CSV  
    f = open( 'data.csv')  

    # Change each fieldname to the appropriate field name.    
    reader = csv.DictReader( f, fieldnames = ( "LOCATION","INDICATOR","SUBJECT","MEASURE","FREQUENCY","TIME","Value","Flag Codes" ))  

    # skip the header 
    next(reader)

    # Parse the CSV into JSON  
    out = json.dumps( [ row for row in reader ] )  

    # Save the JSON  
    f = open( 'data_oecd.json', 'w')  
    f.write(out)  


if __name__ == "__main__":
    """Separating the function, for scalability purposes"""

    convert()

Current result:

[{"LOCATION": "AUS", "INDICATOR": "INCOMEINEQ", "SUBJECT": "GINI", "MEASURE": "INEQ", "FREQUENCY": "A", "TIME": "2014", "Value": "0.337", "Flag Codes": ""}, {"LOCATION": "AUS", "INDICATOR": "INCOMEINEQ", "SUBJECT": "GINI", "MEASURE": "INEQ", "FREQUENCY": "A", "TIME": "2016", "Value": "0.33", "Flag Codes": ""}, {"LOCATION": "AUT", "INDICATOR": "INCOMEINEQ", "SUBJECT": "GINI", "MEASURE": "INEQ", "FREQUENCY": "A", "TIME": "2014", "Value": "0.274", "Flag Codes": ""}, {"LOCATION": "AUT", "INDICATOR": "INCOMEINEQ", "SUBJECT": "GINI", "MEASURE": "INEQ", "FREQUENCY": "A", "TIME": "2015", "Value": "0.276", "Flag Codes": ""}, {"LOCATION": "AUT", "INDICATOR": "INCOMEINEQ", "SUBJECT": "GINI", "MEASURE": "INEQ", "FREQUENCY": "A", "TIME": "2016", "Value": "0.284", "Flag Codes": ""}

Desired result:

[{"LOCATION": "AUS", "TIME": 2014, "VALUE": 0.337}, {"LOCATION": "AUS", "TIME": 2016, "VALUE": 0.33}

3 answers:

Answer 0 (score: 0)

This is easy to do with pandas:

import pandas as pd
df = pd.read_csv('data.csv')
df[['LOCATION', 'TIME', 'Value']].to_json(orient='records')

The orient='records' part is important; otherwise the output is grouped by column instead of by row.
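To make the effect of orient='records' concrete, here is a minimal sketch that compares it with the default orientation. It assumes the first two sample rows from the question, loaded from an in-memory string instead of data.csv; the names CSV_TEXT, by_column and by_row are only illustrative.

```python
import io
import json

import pandas as pd

# Sample rows copied from the question's CSV excerpt (assumption: the real
# file has the same header and layout).
CSV_TEXT = '''"LOCATION","INDICATOR","SUBJECT","MEASURE","FREQUENCY","TIME","Value","Flag Codes"
"AUS","INCOMEINEQ","GINI","INEQ","A","2014",0.337,
"AUS","INCOMEINEQ","GINI","INEQ","A","2016",0.33,
'''

df = pd.read_csv(io.StringIO(CSV_TEXT))
subset = df[["LOCATION", "TIME", "Value"]]

# Default orientation for a DataFrame: one object per column, keyed by row index.
by_column = json.loads(subset.to_json())

# orient="records": a list with one object per CSV row.
by_row = json.loads(subset.to_json(orient="records"))

print(by_column)
print(by_row)
```

With the default, each column name maps to an index-keyed object; with orient="records", each element of the list is one row, which is the shape the question asks for.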

Answer 1 (score: 0)

You can use pandas and select only the required columns.
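One way this could look is sketched below. It assumes the usecols parameter of pd.read_csv is used to load only the wanted columns, and it renames Value to VALUE to match the desired output; the helper name csv_to_records and the in-memory CSV_TEXT sample are hypothetical and not part of the answer.

```python
import io
import json

import pandas as pd

# Sample rows copied from the question's CSV excerpt.
CSV_TEXT = '''"LOCATION","INDICATOR","SUBJECT","MEASURE","FREQUENCY","TIME","Value","Flag Codes"
"AUS","INCOMEINEQ","GINI","INEQ","A","2014",0.337,
"AUT","INCOMEINEQ","GINI","INEQ","A","2015",0.276,
'''


def csv_to_records(source):
    """Read only the wanted columns and return them as a JSON string of row objects."""
    # usecols skips every other column at parse time.
    df = pd.read_csv(source, usecols=["LOCATION", "TIME", "Value"])
    # Rename so the key matches the "VALUE" spelling in the desired result.
    df = df.rename(columns={"Value": "VALUE"})
    return df.to_json(orient="records")


records = json.loads(csv_to_records(io.StringIO(CSV_TEXT)))
print(records)
```

For the real file, csv_to_records('data.csv') should work the same way, and df.to_json('data_oecd.json', orient='records') would write straight to disk instead of returning a string.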


Answer 2 (score: 0)

You can extract just the keys you need in a list comprehension.

For example:

import csv
import json

with open('data.csv') as infile:
    reader = csv.DictReader(infile)
    # Keep only the three wanted fields from each row.
    out = [{"LOCATION": row["LOCATION"], "TIME": row["TIME"], "VALUE": row["Value"]} for row in reader]

with open('data_oecd.json', 'w') as outfile:
    json.dump(out, outfile)  # Write to JSON.
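Note that csv.DictReader yields every field as a string, while the desired result in the question shows TIME and VALUE as numbers. A minimal sketch of the same list-comprehension approach with explicit conversion follows; it assumes every row has a parseable TIME and Value, and the helper name rows_to_records and the in-memory CSV_TEXT sample are hypothetical.

```python
import csv
import io
import json

# Sample rows copied from the question's CSV excerpt.
CSV_TEXT = '''"LOCATION","INDICATOR","SUBJECT","MEASURE","FREQUENCY","TIME","Value","Flag Codes"
"AUS","INCOMEINEQ","GINI","INEQ","A","2014",0.337,
"AUS","INCOMEINEQ","GINI","INEQ","A","2016",0.33,
'''


def rows_to_records(lines):
    """Keep LOCATION, TIME and Value from each row, converting the numeric fields."""
    reader = csv.DictReader(lines)  # the header row supplies the field names
    return [
        {
            "LOCATION": row["LOCATION"],
            "TIME": int(row["TIME"]),       # assumption: TIME is always a whole year
            "VALUE": float(row["Value"]),   # assumption: Value is never empty
        }
        for row in reader
    ]


records = rows_to_records(io.StringIO(CSV_TEXT))
print(json.dumps(records))
```

Passing an open file object instead of io.StringIO works the same way, since csv.DictReader accepts any iterable of lines.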