How to merge hundreds of columns in a CSV file using Python

Time: 2017-08-24 20:29:53

Tags: python json csv

Suppose I have a CSV file with hundreds of comma-separated columns:

ID,Column1,Column2,...,Column700
1,data,,...,
2,,data,...,
...
700,,,...,data

How can I merge the columns so that all of the "data" ends up in a single column? The CSV file was created from a JSON file: https://nvd.nist.gov/vuln/data-feeds#JSON_FEED

Here is the code I am using (taken from another SO post) to convert the JSON to CSV:

import csv
import json
import sys

def to_string(s):
    try:
        return str(s)
    except UnicodeError:
        # Change the encoding type if needed
        return s.encode('utf-8')

def reduce_item(key, value):
    # Flattens nested lists/dicts into the module-level reduced_item dict
    global reduced_item

    # Reduction Condition 1: lists become key_0, key_1, ...
    if type(value) is list:
        for i, sub_item in enumerate(value):
            reduce_item(key + '_' + to_string(i), sub_item)

    # Reduction Condition 2: dicts become key_subkey
    elif type(value) is dict:
        for sub_key in value.keys():
            reduce_item(key + '_' + to_string(sub_key), value[sub_key])

    # Base Condition: store the scalar under its flattened key
    else:
        reduced_item[to_string(key)] = to_string(value)

if __name__ == "__main__":
    if len(sys.argv) == 4:
        # Usage: python json_to_csv.py <node_name> <json_in_file_path> <csv_out_file_path>
        node, json_file_path, csv_file_path = sys.argv[1:4]
    else:
        # No command line arguments: fall back to hard-coded values.
        # Raw strings keep "\f" from being read as a form-feed escape.
        node = "CVE_Items"
        json_file_path = r"some\file.json"
        csv_file_path = r"some\file.csv"

    with open(json_file_path, 'r') as fp:
        raw_data = json.load(fp)

    try:
        data_to_be_processed = raw_data[node]
    except (KeyError, TypeError):
        data_to_be_processed = raw_data

    processed_data = []
    header = []
    for item in data_to_be_processed:
        reduced_item = {}
        reduce_item(node, item)

        header += reduced_item.keys()

        processed_data.append(reduced_item)

    # Every distinct flattened key becomes one CSV column
    header = list(set(header))
    header.sort()

    with open(csv_file_path, 'w', newline='') as f:
        writer = csv.DictWriter(f, header, quoting=csv.QUOTE_ALL)
        writer.writeheader()
        for row in processed_data:
            writer.writerow(row)
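
To see why this conversion produces so many columns, here is a minimal sketch of the same flattening idea applied to a small nested record. The `flatten` function and the `sample` record are hypothetical illustrations and are not taken from the NVD feed; the sketch assumes keys are joined with `_` as in the script above.

```python
def flatten(key, value, out):
    """Recursively flatten nested lists/dicts into out, joining keys with '_'."""
    if isinstance(value, list):
        for i, sub_item in enumerate(value):
            flatten(f"{key}_{i}", sub_item, out)
    elif isinstance(value, dict):
        for sub_key, sub_value in value.items():
            flatten(f"{key}_{sub_key}", sub_value, out)
    else:
        # Scalars are stored as strings under the fully qualified key
        out[str(key)] = str(value)
    return out

# Hypothetical record for illustration only
sample = {"id": "X-1", "refs": [{"url": "a"}, {"url": "b"}], "score": 7.5}
flat = flatten("item", sample, {})
print(sorted(flat))
```

Each list element gets its own indexed key, so records with lists of different lengths contribute different column names, and the union of all keys becomes the header.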

1 answer:

Answer 0 (score: 0)

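If I understand correctly, you have a CSV file with many columns, but each record has only the ID and one non-ID column filled in, and no two records fill the same non-ID column. So essentially you want to "collapse" all 700 records into one.

Because the ID values all differ, I am not sure how they should be merged into a single record. I assume you know what to do in that case, so I won't try to get that part right.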

Assuming your data is in a file named nist.csv, you can read the records into merged_record like this:

import csv

with open('nist.csv', newline='') as inputfile:
    reader = csv.reader(inputfile)

    # Extract the header:
    header = next(reader)

    # Initialize an empty merged_record:
    merged_record = [''] * len(header)

    # Populate the merged_record with data from the CSV file:
    for record in reader:
        for i, value in enumerate(record):
            if value:  # only overwrite if the value is non-empty
                merged_record[i] = value

Once you have header and merged_record, you can write them to a new CSV file named nist_merged.csv like this:

with open('nist_merged.csv', 'w', newline='') as outputfile:
    writer = csv.writer(outputfile)
    writer.writerow(header)
    writer.writerow(merged_record)

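Again, the ID field may not be what you want (the merged record simply ends up with the ID of the last row read), but since the new CSV file has only two rows, it should be easy to change it to whatever you need.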
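
If the IDs need to be preserved instead, another possible reading of the question is to keep one row per ID and fold the single non-empty value into one data column. The following is only a sketch under that assumption: the `coalesce_columns` function name, the `data` column name, and the in-memory sample are hypothetical, and it assumes the ID is the first column and each row has at most one non-empty value elsewhere.

```python
import csv
import io

def coalesce_columns(rows, id_index=0):
    """Yield (id, data) pairs, taking the first non-empty non-ID cell of each row."""
    for row in rows:
        if not row:
            continue  # skip blank lines
        data = next((v for i, v in enumerate(row) if i != id_index and v), "")
        yield row[id_index], data

# Hypothetical sample standing in for the real file
sample = "ID,Column1,Column2,Column3\n1,a,,\n2,,b,\n3,,,c\n"
reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the original header
result = list(coalesce_columns(reader))

# Write the two-column result to an in-memory buffer
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["ID", "data"])
writer.writerows(result)
print(out.getvalue())
```

To run this against a real file, the `io.StringIO` objects could be swapped for files opened with `newline=''`, as in the answer above.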