Let's say I have a CSV file with hundreds of columns (comma-separated):

ID,Column1,Column2,...,Column700
1,data,,...,
2,,data,...,
...
700,,,...,data

How can I merge the columns so that a single column contains all the "data" values? The CSV file was created from this JSON feed: https://nvd.nist.gov/vuln/data-feeds#JSON_FEED

Below is the code I'm using (taken from another SO post) to convert the JSON to CSV:
import csv
import json
import sys

def to_string(s):
    try:
        return str(s)
    except UnicodeEncodeError:
        # Change the encoding type if needed
        return s.encode('utf-8')

def reduce_item(key, value):
    global reduced_item
    # Reduction condition 1: flatten lists
    if type(value) is list:
        i = 0
        for sub_item in value:
            reduce_item(key + '_' + to_string(i), sub_item)
            i = i + 1
    # Reduction condition 2: flatten dicts
    elif type(value) is dict:
        sub_keys = value.keys()
        for sub_key in sub_keys:
            reduce_item(key + '_' + to_string(sub_key), value[sub_key])
    # Base condition
    else:
        reduced_item[to_string(key)] = to_string(value)

if __name__ == "__main__":
    if len(sys.argv) == 4:  # original was !=; inverted so the hardcoded paths below are used
        print("\nUsage: python json_to_csv.py <node_name> <json_in_file_path> <csv_out_file_path>\n")
    else:
        # Hardcoded instead of reading sys.argv[1:4]
        node = "CVE_Items"
        json_file_path = r"some\file.json"  # raw string: a bare "\f" would be a form-feed escape
        csv_file_path = r"some\file.csv"

        with open(json_file_path, 'r') as fp:
            raw_data = json.load(fp)

        try:
            data_to_be_processed = raw_data[node]
        except KeyError:
            data_to_be_processed = raw_data

        processed_data = []
        header = []
        for item in data_to_be_processed:
            reduced_item = {}
            reduce_item(node, item)
            header += reduced_item.keys()
            processed_data.append(reduced_item)

        header = list(set(header))
        header.sort()

        with open(csv_file_path, 'w', newline='') as f:
            writer = csv.DictWriter(f, header, quoting=csv.QUOTE_ALL)
            writer.writeheader()
            for row in processed_data:
                writer.writerow(row)
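To illustrate why the conversion produces hundreds of columns, here is a small self-contained sketch of the same flattening technique used by reduce_item above: nested dicts and lists become underscore-joined column names, so every distinct nested path in the feed turns into its own column. The NVD-style entry shown is a made-up fragment, not real feed data.

```python
# Minimal sketch of the flattening that reduce_item performs:
# nested dicts/lists are collapsed into underscore-joined keys.
def flatten(key, value, out):
    if isinstance(value, list):
        for i, sub in enumerate(value):
            flatten(f"{key}_{i}", sub, out)
    elif isinstance(value, dict):
        for k, v in value.items():
            flatten(f"{key}_{k}", v, out)
    else:
        out[key] = str(value)

row = {}
flatten("CVE_Items", {"cve": {"CVE_data_meta": {"ID": "CVE-2019-0001"}}}, row)
print(row)  # {'CVE_Items_cve_CVE_data_meta_ID': 'CVE-2019-0001'}
```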
Answer 0 (score: 0)

If I understand correctly, you have a CSV file with many columns, but each record has only its ID and one (non-ID) column populated, and no two records have the same (non-ID) column populated. So essentially you want to "collapse" all 700 records into one.

Since the ID values are all different, I'm not sure how you want them merged into a single record. I'll assume you know what to do in that case, so I won't try to get that part right.
Assuming your data is in a file called nist.csv, you can read the records into a merged_record like this:
import csv

reader = csv.reader(open('nist.csv'))
# Extract the header:
header = next(reader)
# Initialize an empty merged_record:
merged_record = [''] * len(header)
# Populate the merged_record with data from the CSV file:
for record in reader:
    for i, value in enumerate(record):
        if value:  # only overwrite if the value is non-empty
            merged_record[i] = value
Once you have merged_record and header, you can write them out to a new CSV file called nist_merged.csv like this:
with open('nist_merged.csv', 'w', newline='') as outputfile:  # newline='' avoids blank lines on Windows
    writer = csv.writer(outputfile)
    writer.writerow(header)
    writer.writerow(merged_record)
Again, the ID field probably won't be what you want, but since there are only two rows in this new CSV file, it should be easy for you to change it to whatever you need.
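As an aside (this is not from the original answer, just a sketch), the column-wise merge done by the loop above can also be written as a one-liner over the already-read rows, keeping the first non-empty value per column. The loop keeps the last non-empty value instead, but the results are identical when, as here, each column is populated in at most one record. The sample rows are made up.

```python
# Stdlib-only sketch of the same collapse: for each column, keep the
# first non-empty value seen across all records.
from functools import reduce

rows = [
    ["1", "data", "", ""],
    ["2", "", "data", ""],
    ["3", "", "", "data"],
]
merged_record = reduce(lambda a, b: [x or y for x, y in zip(a, b)], rows)
print(merged_record)  # ['1', 'data', 'data', 'data']
```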