Exit code 137 (interrupted by signal 9: SIGKILL) when writing a large dictionary to CSV

Time: 2020-03-30 21:12:31

Tags: python pandas csv dictionary

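Using the code below, I read a large number of XML files (about 300,000) into a nested dictionary. I want to write this to a single CSV file. In my first attempt I used a pandas DataFrame as an intermediate step. The dictionary is built completely, but in the last step, during the conversion to CSV, I get exit code 137 (interrupted by signal 9: SIGKILL). (So far, I have found that building a nested dictionary, rather than appending to a DataFrame, is the fastest option.)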
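For context, 137 follows the common shell convention of reporting 128 plus the signal number, so it corresponds to signal 9 (SIGKILL). When this happens during a memory-heavy step, one plausible cause, not confirmed for this script, is the operating system killing the process for using too much memory. A minimal sketch of the decoding, assuming a POSIX system; the helper name `signal_from_exit_code` is made up for illustration:

```python
import signal


def signal_from_exit_code(code):
    """Return the signal name for a shell-style exit code above 128, else None."""
    if code <= 128:
        return None
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        # Not a known signal number on this platform.
        return None


print(signal_from_exit_code(137))
```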

Do you know how I can write everything to a single CSV while avoiding this error? Is there a way to free memory in between?

Thanks!

#Import packages.

import pandas as pd
from lxml import etree
import os
from os import listdir
from os.path import isfile, join
from tqdm import tqdm
from datetime import datetime

from collections import defaultdict

#Set options for displaying results
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)


def run(file, content):
    # Note: the `content` argument is overwritten below; it is kept only so
    # that existing calls of the form run(path, name) keep working.
    data = etree.parse(file)
    root = data.getroot()

    # Collect the XPath of every node in the document.
    # iter() replaces the deprecated getiterator().
    paths = [data.getpath(node) for node in root.iter()]

    # Map each path to the list of text values of the nodes it selects.
    content = {
        path: [node.text for node in root.xpath(path)]
        for path in paths
    }

    # Key the record by its NCT id and store it in the global dict_final.
    nct_id = content["/clinical_study/id_info/nct_id"][0]
    dict_final[nct_id] = content


def write_csv(df_name, csv):
    df_name.to_csv(csv, sep=";")


#######RUN######

mypath = '/Users/Documents/AllPublicXML'

folder_all = os.listdir(mypath)

dict_final = {}
df_final = pd.DataFrame()

for folder in tqdm(folder_all):

    mypath2 = mypath + "/" + folder
    print(folder)

    if os.path.isdir(mypath2):
        file = [f for f in listdir(mypath2) if isfile(join(mypath2, f))]
        output = "./Output/" + folder + ".csv"
        for x in tqdm(file):
            # Avoid shadowing the built-in dir().
            path = mypath2 + "/" + x
            #output = "./Output/"+x+".csv"
            dict_name = x.split(".", 1)[0]
            try:
                run(path, dict_name)
            except Exception:
                # Log the failure and continue with the next file.
                with open("log.txt", "a+") as log:
                    log.write(str(datetime.now()) + ": Error in file " + x + "\n")

    # The context manager closes the log file after each write.
    with open("log.txt", "a+") as log:
        log.write(str(datetime.now()) + ": " + folder + " written successfully\n")


df_final = pd.DataFrame.from_dict(dict_final, orient='index')
write_csv(df_final, "./Output/final_csv.csv")
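The script above keeps every parsed record in dict_final and then builds a DataFrame from it, so both structures are in memory at the same time. One possible way to reduce peak memory, sketched here with the standard library only, is to spool each record to disk as a JSON line while collecting the column names, then stream the spool file into the CSV in a second pass. The function names, the spool file, and the "id" column are assumptions made for illustration and are not part of the original code; this has not been tried on the full data set.

```python
import csv
import json


def spool_records(records, spool_path):
    """Write each record to disk as one JSON line; return the sorted column names.

    `records` is any iterable of (record_id, {column: value}) pairs, so only
    one record has to be in memory at a time.
    """
    columns = set()
    with open(spool_path, "w", encoding="utf-8") as spool:
        for record_id, fields in records:
            columns.update(fields)
            spool.write(json.dumps({"id": record_id, "fields": fields}) + "\n")
    return sorted(columns)


def spool_to_csv(spool_path, csv_path, columns, sep=";"):
    """Second pass: stream the spooled records into a single CSV file."""
    with open(spool_path, encoding="utf-8") as spool, \
            open(csv_path, "w", newline="", encoding="utf-8") as out:
        # restval fills columns that a given record does not have.
        writer = csv.DictWriter(out, fieldnames=["id"] + columns,
                                delimiter=sep, restval="")
        writer.writeheader()
        for line in spool:
            item = json.loads(line)
            row = {"id": item["id"]}
            row.update(item["fields"])
            writer.writerow(row)
```

To use this with the script, run() could return or yield (nct_id, content) instead of updating dict_final, and a generator over all files could be passed as `records`. List values are written with their Python string form, and the sketch assumes no XML path is literally named "id".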

The XML looks like this:

<clinical_study>
<!--
 This xml conforms to an XML Schema at:
    https://clinicaltrials.gov/ct2/html/images/info/public.xsd 
-->
<required_header>
<download_date>
ClinicalTrials.gov processed this data on March 20, 2020
</download_date>
<link_text>Link to the current ClinicalTrials.gov record.</link_text>
<url>https://clinicaltrials.gov/show/NCT03261284</url>
</required_header>
<id_info>
<org_study_id>2017-P-032</org_study_id>
<nct_id>NCT03261284</nct_id>
</id_info>
<brief_title>
D-dimer to Guide Anticoagulation Therapy in Patients With Atrial Fibrillation
</brief_title>
<acronym>DATA-AF</acronym>
<official_title>
D-dimer to Determine Intensity of Anticoagulation to Reduce Clinical Outcomes in Patients With Atrial Fibrillation
</official_title>
<sponsors>
<lead_sponsor>
<agency>Wuhan Asia Heart Hospital</agency>
<agency_class>Other</agency_class>
</lead_sponsor>
</sponsors>
<source>Wuhan Asia Heart Hospital</source>
<oversight_info>
<has_dmc>Yes</has_dmc>
<is_fda_regulated_drug>No</is_fda_regulated_drug>
<is_fda_regulated_device>No</is_fda_regulated_device>
</oversight_info>
<brief_summary>
<textblock>
This was a prospective, three arms, randomized controlled study.
</textblock>
</brief_summary>
<detailed_description>
<textblock>
D-dimer testing is performed in AF Patients receiving warfarin therapy (target INR:1.5-2.5) in Wuhan Asia Heart Hospital. Patients with elevated d-dimer levels (>0.5ug/ml FEU) were SCREENED AND RANDOMIZED to three groups at a ratio of 1:1:1. First, NOAC group,the anticoagulant was switched to Dabigatran (110mg,bid) when elevated d-dimer level was detected during warfarin therapy.Second,Higher-INR group, INR was adjusted to higher level (INR:2.0-3.0) when elevated d-dimer level was detected during warfarin therapy. Third, control group, patients with elevated d-dimer levels have no change in warfarin therapy. Warfarin is monitored once a month by INR ,and dabigatran dose not need monitor. All patients were followed up for 24 months until the occurrence of endpoints, including bleeding events, thrombotic events and all-cause deaths.
</textblock>
</detailed_description>
<overall_status>Enrolling by invitation</overall_status>
<start_date type="Anticipated">March 1, 2019</start_date>
<completion_date type="Anticipated">May 30, 2020</completion_date>
<primary_completion_date type="Anticipated">February 28, 2020</primary_completion_date>
<phase>N/A</phase>
<study_type>Interventional</study_type>
<has_expanded_access>No</has_expanded_access>
<study_design_info>
<allocation>Randomized</allocation>
<intervention_model>Parallel Assignment</intervention_model>
<primary_purpose>Treatment</primary_purpose>
<masking>None (Open Label)</masking>
</study_design_info>
<primary_outcome>
<measure>Thrombotic events</measure>
<time_frame>24 months</time_frame>
<description>
Stroke, DVT, PE, Peripheral arterial embolism, ACS etc.
</description>
</primary_outcome>
<primary_outcome>
<measure>hemorrhagic events</measure>
<time_frame>24 months</time_frame>
<description>cerebral hemorrhage,Gastrointestinal bleeding etc.</description>
</primary_outcome>
<secondary_outcome>
<measure>all-cause deaths</measure>
<time_frame>24 months</time_frame>
</secondary_outcome>
<number_of_arms>3</number_of_arms>
<enrollment type="Anticipated">600</enrollment>
<condition>Atrial Fibrillation</condition>
<condition>Thrombosis</condition>
<condition>Hemorrhage</condition>
<condition>Anticoagulant Adverse Reaction</condition>
<arm_group>
<arm_group_label>DOAC group</arm_group_label>
<arm_group_type>Experimental</arm_group_type>
<description>
Patients with elevated d-dimer levels was switched to DOAC (dabigatran 150mg, bid).
</description>
</arm_group>
<arm_group>
<arm_group_label>Higher-INR group</arm_group_label>
<arm_group_type>Experimental</arm_group_type>
<description>
Patients' target INR was adjusted from 1.5-2.5 to 2.0-3.0 by adding warfarin dose.
</description>
</arm_group>
<arm_group>
<arm_group_label>Control group</arm_group_label>
<arm_group_type>No Intervention</arm_group_type>
<description>
Patients continue previous strategy without change.
</description>
</arm_group>
<intervention>
<intervention_type>Drug</intervention_type>
<intervention_name>Dabigatran Etexilate 150 MG [Pradaxa]</intervention_name>
<description>Dabigatran Etexilate 150mg,bid</description>
<arm_group_label>DOAC group</arm_group_label>
<other_name>Pradaxa</other_name>
</intervention>
<intervention>
<intervention_type>Drug</intervention_type>
<intervention_name>Warfarin Pill</intervention_name>
<description>Add warfarin dose according to INR values.</description>
<arm_group_label>Higher-INR group</arm_group_label>
</intervention>
<eligibility>
<criteria>
<textblock>
Inclusion Criteria: - Patients with non-valvular atrial fibrillation - Receiving warfarin therapy Exclusion Criteria: - Patients who had suffered from recent (within 3 months) myocardial infarction, ischemic stroke, deep vein thrombosis, cerebral hemorrhages, or other serious diseases. - Those who had difficulty in compliance or were unavailable for follow-up.
</textblock>
</criteria>
<gender>All</gender>
<minimum_age>18 Years</minimum_age>
<maximum_age>75 Years</maximum_age>
<healthy_volunteers>No</healthy_volunteers>
</eligibility>
<overall_official>
<last_name>Zhenlu ZHANG, MD,PhD</last_name>
<role>Study Director</role>
<affiliation>Wuhan Asia Heart Hospital</affiliation>
</overall_official>
<location>
<facility>
<name>Zhang litao</name>
<address>
<city>Wuhan</city>
<state>Hubei</state>
<zip>430022</zip>
<country>China</country>
</address>
</facility>
</location>
<location_countries>
<country>China</country>
</location_countries>
<verification_date>March 2019</verification_date>
<study_first_submitted>August 22, 2017</study_first_submitted>
<study_first_submitted_qc>August 23, 2017</study_first_submitted_qc>
<study_first_posted type="Actual">August 24, 2017</study_first_posted>
<last_update_submitted>March 6, 2019</last_update_submitted>
<last_update_submitted_qc>March 6, 2019</last_update_submitted_qc>
<last_update_posted type="Actual">March 7, 2019</last_update_posted>
<responsible_party>
<responsible_party_type>Sponsor</responsible_party_type>
</responsible_party>
<keyword>D-dimer</keyword>
<keyword>Nonvalvular atrial fibrillation</keyword>
<keyword>Direct thrombin inhibitor</keyword>
<keyword>INR</keyword>
<condition_browse>
<!--
 CAUTION:  The following MeSH terms are assigned with an imperfect algorithm            
-->
<mesh_term>Atrial Fibrillation</mesh_term>
<mesh_term>Thrombosis</mesh_term>
<mesh_term>Hemorrhage</mesh_term>
</condition_browse>
<intervention_browse>
<!--
 CAUTION:  The following MeSH terms are assigned with an imperfect algorithm            
-->
<mesh_term>Warfarin</mesh_term>
<mesh_term>Dabigatran</mesh_term>
<mesh_term>Fibrin fragment D</mesh_term>
</intervention_browse>
<!--
 Results have not yet been posted for this study                                          
-->
</clinical_study>

0 answers:

No answers