Optimizing a PHP script for inserting into MongoDB

Date: 2016-09-07 17:11:33

Tags: php mongodb

I have this situation:

  • more than 2,000 files with a fixed structure;
  • each file has about 200,000 lines;
  • for me, each line is a MongoDB document (200,000 documents per file).

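So I open each file with PHP and insert every line into MongoDB as a single document. In total, the script will insert about 400,000,000 documents.

My script takes 5 hours to insert about 60 files (12,000,000 documents).

My question is: can the script be optimized to reduce the insertion time?

PS: I dropped the indexes in MongoDB to make the inserts faster; I am working locally on WAMP.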
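For scale, the figures above can be turned into a rough throughput estimate. The sketch below only does the arithmetic; it assumes exactly 2,000 files and a constant insert rate, neither of which is guaranteed by the description.

```php
<?php
// Figures quoted in the question.
$docsInserted = 12000000; // documents inserted so far
$hoursSpent   = 5;        // time taken for those documents
$filesDone    = 60;       // files processed in that time
$filesTotal   = 2000;     // assumption: "more than 2,000" rounded down

// Current throughput in documents per second.
$docsPerSecond = $docsInserted / ($hoursSpent * 3600);

// Projected time for the whole data set at the same rate.
$hoursTotal = $filesTotal / $filesDone * $hoursSpent;

printf("%.0f docs/s, %.1f hours (%.1f days)\n",
    $docsPerSecond, $hoursTotal, $hoursTotal / 24);
// prints "667 docs/s, 166.7 hours (6.9 days)"
```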

SCRIPT:

<?php
    // connect to mongodb
    $m = new MongoClient();
    function spacePosition($stringa){
        $pos=0;
        //substr(string,start,length) 
        $lunghezzaStringa = strlen($stringa);
        while($pos<$lunghezzaStringa){

            if(substr($stringa,$pos,1)==" ") {
                return $pos;
            } else {
                $pos=$pos+1;
            }

        }

    }

    // Function to insert a BED document into MongoDB
    function insertDocumentBed($document,&$m){
        // select a database
        $db = $m->BRCA;
        $collection = $db->bedCollection;
        // $document is an array, so any stale _id must be removed by key
        unset($document['_id']);
        $collection->insert($document);
    }

    // End of function to insert a BED document into MongoDB
    $directory = "D:/other/BRCA/dnamethylationTemp";
    $tumor="BRCA";
    $experiment="dnamethylation";

    if (is_dir($directory)) {

        if ($directory_handle = opendir($directory)) {
            while (($file = readdir($directory_handle)) !== false) {

                // Skip subdirectories, including "." and ".."
                if (!is_dir($directory."/".$file) && $file!="." && $file!="..") {
                    $extension = pathinfo($file,PATHINFO_EXTENSION);

                    if(trim($extension)=="meta"){
                        //...//
                        // End of data extraction from the .META file
                    }
                    elseif(trim($extension)=="bed"){
                        ini_set('max_execution_time', 0);
                        //0=NOLIMIT
                        $nuovaDirectory = $directory."/".$file;
                        $handle = fopen($nuovaDirectory, "r");
                        $filename = basename($nuovaDirectory);
                        $patient_id = substr($filename,0,12);
                        $document1 = array('filename'=>$filename);
                        //init

                        if ($handle) {
                            while (($line = fgets($handle)) !== false) {
                                // Collapse runs of whitespace, then split the line into fields
                                $row = preg_replace('/\s+/', ' ',$line);
                                $rowSplit = explode(" ", $row);
                                $chrom = $rowSplit[0];
                                $chromStart = $rowSplit[1];
                                $chromEnd = $rowSplit[2];
                                $strand = $rowSplit[3];
                                $composite_element_ref = $rowSplit[4];
                                $beta_value = $rowSplit[5];
                                $gene_symbol = $rowSplit[6];
                                $document1["tumor"] = $tumor;
                                $document1["experiment"] = $experiment;
                                $document1["PATIENT_ID"] = $patient_id;
                                $document1["filename"] = $filename;
                                $document1["chrom"] = $chrom;
                                $document1["chromStart"] = (int)$chromStart;
                                $document1["chromEnd"] = (int)$chromEnd;
                                $document1["strand"] = $strand;
                                $document1["composite_element_ref"] = $composite_element_ref;
                                $document1["beta_value"] = (float)$beta_value;
                                $document1["gene_symbol"] = $gene_symbol;
                                // One insert call per line
                                insertDocumentBed($document1,$m);
                            }

                            fclose($handle);
                        } else {
                            // error opening the file.
                        }

                        // End of extraction from the .BED file
                    }

                }

            }

            closedir($directory_handle);
        }

    }

$m->close();
?>

1 Answer:

Answer 0 (score: 0)

Use batch inserts into MongoDB. Click here for more details.
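The batch-insert suggestion could be sketched roughly as follows. This is a minimal illustration rather than a tuned implementation: the helper names `parseBedLine` and `insertBedFileInBatches`, the `$flush` callback, and the batch size of 1000 are all assumptions made for this example, and the field layout is taken from the script in the question. The idea is to collect parsed lines in an array and hand them to the database in groups, so that one call is made per batch instead of one per line.

```php
<?php
// Hypothetical helper: parse one whitespace-separated BED line into a document.
// Field names follow the question's script. Returns null for blank/short lines.
function parseBedLine($line, array $common) {
    $f = preg_split('/\s+/', trim($line));
    if (count($f) < 6) {
        return null;
    }
    return $common + array(
        'chrom'                 => $f[0],
        'chromStart'            => (int)$f[1],
        'chromEnd'              => (int)$f[2],
        'strand'                => $f[3],
        'composite_element_ref' => $f[4],
        'beta_value'            => (float)$f[5],
        'gene_symbol'           => isset($f[6]) ? $f[6] : '',
    );
}

// Hypothetical helper: read a file and pass documents to $flush in batches.
// $flush receives an array of documents; $batchSize = 1000 is an arbitrary choice.
function insertBedFileInBatches($path, array $common, callable $flush, $batchSize = 1000) {
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $batch = array();
    $total = 0;
    while (($line = fgets($handle)) !== false) {
        $doc = parseBedLine($line, $common);
        if ($doc === null) {
            continue;
        }
        $batch[] = $doc;
        if (count($batch) >= $batchSize) {
            $flush($batch);           // one database call for the whole batch
            $total += count($batch);
            $batch = array();
        }
    }
    if ($batch) {                     // flush the final, partially filled batch
        $flush($batch);
        $total += count($batch);
    }
    fclose($handle);
    return $total;                    // number of documents handed to $flush
}
```

With the legacy driver used in the question, `$flush` could be `function (array $batch) use ($collection) { $collection->batchInsert($batch); }`, and `$common` could carry the per-file fields such as `tumor`, `experiment`, `PATIENT_ID` and `filename`. With the newer `mongodb/mongodb` library, the corresponding call would be `$collection->insertMany($batch)`. How much time this saves depends on the setup, so the batch size is worth measuring rather than assuming.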