Reading and parsing a tab-delimited file in PHP

Time: 2016-12-01 11:16:47

Tags: php python

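I wrote this Python script to read a tab-delimited file and collect the values from lines that start with '\t' into an array (written out as JSON, mapping each label to its count). The code I use for this: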

import sys
from collections import OrderedDict
import json
import os   

file = sys.argv[1]

f = open(file, 'r')
direc = '/dir/to/JSONs/'
fileJSON = sys.argv[1]+'.json'

key1 = OrderedDict()
summary_data = []
full_path = os.path.join(direc,fileJSON)

Read = True 
for line in f:
        if line.startswith("#"):
            Read = True

        elif line.startswith('\tC'):
            Read= True

        elif line.startswith('\t') and Read == True:
            summary = line.strip().split('\t')
            key1[summary[1]]=int(summary[0])
            Read = True    

summary_data.append(key1)
data = json.dumps(summary_data)
with open(full_path, 'w') as datafile:
    datafile.write(data)
print(data)

The data I'm parsing:

# BUSCO was run in mode: genome

    C:98.0%[S:97.0%,D:1.0%],F:0.5%,M:1.5%,n:1440

    1411    Complete BUSCOs (C)
    1397    Complete and single-copy BUSCOs (S)
    14  Complete and duplicated BUSCOs (D)
    7   Fragmented BUSCOs (F)
    22  Missing BUSCOs (M)
    1440    Total BUSCO groups searched

However, I need this code in PHP. I've already managed to open the file and read it in PHP. Can someone help me?

3 answers:

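Answer 0 (score: 2)

I don't see the point of the Read variable: in your code it is always True, so checking it does nothing. All that matters is whether the line starts with a tab (plus skipping the single-field "C:..." summary line, which the count check below handles). Here is a PHP version of your script: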

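<?php
    // Usage: php script.php input.txt
    $fileName = $argv[1];
    $dir = '/dir/to/JSONs/';
    $fullPath = $dir . $fileName . '.json';

    $data = [];
    $in = fopen($fileName, 'r');
    while (($line = fgets($in)) !== false) {
        // Only tab-indented lines carry data; '#' comments and blank lines are skipped
        if ($line[0] == "\t") {
            $summary = explode("\t", trim($line));
            // The "C:98.0%[...]" summary line has a single field, so skip it
            if (count($summary) > 1) {
                $data[$summary[1]] = (int)$summary[0];
            }
        }
    }
    fclose($in);

    // Wrap in an outer array to match the Python output: [{...}]
    $strData = json_encode([$data]);
    file_put_contents($fullPath, $strData);
    echo $strData;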
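To check the claim that Read can be dropped before porting, here is a minimal Python sketch of the question's loop without the flag, written as a pure function over lines so it can be tried on the sample data. The name parse_busco_summary is hypothetical, and the sketch swaps the original '\tC' prefix test for a field-count check (the same guard the PHP version uses); for the data shown in the question both skip the same line.

```python
import json
from collections import OrderedDict


def parse_busco_summary(lines):
    """Map each label to its count for lines of the form '\t<count>\t<label>'.

    Same logic as the question's loop, minus the Read flag: '#' comments,
    blank lines and the single-field '\tC:...' summary line are skipped.
    """
    result = OrderedDict()
    for line in lines:
        if not line.startswith('\t'):
            continue  # comment and blank lines
        fields = line.strip().split('\t')
        if len(fields) < 2:
            continue  # the 'C:98.0%[...]' summary line has only one field
        result[fields[1]] = int(fields[0])
    return result


sample = [
    "# BUSCO was run in mode: genome\n",
    "\n",
    "\tC:98.0%[S:97.0%,D:1.0%],F:0.5%,M:1.5%,n:1440\n",
    "\n",
    "\t1411\tComplete BUSCOs (C)\n",
    "\t14\tComplete and duplicated BUSCOs (D)\n",
]
# Wrapped in a list to match the question's output format
print(json.dumps([parse_busco_summary(sample)]))
# prints [{"Complete BUSCOs (C)": 1411, "Complete and duplicated BUSCOs (D)": 14}]
```

Because the function takes any iterable of lines, passing an open file object (for line in f) works the same way as the list above.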

Answer 1 (score: 0)

You don't need the Read variable in your code, so I removed it and added output so you can see the results in the console:

<?php
$file = $argv[1];
$direc = '/dir/to/JSONs/';
$key1 = [];
$summary_data = [];
$full_path = $direc.$file.'.json';
// Read the input file given on the command line (not the JSON output path)
$file_handler = fopen($file, 'r');
if($file_handler){
    while(($line = fgets($file_handler)) !== false){
        // Skip comments, the "\tC:..." summary line and blank lines (fgets keeps the "\n", so trim first)
        if($line[0] == "#" || substr($line, 0, 2) == "\tC" || trim($line) === ''){
            echo 'line found : '.$line;
            continue;
        }else{
            // "\t1411\tComplete BUSCOs (C)\n" -> ["", "1411", "Complete BUSCOs (C)\n"]
            $summary = explode("\t", $line);
            echo 'summary : '.print_r($summary,true);
            if(count($summary) >= 3){
                $key1[str_replace(["\r","\n"], '', $summary[2])] = (int) $summary[1];
            }
        }
    }
    fclose($file_handler);
}else{
    echo 'Couldn\'t open file.';
    exit(1);
}
array_push($summary_data, $key1);
$data = json_encode($summary_data);
file_put_contents($full_path, $data);

Answer 2 (score: 0)

If you want to do this in PHP, fgetcsv lets you specify the delimiter (not just a comma):

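$file_resource = fopen($file, "r");
while (($row = fgetcsv($file_resource, 4096, "\t")) !== false) {
    if (count($row) >= 3 && $row[0] === '') $data[$row[2]] = (int) $row[1]; // row: ["", "1411", "Complete BUSCOs (C)"]
}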
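Since the question's original script is Python, it may help to note that the same delimiter-aware approach exists there: the standard-library csv.reader accepts delimiter='\t'. Below is a minimal sketch, assuming the file layout shown in the question (data rows begin with a tab, so their first field is empty); read_counts is a hypothetical name.

```python
import csv
import io


def read_counts(f):
    """Return {label: count} from tab-separated rows like ['', '1411', 'Complete BUSCOs (C)']."""
    counts = {}
    for row in csv.reader(f, delimiter='\t'):
        # Data rows: empty first field (the leading tab), then count and label.
        # Comments, blank lines ([]) and the 'C:...' summary row are skipped.
        if len(row) >= 3 and row[0] == '':
            counts[row[2]] = int(row[1])
    return counts


sample = io.StringIO(
    "# BUSCO was run in mode: genome\n"
    "\n"
    "\tC:98.0%[S:97.0%,D:1.0%],F:0.5%,M:1.5%,n:1440\n"
    "\t22\tMissing BUSCOs (M)\n"
)
print(read_counts(sample))  # prints {'Missing BUSCOs (M)': 22}
```

For a real file, open it with open(path, newline='') as the csv module documentation recommends, and pass the file object to read_counts.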