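I have a file (note: it is not a JSON file) that I need to parse and insert into a database. The lines starting with #TRANS inside the curly braces belong to the #VER line above them. I think I should use preg_match or preg_match_all, but I'm not sure what the regular expression should look like.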
#VER 1 5 20170128
{
#TRANS 8000 {} 4016.00 20170128 "something"
#TRANS 1100 {} -4016.00 20170128 "something"
}
#VER 1 6 20170128
{
#TRANS 8010 {} 5016.00 20170128 "something else"
#TRANS 1130 {} -5016.00 20170128 "something else"
}
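I will split the data into two tables, and it's fine to do this in several steps rather than all at once. So first I insert the VER line, then I take the TRANS lines inside the curly braces below it, loop over them and parse each one.
Here is what the database should look like for the first entry in the file: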
Table VER:
ID | VER_DATE | (some other stuff)
5 | 20170128 | ...
Table TRANS:
ID | VER_ID | SERIAL | VALUE | DATE
1 | 5 | 8000 | 4016 | 20170128
1 | 5 | 1100 | -4016 | 20170128
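For reference, each of the two line types can be matched with one pattern. The sketch below assumes the layout seen in the sample: whitespace-separated fields, a `{...}` group, and a double-quoted description. The function names `parseVerLine`/`parseTransLine` and the array keys are hypothetical and only mirror the VER and TRANS columns above. The question does not say what the first number after #VER means, so it is kept under the guessed key `series`.

```php
<?php
// Hypothetical helpers: parse one #VER or #TRANS line into an array whose keys
// mirror the VER / TRANS columns. Field layout is assumed from the sample file.

function parseVerLine(string $line): ?array
{
    // "#VER 1 5 20170128": first field kept as 'series' (meaning assumed),
    // second field is the VER ID, third the date (YYYYMMDD)
    if (!preg_match('/^#VER\s+(\S+)\s+(\d+)\s+(\d{8})\b/', trim($line), $m)) {
        return null;
    }
    return ['series' => $m[1], 'id' => (int) $m[2], 'date' => $m[3]];
}

function parseTransLine(string $line): ?array
{
    // '#TRANS 8000 {} 4016.00 20170128 "something"'
    $pattern = '/^#TRANS\s+(\d+)\s+\{[^}]*\}\s+(-?\d+(?:\.\d+)?)\s+(\d{8})\s+"([^"]*)"/';
    if (!preg_match($pattern, trim($line), $m)) {
        return null;
    }
    // value stays a string so a DECIMAL column receives it without float rounding
    return ['serial' => (int) $m[1], 'value' => $m[2], 'date' => $m[3], 'text' => $m[4]];
}

print_r(parseVerLine('#VER 1 5 20170128'));
print_r(parseTransLine('#TRANS 1100 {} -4016.00 20170128 "something"'));
```

Both functions return null for lines that do not match, such as the `{` and `}` lines, so a caller can simply skip those.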
Answer 0 (score: 1)
Well, if you don't need to support complex syntax and this is a one-off conversion, you can do it easily with just fopen and fgets:
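<?php
// Read the file line by line; '/path/to/file' is a placeholder.
$fp = fopen('/path/to/file', 'rb');
if ($fp === false) {
    die("Cannot open file\n");
}
$head = '';        // the current #VER line
$blockBuffer = ''; // the #VER line plus the { ... } block that follows it
while (($line = fgets($fp)) !== false) {
    if (substr($line, 0, 4) === '#VER') {
        // a new block starts: remember its header
        $head = $line;
        $blockBuffer = '';
    }
    if ($head !== '') {
        $blockBuffer .= $line;
    }
    if (trim($line) === '}') {
        // end of block: $head is the VER line, $blockBuffer the whole block
        var_dump($head);
        var_dump($blockBuffer);
        $head = '';
    }
}
fclose($fp);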
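Note that this is a fragile way to parse the data: it assumes every block is well-formed and ends with a line containing only }, and it does not validate the #TRANS lines it collects.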
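The loop above stops at var_dump. As a sketch of the next step, the buffered block (which already starts with the #VER line, because that line itself is appended to $blockBuffer) can be split into one VER record and its TRANS rows. `blockToRows` and its array keys are hypothetical names chosen to match the question's tables, and the patterns assume the sample layout.

```php
<?php
// Hypothetical helper: turn one buffered block (like $blockBuffer in the fgets
// loop) into a VER record plus its TRANS rows. Returns null if the first line
// is not a #VER header.
function blockToRows(string $block): ?array
{
    $lines = preg_split('/\R/', trim($block)); // split on any line ending
    if (!preg_match('/^#VER\s+(\S+)\s+(\d+)\s+(\d{8})/', $lines[0], $v)) {
        return null;
    }
    $ver = ['id' => (int) $v[2], 'date' => $v[3]];
    $trans = [];
    foreach ($lines as $line) {
        if (preg_match('/^#TRANS\s+(\d+)\s+\{[^}]*\}\s+(-?[\d.]+)\s+(\d{8})/', trim($line), $t)) {
            // ver_id links each TRANS row back to its #VER header
            $trans[] = ['ver_id' => $ver['id'], 'serial' => (int) $t[1], 'value' => $t[2], 'date' => $t[3]];
        }
    }
    return ['ver' => $ver, 'trans' => $trans];
}

$block = "#VER 1 5 20170128\n{\n#TRANS 8000 {} 4016.00 20170128 \"something\"\n"
       . "#TRANS 1100 {} -4016.00 20170128 \"something\"\n}\n";
print_r(blockToRows($block));
```

In the loop, the two var_dump calls could be replaced by a call to this helper, followed by the database inserts.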
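Answer 1 (score: 1)
You can take a compiler-style approach: parse and process the input line by line with a small state machine. The advantage is that it can detect malformed data and report the line where it occurs.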
class ParserException extends Exception {
    public $line;
    function __construct($message, $line) {
        parent::__construct($message);
        $this->line = $line;
    }
}
class Parser {
    const Top = 1;     // expecting a #VER line
    const InBlock = 2; // inside { ... }, expecting #TRANS or }
    const Ver = 4;     // just read #VER, expecting {
    const Noop = 6;    // unused

    public $ver = null; // the most recent VER header
    public $line = 1;   // current input line, used in error messages

    function parse($input) {
        $lines = explode("\n", $input);
        $state = self::Top;
        $this->line = 1;
        try {
            foreach ($lines as $line) {
                // skip blank lines (e.g. a trailing newline at the end of the file)
                if (trim($line) === '') {
                    $this->line++;
                    continue;
                }
                switch ($state) {
                    case self::Top:
                        // \d+ so that multi-digit numbers such as "#VER 1 15 ..." also match
                        if (preg_match('/^#VER (\d+) (\d+) ([0-9]+)/', $line, $matches)) {
                            $this->emitVer($matches);
                            $state = self::Ver;
                        } else {
                            throw new ParserException("VER not found", $this->line);
                        }
                        break;
                    case self::Ver:
                        if (substr(trim($line), 0, 1) == '{') {
                            $state = self::InBlock;
                        } else {
                            throw new ParserException("Expected { ", $this->line);
                        }
                        break;
                    case self::InBlock:
                        $trimline = trim($line);
                        // #TRANS 8000 {} 4016.00 20170128 "something"
                        if (preg_match('/^#TRANS ([0-9]+) \{\} ([0-9.-]+) ([0-9]+) "(.*)"/', $trimline, $matches)) {
                            $this->emitTrans($matches);
                        } elseif (substr($trimline, 0, 1) == '}') {
                            $state = self::Top;
                        } else {
                            throw new ParserException("Expected TRANS or } ", $this->line);
                        }
                        break;
                    default:
                        // unknown state
                        throw new ParserException("Unexpected error ", $this->line);
                }
                $this->line++;
            }
            // the input ended before the last block was closed
            if ($state !== self::Top) {
                throw new ParserException("Unexpected end of input", $this->line);
            }
        } catch (ParserException $e) {
            echo "Parser error. " . $e->getMessage() . ' Line ' . $e->line . PHP_EOL;
        }
    }

    function emitVer($ver) {
        printf("id %s version date %s\n", $ver[2], $ver[3]);
        $this->ver = ['id' => $ver[2], 'date' => $ver[3]]; // remember the version
    }

    function emitTrans($trans) {
        printf("Trans ver-id = %s serial = %s value = %s date = %s text = %s\n",
            $this->ver['id'], $trans[1], $trans[2], $trans[3], $trans[4]);
    }

    function outr($x) { // debug helper
        print_r($x);
        echo "\n";
    }
}
$p = new Parser;
$p->parse('#VER 1 5 20170128
{
#TRANS 8000 {} 4016.00 20170128 "something"
#TRANS 1100 {} -4016.00 20170128 "something"
}
#VER 1 6 20170128
{
#TRANS 8010 {} 5016.00 20170128 "something else"
#TRANS 1130 {} -5016.00 20170128 "something else"
}');
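The parser only echoes what it finds. To store the data, emitVer/emitTrans could collect the values into arrays, and each block could then be written in one transaction with PDO prepared statements, so a failure never leaves a VER row without its TRANS rows. This is a sketch under assumptions: `insertBlock` is a hypothetical helper, the table and column names are copied from the question's example, and the demo uses an in-memory SQLite database (it needs the pdo_sqlite extension). VALUE and DATE are double-quoted as SQL identifiers, which works in SQLite but would need backticks in MySQL's default mode.

```php
<?php
// Hypothetical helper: write one VER row and its TRANS rows atomically.
function insertBlock(PDO $pdo, array $ver, array $transRows): void
{
    $pdo->beginTransaction();
    try {
        $stmt = $pdo->prepare('INSERT INTO VER (ID, VER_DATE) VALUES (?, ?)');
        $stmt->execute([$ver['id'], $ver['date']]);

        $stmt = $pdo->prepare('INSERT INTO TRANS (VER_ID, SERIAL, "VALUE", "DATE") VALUES (?, ?, ?, ?)');
        foreach ($transRows as $t) {
            $stmt->execute([$ver['id'], $t['serial'], $t['value'], $t['date']]);
        }
        $pdo->commit();
    } catch (Throwable $e) {
        $pdo->rollBack(); // leave no half-inserted block behind
        throw $e;
    }
}

// Demo schema in an in-memory SQLite database; a real schema would likely use
// DECIMAL for VALUE and a DATE type for the dates.
$pdo = new PDO('sqlite::memory:', null, null, [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
$pdo->exec('CREATE TABLE VER (ID INTEGER PRIMARY KEY, VER_DATE TEXT)');
$pdo->exec('CREATE TABLE TRANS (ID INTEGER PRIMARY KEY, VER_ID INTEGER, SERIAL INTEGER, "VALUE" TEXT, "DATE" TEXT)');

insertBlock($pdo, ['id' => 5, 'date' => '20170128'], [
    ['serial' => 8000, 'value' => '4016.00', 'date' => '20170128'],
    ['serial' => 1100, 'value' => '-4016.00', 'date' => '20170128'],
]);

foreach ($pdo->query('SELECT * FROM TRANS ORDER BY ID') as $row) {
    echo implode(' | ', [$row['ID'], $row['VER_ID'], $row['SERIAL'], $row['VALUE'], $row['DATE']]), PHP_EOL;
}
```

Here TRANS.ID is assigned by the database, so each block only needs to know its VER ID.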
Answer 2 (score: 1)
If you want to use regular expressions, you can do it like this:
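// $text is assumed to hold the whole file contents, e.g. from file_get_contents()
preg_match_all('/#VER (.*)\s*\{((\s*#TRANS\s*[0-9]+\s\{.*\}.*$)*\s*)\}/msU', $text, $matches);
$size = count($matches[0]);
for ($i = 0; $i < $size; $i++) {
    $body = $matches[2][$i]; // the #TRANS lines of this block
    echo "================\n";
    echo $matches[1][$i] . "\n"; // the #VER header, e.g. "1 5 20170128"
    // the m modifier makes $ match at the end of every line; without it only
    // the last #TRANS line of each block would be matched
    preg_match_all('/\s*#TRANS\s([0-9]+)\s(\{.*\})\s([0-9\-\.]+)\s([0-9]+)\s"(.*)"$/m', $body, $matches2);
    $size2 = count($matches2[0]);
    for ($j = 0; $j < $size2; $j++) {
        echo $matches2[1][$j] . "\n"; // serial
        echo $matches2[2][$j] . "\n"; // the {} part
        echo $matches2[3][$j] . "\n"; // value
        echo $matches2[4][$j] . "\n"; // date
        echo $matches2[5][$j] . "\n"; // description
    }
}
The output will be:
================
1 5 20170128
8000
{}
4016.00
20170128
something
1100
{}
-4016.00
20170128
something
================
1 6 20170128
8010
{}
5016.00
20170128
something else
1130
{}
-5016.00
20170128
something else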
If needed, you can further split the header ($matches[1][$i]) into its individual fields.
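A minimal sketch of that split, assuming the header has the three whitespace-separated fields seen in the sample. `splitVerHeader` and its keys are hypothetical. Treating the second number as the VER ID follows the question's example table (ID 5 for "#VER 1 5 20170128").

```php
<?php
// Hypothetical helper: split a #VER header such as "1 5 20170128" into fields.
// trim() also removes a stray "\r" left over from Windows line endings.
function splitVerHeader(string $header): array
{
    [$series, $verId, $verDate] = preg_split('/\s+/', trim($header));
    return ['series' => $series, 'id' => (int) $verId, 'date' => $verDate];
}

print_r(splitVerHeader("1 5 20170128"));
```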