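I have a block of text from a popular online game called EVE Online; it is basically what gets mailed to you when you kill someone in the game. I'm building a tool in PHP to parse these mails and extract all the relevant information. I need to display every piece of information shown, and I'm writing classes to break it down cleanly into relevant, encapsulated data.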
2008.06.19 20:53:00
Victim: Massi
Corp: Cygnus Alpha Syndicate
Alliance: NONE
Faction: NONE
Destroyed: Raven
System: Jan
Security: 0.4
Damage Taken: 48436
Involved parties:
Name: Kale Kold
Security: -10.0
Corp: Vicious Little Killers
Alliance: NONE
Faction: NONE
Ship: Drake
Weapon: Hobgoblin II
Damage Done: 22093
Name: Harulth (laid the final blow)
Security: -10.0
Corp: Vicious Little Killers
Alliance: NONE
Faction: NONE
Ship: Drake
Weapon: Caldari Navy Scourge Heavy Missile
Damage Done: 16687
Name: Gistatis Tribuni / Angel Cartel
Damage Done: 9656
Destroyed items:
Capacitor Power Relay II, Qty: 2
Paradise Cruise Missile, Qty: 23
Cataclysm Cruise Missile, Qty: 12
Small Tractor Beam I
Alloyed Tritanium Bar, Qty: 2 (Cargo)
Paradise Cruise Missile, Qty: 1874 (Cargo)
Contaminated Nanite Compound (Cargo)
Capacitor Control Circuit I, Qty: 3
Ballistic Deflection Field I
'Malkuth' Cruise Launcher I, Qty: 3
Angel Electrum Tag, Qty: 2 (Cargo)
Dropped items:
Ballistic Control System I
Shield Boost Amplifier I, Qty: 2
Charred Micro Circuit, Qty: 4 (Cargo)
Capacitor Power Relay II, Qty: 2
Paradise Cruise Missile, Qty: 10
Cataclysm Cruise Missile, Qty: 21
X-Large Shield Booster II
Cataclysm Cruise Missile, Qty: 3220 (Cargo)
Fried Interface Circuit (Cargo)
F-S15 Braced Deflection Shield Matrix, Qty: 2
Salvager I
'Arbalest' Cruise Launcher I
'Malkuth' Cruise Launcher I, Qty: 2
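I'm thinking of using regular expressions to parse the data, but how would you approach it? Would you collapse the mail into a single-line string, or parse each line from an array? The trouble is that there are a few anomalies to account for.
First, the 'Involved parties:' section is dynamic and can contain any number of people with the structure shown above, but if a computer-controlled enemy also shot at the victim, its entry is shortened to just the 'Name' and 'Damage Done' fields, as seen above (Gistatis Tribuni / Angel Cartel).
Second, the 'Destroyed items:' and 'Dropped items:' sections are dynamic and differ in length from mail to mail, and I also need the quantity of each item and whether or not it was in the cargo hold.
Ideas on how to approach this are welcome.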
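Answer 0 (score: 12)
I would probably go with a state machine approach, reading each line in sequence and handling it according to the current state.
Some lines, such as "Dropped items:", change the state, so that the lines that follow are interpreted as items. While in the "reading involved parties" state, you add each line to an array of data about that person, and when you read a blank line you know you have a complete record.
Here's a rough FSM that I knocked up in GraphViz: [FSM diagram]
Some edges would trigger actions in your code, such as when a blank line is read.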
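One way the state machine described above could look in PHP. This is only a sketch under a few assumptions: `parseKillmail()` and its array layout are hypothetical names made up for illustration, the first non-blank line is taken to be the date, and a party record is treated as finished on a blank line, on the next `Name:` line, or when the section ends. Item lines are stored raw; splitting out quantity and cargo is left aside.

```php
<?php
// Hypothetical helper: the function name and result layout are illustrative only.
function parseKillmail(string $text): array
{
    $mail  = ['date' => null, 'victim' => [], 'parties' => [], 'destroyed' => [], 'dropped' => []];
    $state = 'header';   // header -> victim -> parties -> destroyed / dropped
    $party = [];         // party record currently being collected

    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);

        // Lines that act as edges between states.
        $next = null;
        if ($state === 'header' && $line !== '') {
            $mail['date'] = $line;
            $next = 'victim';
        } elseif ($line === 'Involved parties:') {
            $next = 'parties';
        } elseif ($line === 'Destroyed items:') {
            $next = 'destroyed';
        } elseif ($line === 'Dropped items:') {
            $next = 'dropped';
        }
        if ($next !== null) {
            // Action on the edge: flush any open party record.
            if ($party) { $mail['parties'][] = $party; $party = []; }
            $state = $next;
            continue;
        }

        if ($line === '') {
            // Action on a blank line: the current party record is complete.
            if ($party) { $mail['parties'][] = $party; $party = []; }
            continue;
        }

        if ($state === 'victim' || $state === 'parties') {
            $pair = explode(': ', $line, 2);
            if (count($pair) < 2) { continue; }   // not a "Key: value" line
            [$key, $value] = $pair;
            if ($state === 'victim') {
                $mail['victim'][$key] = $value;
            } else {
                // "Name:" also starts a new record when blank lines are missing.
                if ($key === 'Name' && $party) { $mail['parties'][] = $party; $party = []; }
                $party[$key] = $value;
            }
        } elseif ($state === 'destroyed' || $state === 'dropped') {
            $mail[$state][] = $line;   // raw item line
        }
    }
    if ($party) { $mail['parties'][] = $party; }   // flush at end of input
    return $mail;
}
```

Calling `parseKillmail(file_get_contents('test.txt'))` would return one array per mail. NPC entries simply end up as shorter party records with only `Name` and `Damage Done`.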
Answer 1 (score: 3)
If you want something flexible, use the state machine approach.
If you want something quick and dirty, use regular expressions.
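As a rough illustration of the quick-and-dirty regex route, here is a minimal sketch for a single item line. It assumes item lines always have the shape seen in the sample mail (a name, an optional `, Qty: N`, an optional ` (Cargo)`); `parseItemLine()` is a hypothetical helper name, not part of any library.

```php
<?php
// Hypothetical helper: parses one line from the "Destroyed items:" / "Dropped items:" lists.
function parseItemLine(string $line): ?array
{
    // name, then optional ", Qty: N", then optional " (Cargo)"
    $pattern = '/^(?<name>.+?)(?:, Qty: (?<qty>\d+))?(?<cargo> \(Cargo\))?$/';
    if (!preg_match($pattern, trim($line), $m)) {
        return null;   // blank line or nothing to parse
    }
    return [
        'name'  => $m['name'],
        // a missing quantity means a single item
        'qty'   => isset($m['qty']) && $m['qty'] !== '' ? (int) $m['qty'] : 1,
        'cargo' => !empty($m['cargo']),
    ];
}
```

The lazy `.+?` for the name lets the two optional groups claim the quantity and cargo suffixes when they are present, so the same pattern covers all four line shapes in the sample.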
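For the first solution, you could use a library dedicated to parsing, since it is not a trivial task. But because the format is fairly simple, you can hack together a naive parser, for example: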
<?php
class Parser
{
    /* Enclosing the parser in a class is not mandatory, but it's clean */

    /* data holders */
    public $date = '';
    public $parties = array();
    public $victim = array();
    public $items = array('Destroyed' => array(),
                          'Dropped'   => array());

    /* Map keywords to states. Sub-states can be necessary (and sub-parsers too :-) */
    private $states = array('Victim'   => 'victim_parsing',
                            'Involved' => 'parties_parsing',
                            'items:'   => 'item_parsing');
    private $state = 'start';
    private $item_parsing_state = 'Destroyed';
    private $partie_parsing_state = '';

    /* Map each state to the method that handles its lines */
    private $parse_tools = array('start'           => 'start_parsing',
                                 'parties_parsing' => 'parties_parsing',
                                 'item_parsing'    => 'item_parsing',
                                 'victim_parsing'  => 'victim_parsing');

    /* the magic job is done here */
    function checkLine($line)
    {
        /* Switch state when the line contains one of the section keywords */
        foreach ($this->states as $keyword => $state)
            if (strpos($line, $keyword) !== false)
                $this->state = $state;
        return trim($line);
    }

    function parse($file)
    {
        $lines = new SplFileObject($file);
        foreach ($lines as $line)
            /* blank lines trim down to '' and are skipped */
            if (($line = $this->checkLine($line)) !== '')
                $this->{$this->parse_tools[$this->state]}($line);
    }

    /* then here you can define as many parsing rules as you want */
    function victim_parsing($line)
    {
        $victim_caract = explode(': ', $line, 2);
        if (count($victim_caract) == 2)
            $this->victim[$victim_caract[0]] = $victim_caract[1];
    }

    function start_parsing($line)
    {
        $this->date = $line;
    }

    function item_parsing($line)
    {
        if (strpos($line, 'items:') !== false)
        {
            /* "Destroyed items:" or "Dropped items:" selects the target list */
            $item_state = explode(' ', $line);
            $this->item_parsing_state = $item_state[0];
            return;
        }
        /* "(Cargo)" is optional and can appear with or without a quantity */
        $cargo = substr($line, -7) === '(Cargo)';
        if ($cargo)
            $line = rtrim(substr($line, 0, -7));
        /* ", Qty: N" is optional too; a missing quantity means 1 */
        $item_caract = explode(', Qty: ', $line);
        $qty = isset($item_caract[1]) ? (int) $item_caract[1] : 1;
        /* Append rather than key by name, so that a fitted stack and a
           cargo stack of the same item do not overwrite each other */
        $this->items[$this->item_parsing_state][] = array('name'  => $item_caract[0],
                                                          'qty'   => $qty,
                                                          'cargo' => $cargo);
    }

    function parties_parsing($line)
    {
        $partie_caract = explode(': ', $line, 2);
        /* skip the "Involved parties:" header, which has no value part */
        if (count($partie_caract) < 2)
            return;
        if ($partie_caract[0] == "Name")
        {
            $this->partie_parsing_state = $partie_caract[1];
            $this->parties[$this->partie_parsing_state] = array();
        }
        else
            $this->parties[$this->partie_parsing_state][$partie_caract[0]] = $partie_caract[1];
    }
}

/* a little test */
$parser = new Parser();
$parser->parse('test.txt');
echo "======== Fight report - ".$parser->date." ==========\n\n";
echo "Victim :\n\n";
print_r($parser->victim);
echo "Parties :\n\n";
print_r($parser->parties);
echo "Items: \n\n";
print_r($parser->items);
?>
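We can get away with this because reliability and performance are not an issue here :-)
Happy gaming!
Answer 2 (score: 1)
You might be interested in http://pear.php.net/package/PHP_LexerGenerator
(Yes, it's alpha. Yes, I haven't used it myself. Yes, you would need to know or learn lexer syntax. Why do I suggest it? I'm just curious what your experience would be ;-))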