Parsing structured text data

Time: 2015-05-14 14:32:36

Tags: text-parsing

I extracted a blob field from a MySQL table as text:

CAST(orders AS CHAR(10000) CHARACTER SET utf8)

Now every field looks like this:

a:2:{s:4:"Cart";a:5:{s:4:"cart";a:2:{i:398;a:7:{s:2:"id";s:3:"398";s:4:"name";s:14:"Some product 1";s:5:"price";i:780;s:3:"uid";s:5:"FN-02";s:3:"num";s:1:"1";s:6:"weight";s:1:"0";s:4:"user";s:1:"4";}i:379;a:7:{s:2:"id";s:3:"379";s:4:"name";s:14:"Some product 2";s:5:"price";i:750;s:3:"uid";s:5:"FR-01";s:3:"num";s:1:"1";s:6:"weight";s:1:"0";s:4:"user";s:1:"4";}}s:3:"num";i:2;s:3:"sum";s:7:"1530.00";s:6:"weight";i:160;s:8:"dostavka";s:3:"180";}s:6:"Person";a:17:{s:4:"ouid";s:6:"103-47";s:4:"data";s:10:"1278090513";s:4:"time";s:8:"21:33 pm";s:4:"mail";s:15:"mail@mailer.com";s:11:"name_person";s:8:"John Doe";s:8:"org_name";s:13:"John Doe Inc.";s:7:"org_inn";s:12:"667110804509";s:7:"org_kpp";s:0:"";s:8:"tel_code";s:3:"343";s:8:"tel_name";s:7:"2670039";s:8:"adr_name";s:26:"London, 221b, Baker street";s:14:"dostavka_metod";s:1:"8";s:8:"discount";s:0:"";s:7:"user_id";s:2:"13";s:6:"dos_ot";s:0:"";s:6:"dos_do";s:0:"";s:11:"order_metod";s:1:"1";}}

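What I can see is that the text follows the pattern [type]:[length]:[data];, where the [type] s stands for a string and a stands for an array (a dictionary, in Python terms). There are also i:number; groups, which have no [length].

I don't see a better solution than parsing it with regular expressions in several passes, although it is not clear to me how to parse the nested dictionaries (in Python terms) that way.

Question: is this a standard data structure that already has a parser?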

1 answer:

Answer 0: (score: 1)

This looks like the output of PHP's serialize function (you need to unserialize it):

http://php.net/manual/en/function.serialize.php

If you are working in Python, there is a port of the serialize and unserialize functions:

https://pypi.python.org/pypi/phpserialize

Anatomy of a serialize()'ed value:

String
s:size:value;

Integer
i:value;

Boolean
b:value; (does not store "true" or "false", does store '1' or '0')

Null
N;

Array
a:size:{key definition;value definition;(repeated per element)}

Object
O:strlen(object name):object name:object size:{s:strlen(property name):property name:property definition;(repeated per property)}
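
To make the grammar above concrete, here is a minimal recursive-descent sketch in Python that uses only the standard library. It is an illustration and not the phpserialize implementation: the names parse_php, _read_until and _parse_value are hypothetical, it handles only the s, i, b, N and a types, and it assumes the input is UTF-8 bytes. Note that the size in s:size:value; counts bytes, not characters, so the sketch works on bytes rather than on decoded text.

```python
def parse_php(data: bytes):
    """Parse one PHP-serialized value and return a Python equivalent."""
    value, pos = _parse_value(data, 0)
    if pos != len(data):
        raise ValueError(f"trailing data at offset {pos}")
    return value


def _read_until(data, pos, delim):
    """Return (bytes before delim, index just past delim)."""
    end = data.index(delim, pos)
    return data[pos:end], end + 1


def _parse_value(data, pos):
    """Parse the value starting at pos; return (value, next position)."""
    kind = data[pos:pos + 1]
    if kind == b"N":                       # N;
        return None, pos + 2
    if kind in (b"i", b"b"):               # i:value;  or  b:value;
        raw, pos = _read_until(data, pos + 2, b";")
        return (int(raw) if kind == b"i" else raw == b"1"), pos
    if kind == b"s":                       # s:size:"value";
        size, pos = _read_until(data, pos + 2, b":")
        start = pos + 1                    # skip the opening quote
        end = start + int(size)            # size is a byte count
        text = data[start:end].decode("utf-8")
        return text, end + 2               # skip the closing quote and ';'
    if kind == b"a":                       # a:size:{key;value;...}
        count, pos = _read_until(data, pos + 2, b":")
        pos += 1                           # skip '{'
        result = {}
        for _ in range(int(count)):
            key, pos = _parse_value(data, pos)
            item, pos = _parse_value(data, pos)
            result[key] = item
        return result, pos + 1             # skip '}'
    raise ValueError(f"unsupported type {kind!r} at offset {pos}")


# A fragment shaped like the data in the question.
sample = b'a:1:{i:398;a:2:{s:2:"id";s:3:"398";s:5:"price";i:780;}}'
order = parse_php(sample)
```

Nested arrays come out as nested dictionaries because _parse_value calls itself for every key and value, which is the part that is awkward to do with regular expressions.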

String values are always in double quotes
Array keys are always integers or strings
    "null => 'value'" equates to 's:0:"";s:5:"value";',
    "true => 'value'" equates to 'i:1;s:5:"value";',
    "false => 'value'" equates to 'i:0;s:5:"value";',
    "array(whatever the contents) => 'value'" equates to an "illegal offset type" warning because you can't use an array as a key; however, if you use a variable containing an array as a key, it will equate to 's:5:"Array";s:5:"value";',
    and attempting to use an object as a key will result in the same behavior as using an array will.
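
The key rules above can be sketched in Python as a tiny serializer. This is a hedged illustration only: normalize_key and serialize are hypothetical names, only None, bool, int, str and dict values are handled, and other coercions PHP may apply to keys are not modelled.

```python
def normalize_key(key):
    """Mimic the key coercions listed above before a key is written."""
    if key is None:
        return ""               # null => the empty string
    if isinstance(key, bool):
        return int(key)         # true => 1, false => 0
    if isinstance(key, (int, str)):
        return key
    raise TypeError(f"illegal offset type: {type(key).__name__}")


def serialize(value):
    """Serialize a small subset of Python values in PHP's format."""
    if value is None:
        return "N;"
    if isinstance(value, bool):  # checked before int: bool is an int subclass
        return f"b:{int(value)};"
    if isinstance(value, int):
        return f"i:{value};"
    if isinstance(value, str):
        size = len(value.encode("utf-8"))  # size is counted in bytes
        return f's:{size}:"{value}";'
    if isinstance(value, dict):
        body = "".join(
            serialize(normalize_key(k)) + serialize(v)
            for k, v in value.items()
        )
        return f"a:{len(value)}:{{{body}}}"
    raise TypeError(f"unsupported type: {type(value).__name__}")


null_key = serialize({None: "value"})
true_key = serialize({True: "value"})
false_key = serialize({False: "value"})
```

Each result is the key/value pair from the list above wrapped in an a:1:{...} array.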