Parsing Python serialized objects in Perl

Posted: 2011-05-28 18:11:12

Tags: python perl parsing

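I need my Perl code to read some serialized Python objects for later processing. I came up with a parser based on Parser::MGC, but it is slow. Am I doing something wrong, or does anyone know a better way to convert serialized Python objects into a Perl data structure?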

Here is my parser code:

package Room::HandParser;
use strict;
use warnings;
use base qw( Parser::MGC );

# Card names indexed by card number: hearts, diamonds, clubs, spades.
my @poker_cards_string = qw(
  2h 3h 4h 5h 6h 7h 8h 9h Th Jh Qh Kh Ah
  2d 3d 4d 5d 6d 7d 8d 9d Td Jd Qd Kd Ad
  2c 3c 4c 5c 6c 7c 8c 9c Tc Jc Qc Kc Ac
  2s 3s 4s 5s 6s 7s 8s 9s Ts Js Qs Ks As
);

sub parse_declaration {
  my $self = shift;

  [  
    $self->any_of(
      sub { $self->token_int },
      sub { $self->token_string },
    ),
    $self->expect(":"),
    $self->parse,
  ]
}

sub parse_hash {
  my $self = shift;

  my %ret;
  $self->list_of(",", sub {
      my $res = $self->parse_declaration;
      $ret{$res->[0]} = $res->[2];
  });

  return \%ret;
}


sub parse_cards {
  my $self = shift;
  my $card = $self->token_int;
  return $poker_cards_string[$card & 0x3F];
}

sub parse {
  my $self = shift;

  $self->any_of(
    sub { $self->scope_of( "[", sub { $self->list_of(",", \&parse) }, "]" ) },
    sub { $self->scope_of( "(", sub { $self->list_of(",", \&parse) }, ")" ) },
    sub { $self->scope_of( "{", sub { $self->parse_hash }, "}" ) },
    sub { $self->scope_of( "PokerCards([", sub { $self->list_of(",", \&parse_cards) }, "])" ) },
    sub { $self->token_float },
    sub { $self->token_int },
    sub { $self->token_string },
    sub { $self->token_kw( qw(None True False) ) },
  );
}

1;

Here is a sample of the serialized Python object I need to parse:

[('game', 0, 195, 0, 0.0, 'holdem', '100-200-no-limit', [50312, 50313, 50314, 50315, 50316, 50317, 2], 0, {2: 1000000, 50312: 200000, 50313: 200000, 50314: 200000, 50315: 200000, 50316: 200000, 50317: 200000}), ('position', 1), ('blind', 50313, 10000, 0), ('position', 2), ('blind', 50314, 20000, 0), ('position', -1), ('round', 'pre-flop', PokerCards([]), {2: PokerCards([226, 208]), 50312: PokerCards([223, 206]), 50313: PokerCards([221, 233]), 50314: PokerCards([222, 211]), 50315: PokerCards([235, 216]), 50316: PokerCards([209, 236]), 50317: PokerCards([237, 243])}), ('position', 3), ('call', 50315, 20000), ('position', 4), ('call', 50316, 20000), ('position', 5), ('call', 50317, 20000), ('position', 6), ('call', 2, 20000), ('position', 0), ('fold', 50312), ('position', 1), ('call', 50313, 10000), ('position', 2), ('check', 50314), ('position', -1), ('round', 'flop', PokerCards([7, 21, 46]), {2: PokerCards([226, 208]), 50313: PokerCards([221, 233]), 50314: PokerCards([222, 211]), 50315: PokerCards([235, 216]), 50316: PokerCards([209, 236]), 50317: PokerCards([237, 243])}), ('position', 1), ('check', 50313), ('position', 2), ('check', 50314), ('position', 3), ('check', 50315), ('position', 4), ('check', 50316), ('position', 5), ('check', 50317), ('position', 6), ('check', 2), ('position', -1), ('round', 'turn', PokerCards([7, 21, 46, 38]), None), ('position', 1), ('check', 50313), ('position', 2), ('check', 50314), ('position', 3), ('check', 50315), ('position', 4), ('check', 50316), ('position', 5), ('check', 50317), ('position', 6), ('check', 2), ('position', -1), ('round', 'river', PokerCards([7, 21, 46, 38, 20]), None), ('position', 1), ('check', 50313), ('position', 2), ('check', 50314), ('position', 3), ('check', 50315), ('position', 4), ('check', 50316), ('position', 5), ('check', 50317), ('position', 6), ('check', 2), ('position', -1), ('showdown', None, {2: PokerCards([226, 208]), 50313: PokerCards([29, 41]), 50314: PokerCards([222, 211]), 50315: 
PokerCards([43, 24]), 50316: PokerCards([209, 236]), 50317: PokerCards([45, 51])}), ('end', [50317], [{'serial2delta': {2: -20000, 50313: -20000, 50314: -20000, 50315: -20000, 50316: -20000, 50317: 100000}, 'player_list': [50312, 50313, 50314, 50315, 50316, 50317, 2], 'serial2rake': {50317: 0}, 'serial2share': {50317: 120000}, 'pot': 120000, 'serial2best': {2: {'hi': [101154816, ['FlHouse', 46, 20, 7, 34, 21]]}, 50313: {'hi': [50841600, ['Trips', 46, 20, 7, 38, 21]]}, 50314: {'hi': [50841600, ['Trips', 46, 20, 7, 38, 21]]}, 50315: {'hi': [50842368, ['Trips', 46, 20, 7, 38, 24]]}, 50316: {'hi': [50841600, ['Trips', 46, 20, 7, 38, 21]]}, 50317: {'hi': [101171200, ['FlHouse', 46, 20, 7, 51, 38]]}}, 'type': 'game_state', 'side_pots': {'building': 0, 'pots': [[120000, 120000]], 'last_round': 3, 'contributions': {0: {0: {2: 20000, 50313: 20000, 50314: 20000, 50315: 20000, 50316: 20000, 50317: 20000}}, 1: {}, 2: {}, 'total': {2: 20000, 50313: 20000, 50314: 20000, 50315: 20000, 50316: 20000, 50317: 20000}, 3: {}}}}, {'serials': [50313, 50314, 50315, 50316, 50317, 2], 'pot': 120000, 'hi': [50317], 'chips_left': 0, 'type': 'resolve', 'serial2share': {50317: 120000}}])]

Parsing a structure like this takes several seconds at 100% CPU, which is unacceptable in my case.

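Edit: I am not looking for workarounds here, such as writing a Python script that evaluates this structure and outputs JSON, or rewriting the original Python application to store the data as JSON. I want to parse this data in Perl with reasonable performance. Since the format is very close to JSON, it should be possible to parse it in a similar amount of time.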
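For illustration only, here is a minimal sketch of what a hand-written, single-pass parser could look like in plain Perl, using `\G` with the `/gc` flags so the input is never re-scanned. The names `parse_python_literal`, `parse_value` and `parse_items` are hypothetical, it has not been benchmarked against the Parser::MGC version, and it assumes only the constructs visible in the sample: lists, tuples, dicts, `PokerCards([...])`, numbers, single-quoted strings without escapes, and `None`/`True`/`False`.

```perl
use strict;
use warnings;

# Parse comma-separated items up to $close; with $pairs set, read "key: value".
sub parse_items {
    my ( $in, $close, $pairs ) = @_;
    my @items;
    until ( $$in =~ /\G\s*\Q$close\E/gc ) {
        push @items, parse_value($in);
        if ($pairs) {
            $$in =~ /\G\s*:/gc or die "expected ':' at offset " . pos($$in);
            push @items, parse_value($in);
        }
        $$in =~ /\G\s*,/gc;    # optional separator, also accepts "(1,)"
    }
    return \@items;
}

# Parse one value starting at the current pos() of the referenced string.
sub parse_value {
    my ($in) = @_;
    $$in =~ /\G\s+/gc;         # skip leading whitespace

    if ( $$in =~ /\GPokerCards\(/gc ) {
        my $cards = parse_value($in);    # the inner [...] list
        $$in =~ /\G\s*\)/gc or die "expected ')' at offset " . pos($$in);
        return $cards;
    }
    if ( $$in =~ /\G([\[(])/gc ) {       # list or tuple -> array ref
        return parse_items( $in, $1 eq '[' ? ']' : ')', 0 );
    }
    if ( $$in =~ /\G\{/gc ) {            # dict -> hash ref
        my %hash = @{ parse_items( $in, '}', 1 ) };
        return \%hash;
    }
    return $1     if $$in =~ /\G'([^']*)'/gc;             # simple string
    return 0 + $1 if $$in =~ /\G(-?\d+(?:\.\d+)?)/gc;     # int or float
    return undef  if $$in =~ /\GNone\b/gc;
    return 1      if $$in =~ /\GTrue\b/gc;
    return 0      if $$in =~ /\GFalse\b/gc;
    die "unexpected input at offset " . ( pos($$in) // 0 );
}

sub parse_python_literal {
    my ($text) = @_;
    pos($text) = 0;
    return parse_value( \$text );
}

my $hand = parse_python_literal(
    "[('position', -1), ('round', 'flop', PokerCards([7, 21, 46]), {2: PokerCards([226, 208])}, None)]"
);
print $hand->[1][1], "\n";    # flop
```

Card numbers are returned as plain integers here; mapping them to names would be a separate step over the resulting arrays.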

2 answers:

Answer 0: (score: 6)

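How about using a different format? JSON, for example, is easy to parse, and the Perl implementations should work out of the box. Python has JSON serialization and deserialization built in, so you don't have to reinvent any wheels.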

Answer 1: (score: 0)

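In case anyone is interested: I ended up with a few regexps that convert this string to JSON (the two formats are very close) and then parse the result with JSON::XS. See the __parse_hands() subroutine in https://github.com/hippich/Bitcoin-Poker-Room/commit/2f0e089908d3fa71dc16021ac6a24807c46529ad#diff-1.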
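The general idea can be sketched roughly as follows. This is not the code from the linked commit: `python_repr_to_json` is a hypothetical helper, and JSON::PP (bundled with recent Perls) stands in for JSON::XS, which exports a compatible `decode_json`. The substitutions assume that strings contain no quotes, parentheses, or the bare words None/True/False, and that dict keys are integers or simple strings.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);    # JSON::XS exports a compatible decode_json

# Hypothetical helper: rewrite a Python repr() string into JSON text.
sub python_repr_to_json {
    my ($text) = @_;

    $text =~ s/PokerCards\(\[([^\]]*)\]\)/[$1]/g;    # PokerCards([1, 2]) -> [1, 2]
    $text =~ tr/()/[]/;                               # tuples -> arrays
    $text =~ s/,\s*\]/]/g;                            # drop trailing comma: (1,) -> [1]
    $text =~ s/'([^']*)'/"$1"/g;                      # 'str' -> "str"
    $text =~ s/([{,]\s*)(-?\d+)(\s*:)/$1"$2"$3/g;     # integer keys -> string keys
    $text =~ s/\bNone\b/null/g;
    $text =~ s/\bTrue\b/true/g;
    $text =~ s/\bFalse\b/false/g;

    return $text;
}

my $sample = "[('round', 'flop', PokerCards([7, 21, 46]), {2: PokerCards([226, 208])}), ('showdown', None)]";
my $data   = decode_json( python_repr_to_json($sample) );
print $data->[0][1], "\n";    # flop
```

The order matters: the `PokerCards([...])` wrapper is removed before parentheses are turned into brackets, otherwise its closing `])` could no longer be told apart from a list that ends a tuple.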