In Perl, how do I extract certain strings from a minified JavaScript source file?

Date: 2015-07-15 08:04:02

Tags: regex perl

I have this ugly file:

  

{message:"What this does is, every time the mouse moves in the canvas area, it sets mouseX and mouseY to the location of the mouse.",},{message:"Then, when each ball is updated, it figures out how far away from the mouse it is, and accelerates toward it.",},{message:"The acceleration is the square root of the distance, so it pulls harder when it is really far away. Imagine all the balls being connected to the mouse by little rubber bands or springs. It's a little like that.",},{message:"Try making the balls smaller! And add more of them! I like it with about 40 small balls chasing the mouse.",},{message:"Great job! Like what you learned? Was it fun?",code:"",hiddenCode:"var c = document.getElementById('pane').getContext('2d');\nfunction rgba(r,g,b,a) {return 'rgba('+[r,g,b,a].join(',')+')';}\n\nfunction rgb(r,g,b,a) {return 'rgb('+[r,g,b].join(',')+')';}\n\n",lessonSection:"The End",},{message:"Wow, you did everything! Congratulations, nice work! A lot of these are really hard. I'm impressed you finished! I hope you enjoyed it!",code:'var pane = document.getElementById(\'pane\');\nvar s = 3;\n\npane.onmousemove = function(evt) {\n c.fillStyle = randomRGBA();\n var x = evt.clientX;\n var y = evt.clientY;\n c.fillRect(x - s / 2, y - s / 2, s, s);};\n\nfunction randomRGBA() {\n var r = randInt(255);\n var g = randInt(255);\n var b = randInt(255);\n var a = Math.random();\n var rgba = [r,g,b,a].join(",");\n return "rgba(" + rgba + ")";\n}\n\nfunction randInt(limit) {\n var x =

I'm trying to use a Perl regex to extract the message bodies.

I've been at it for two or three hours, but I can't seem to extract them.

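My goal is to translate the messages from English into other languages, so I want the message strings in a clean file instead of dealing with this ugly file that mixes messages and code.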

I tried this code:

use strict;
use warnings;

my $filename = 'test.txt';
my $row = '';

if (open(my $fh, '<:encoding(UTF-8)', $filename)) {
  while ($row = <$fh>) {
    if ($row =~/message:(.*)/)
    {
        print $1 . "\n";
    }
  }
} 
else {
  warn "Could not open file '$filename' $!";
}

It basically prints the whole rest of the file as the result. I tried \W+\s+ instead, and it only gave me the first word.

Any ideas?

3 Answers:

Answer 0 (score: 2)

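The problem is that there are no newlines in your data, so .* matches the entire rest of the file. Try /message:"([^"]*)/ instead, which matches only characters that are not a double quote.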
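A stripped-down illustration of that difference, on a made-up one-line sample in the same shape as the file (the sample string and variable names are hypothetical):

```perl
use strict;
use warnings;

# Hypothetical one-line sample shaped like the file (no newlines anywhere).
my $line = '{message:"First.",},{message:"Second.",code:"x"}';

# Greedy .* runs to the end of the line, so a single match swallows everything.
my ($greedy) = $line =~ /message:(.*)/;

# [^"]* stops at the first double quote, so /g finds each message separately.
my @messages = $line =~ /message:"([^"]*)/g;

print "greedy: $greedy\n";
print "message: $_\n" for @messages;
```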

I wrote this

use strict;
use warnings;
use 5.010;

my $data = do {
    local $/;
    <DATA>;
};

say "$1: $2" while $data =~ /[{,](\w+):"([^"]*)/g;

__DATA__
{message:"What this does is, every time the mouse moves in the canvas area, it sets mouseX and mouseY to the location of the mouse.",},{message:"Then, when each ball is updated, it figures out how far away from the mouse it is, and accelerates toward it.",},{message:"The acceleration is the square root of the distance, so it pulls harder when it is really far away. Imagine all the balls being connected to the mouse by little rubber bands or springs. It's a little like that.",},{message:"Try making the balls smaller! And add more of them! I like it with about 40 small balls chasing the mouse.",},{message:"Great job! Like what you learned? Was it fun?",code:"",hiddenCode:"var c = document.getElementById('pane').getContext('2d');\nfunction rgba(r,g,b,a) {return 'rgba('+[r,g,b,a].join(',')+')';}\nfunction rgb(r,g,b,a) {return 'rgb('+[r,g,b].join(',')+')';}\n\n",lessonSection:"The End",},{message:"Wow, you did everything! Congratulations, nice work! A lot of these are really hard. I'm impressed you finished! I hope you enjoyed it!",code:'var pane = document.getElementById(\'pane\');\nvar s = 3;\n\npane.onmousemove = function(evt) {\n c.fillStyle = randomRGBA();\n var x = evt.clientX;\n var y = evt.clientY;\n c.fillRect(x - s / 2, y - s / 2, s, s);};\n\nfunction randomRGBA() {\n var r = randInt(255);\n var g = randInt(255);\n var b = randInt(255);\n var a = Math.random();\n var rgba = [r,g,b,a].join(",");\n return "rgba(" + rgba + ")";\n}\nfunction randInt(limit) {\n var x =

which produces this output

message: What this does is, every time the mouse moves in the canvas area, it sets mouseX and mouseY to the location of the mouse.
message: Then, when each ball is updated, it figures out how far away from the mouse it is, and accelerates toward it.
message: The acceleration is the square root of the distance, so it pulls harder when it is really far away. Imagine all the balls being connected to the mouse by little rubber bands or springs. It's a little like that.
message: Try making the balls smaller! And add more of them! I like it with about 40 small balls chasing the mouse.
message: Great job! Like what you learned? Was it fun?
code: 
hiddenCode: var c = document.getElementById('pane').getContext('2d');\nfunction rgba(r,g,b,a) {return 'rgba('+[r,g,b,a].join(',')+')';}\nfunction rgb(r,g,b,a) {return 'rgb('+[r,g,b].join(',')+')';}\n\n
lessonSection: The End
message: Wow, you did everything! Congratulations, nice work! A lot of these are really hard. I'm impressed you finished! I hope you enjoyed it!

No doubt the syntax, whatever it is, allows double quotes to be embedded in a string, but there are no examples of that in this fragment.
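Building on that caveat, here is a minimal sketch of a pattern that tolerates backslash-escaped quotes and also accepts single-quoted values (the last record in the file uses code:'...'). It assumes the strings follow ordinary JavaScript backslash escaping; the sample data and the names $string_re, %escape and unescape_js are made up for illustration, and \uXXXX escapes are not decoded.

```perl
use strict;
use warnings;

# Hypothetical sample: one value contains escaped double quotes, one is single-quoted.
my $data = q~{message:"Say \"hi\" to the mouse.",},{message:'It\'s done.',code:"x"}~;

# A double- or single-quoted JS string literal that may contain backslash escapes.
my $string_re = qr/"((?:[^"\\]|\\.)*)"|'((?:[^'\\]|\\.)*)'/s;

# Undo simple escapes: \n and \t become control characters, any other \X becomes X.
my %escape = (n => "\n", t => "\t");
sub unescape_js {
    my ($s) = @_;
    $s =~ s/\\(.)/exists $escape{$1} ? $escape{$1} : $1/gse;
    return $s;
}

my @messages;
while ($data =~ /message:(?:$string_re)/g) {
    # Exactly one of the two capture groups is defined, depending on the quote style.
    push @messages, unescape_js(defined $1 ? $1 : $2);
}

print "$_\n" for @messages;
```

Because each quote style has its own capture group, the loop simply takes whichever one matched instead of stripping quotes afterwards.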

Answer 1 (score: 0)

I don't know why you need to do this on minified, concatenated source, but you can undo the minification first:

#!/usr/bin/env perl

use strict;
use warnings;

use Path::Class;
use JavaScript::Beautifier qw/js_beautify/;

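# Read the whole minified file as UTF-8 text (Path::Class takes the layer via iomode).
my $js = file('combined.min.js')->slurp(iomode => '<:encoding(UTF-8)');

# Re-indent the code so that the object properties are split across lines.
my $pretty_js = js_beautify($js);

# Non-greedy match from "message: " to the end of that line.
my @messages = ($pretty_js =~ /message: (.+?)\n/g);

binmode STDOUT, ':encoding(UTF-8)';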

print "$_\n" for @messages;
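The captures from the regex above still carry the JavaScript quotes and the trailing comma of each property. A minimal clean-up sketch, assuming each captured value looks like "..." optionally followed by a comma (the sample values below are illustrative, and escape sequences inside the strings are left as-is):

```perl
use strict;
use warnings;

# Illustrative captures in the shape the /message: (.+?)\n/ regex is assumed to return.
my @messages = (
    '"Great job! Like what you learned? Was it fun?",',
    '"Wow, you did everything!"',
);

for (@messages) {
    s/,\s*\z//;                 # drop the trailing property separator, if present
    s/\A(["'])(.*)\1\z/$2/s;    # drop matching surrounding quotes of either style
}

print "$_\n" for @messages;
```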

Answer 2 (score: -1)

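You already have some Perl answers, but you may also be interested in the xgettext tool, which is designed specifically for extracting strings for internationalization. Run it like this: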

xgettext -a --from-code UTF-8 combined.min.js -o - 

It gives you output like this for every string:

#: combined.min.js:36
msgid ""
"Here is a ball that sticks to the mouse.  Every time the mouse moves, the "
"ball redraws on top of the mouse."
msgstr ""

It is part of the GNU gettext package.
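If you would rather keep the extraction in Perl but still hand translators a gettext-style catalog, a minimal sketch could print the strings as untranslated PO entries. It assumes the messages are already in a Perl array (for example from one of the Perl answers); po_quote and to_po are hypothetical helper names, and the sketch omits the header entry that xgettext writes.

```perl
use strict;
use warnings;

# Quote a string as a PO string literal: escape backslashes, double quotes and newlines.
sub po_quote {
    my ($s) = @_;
    $s =~ s/\\/\\\\/g;
    $s =~ s/"/\\"/g;
    $s =~ s/\n/\\n/g;
    return qq{"$s"};
}

# Render each message as a PO entry with an empty msgstr, separated by blank lines.
sub to_po {
    my @messages = @_;
    return join "\n", map { "msgid " . po_quote($_) . "\nmsgstr \"\"\n" } @messages;
}

my @messages = ('Great job! Like what you learned? Was it fun?', 'Say "hi"');
print to_po(@messages);
```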