Hello, I need to split some strings, but I am not very experienced with regular expressions or PHP arrays.
The string looks like this:
A N K U N F T 11.08.15
*** N ***
11.08.15 xxx xxx X3 2830 14:25 17:50
18.08.15 xxx xxx X3 2830 18:40 F882129 dsdsaidsaia F882129 xxxyxyagydaysd
I need to turn it into an array like this:
date1 -> 11.08.15
date2 -> 18.08.15
fnr1 -> X3 2830
h1 -> 17:50
fnr2 -> X3 2830
h2 -> 18:40
n1 -> dsdsaidsaia
n2 -> xxxyxyagydaysd
I tried the following on regex101.
For fnr:
(\w{2}\s\d{4})
For date:
(\n\s\d{2}\W\d{2}\W\d{2})
For h:
(\s{2}\d{2}\:\d{2}\n)
But I don't know how to tell date1 apart from date2, fnr1 from fnr2, and h1 from h2.
I also tried this in PHP, and it did not output what I wanted.
Can anyone help me? Thanks in advance!
Answer 0 (score: 0)
This should do what you are asking for:
^((?:\d{2}\.?){3}).*?(\w{2}\s\d{4}).*?(\d{2}:\d{2})(?:.*?(\b[a-z]+\b).*?(\b[a-z]+\b))?$
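It splits each matching line into separate capture groups: the date, the fnr, the time, and (when present) the two lowercase names. Let me know if you have any questions.
Note: be sure to enable the g and m flags so that ^ and $ match the start and end of each line rather than of the whole string. In PHP there is no g flag; preg_match_all() already finds every match, so only the m modifier is needed.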
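As a rough illustration of how the pattern above could be used from PHP, here is a minimal sketch. It assumes the input uses plain \n line endings and that the keys date1, fnr1, h1, n1 and so on are built as in the question; the variable names ($text, $pattern, $result) are made up for this example and are not from the original post.

```php
<?php
// Sample input copied from the question (nowdoc, so nothing is interpolated).
$text = <<<'TXT'
A N K U N F T 11.08.15
*** N ***
11.08.15 xxx xxx X3 2830 14:25 17:50
18.08.15 xxx xxx X3 2830 18:40 F882129 dsdsaidsaia F882129 xxxyxyagydaysd
TXT;

// The pattern from the answer. The m modifier makes ^ and $ match per line.
$pattern = '/^((?:\d{2}\.?){3}).*?(\w{2}\s\d{4}).*?(\d{2}:\d{2})(?:.*?(\b[a-z]+\b).*?(\b[a-z]+\b))?$/m';

// PREG_SET_ORDER yields one array per matching line.
preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

$result = [];
foreach ($matches as $i => $m) {
    $n = $i + 1; // 1-based suffix: date1, date2, ...
    $result["date$n"] = $m[1];
    $result["fnr$n"]  = $m[2];
    $result["h$n"]    = $m[3];

    // Groups 4 and 5 are only filled when the optional tail matched.
    // Assumption: the two names map to n1 and n2, as in the question.
    if (($m[4] ?? '') !== '' && ($m[5] ?? '') !== '') {
        $result['n1'] = $m[4];
        $result['n2'] = $m[5];
    }
}

print_r($result);
```

The two header lines do not start with a date, so they are skipped. On the first data line the optional name groups do not match, which pushes the time group to the last time on the line (17:50), the value the question asks for as h1.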