PHP - Working RegEx is... not working?

Time: 2012-05-30 18:09:16

Tags: php regex arrays

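I have a regular expression on line 6 (the preg_match pattern in the code below). I tested it and confirmed that it works. Now I have a very large array. Although I tested the regex on a smaller array, it doesn't seem to work on the larger one! What gives? Basically, the script below is meant to remove array entries that contain unusual characters. See what happens when you try both arrays: the first has its odd entries removed successfully, but the larger one does not!

See here: http://pastebin.com/raw.php?i=KpNZHbrv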

$sentences_working = array(
    'Egypt, officially the Arab Republic of Egypt, is a country mainly in North Africa, with the Sinai Peninsula forming a land bridge in Southwest Asia.',
    'Egypt is thus a transcontinental country, and a major power in Africa, the Mediterranean Basin, the Middle East and the Muslim world.',
    'The English name Egypt was borrowed from Middle French Egypte, from Latin, from ancient Greek Aígyptos, from earlier Linear B  a-ku-pi-ti-yo.',
    'In later years, the dynasty became a British puppet.',
);

$sentences_notworking = unserialize(urldecode("a%3A30%3A%7Bi%3A0%3Bs%3A160%3A%22Egypt++%2C+officially+the+Arab+Republic+of+Egypt%2C+Arabic%3A+%2C+is+a+country+mainly+in+North+Africa%2C+with+the+Sinai+Peninsula+forming+a+land+bridge+in+Southwest+Asia.%22%3Bi%3A1%3Bs%3A133%3A%22Egypt+is+thus+a+transcontinental+country%2C+and+a+major+power+in+Africa%2C+the+Mediterranean+Basin%2C+the+Middle+East+and+the+Muslim+world.%22%3Bi%3A2%3Bs%3A223%3A%22Covering+an+area+of+about+1%2C010%2C000+square+kilometers+%2C+Egypt+is+bordered+by+the+Mediterranean+Sea+to+the+north%2C+the+Gaza+Strip+and+Israel+to+the+northeast%2C+the+Red+Sea+to+the+east%2C+Sudan+to+the+south+and+Libya+to+the+west.%22%3Bi%3A3%3Bs%3A74%3A%22Egypt+is+one+of+the+most+populous+countries+in+Africa+and+the+Middle+East.%22%3Bi%3A4%3Bs%3A171%3A%22The+great+majority+of+its+over+81+million+people+live+near+the+banks+of+the+Nile+River%2C+in+an+area+of+about+40%2C000+square+kilometers+%2C+where+the+only+arable+land+is+found.%22%3Bi%3A5%3Bs%3A60%3A%22The+large+areas+of+the+Sahara+Desert+are+sparsely+inhabited.%22%3Bi%3A6%3Bs%3A177%3A%22About+half+of+Egypt%27s+residents+live+in+urban+areas%2C+with+most+spread+across+the+densely+populated+centres+of+greater+Cairo%2C+Alexandria+and+other+major+cities+in+the+Nile+Delta.%22%3Bi%3A7%3Bs%3A118%3A%22Monuments+in+Egypt+such+as+the+Giza+pyramid+complex+and+its+Great+Sphinx+were+constructed+by+its+ancient+civilization.%22%3Bi%3A8%3Bs%3A155%3A%22Its+ancient+ruins%2C+such+as+those+of+Memphis%2C+Thebes%2C+and+Karnak+and+the+Valley+of+the+Kings+outside+Luxor%2C+are+a+significant+focus+of+archaeological+study.%22%3Bi%3A9%3Bs%3A83%3A%22The+tourism+industry+and+the+Red+Sea+Riviera+employ+about+12%25+of+Egypt%27s+workforce.%22%3Bi%3A10%3Bs%3A170%3A%22The+economy+of+Egypt+is+one+of+the+most+diversified+in+the+Middle+East%2C+with+sectors+such+as+tourism%2C+agriculture%2C+industry+and+service+at+almost+equal+production+levels.%22%3Bi%3A11%3Bs%3A133%3A%22In+early+2011%2C+Egypt+underwent+a+revolut
ion%2C+which+resulted+in+the+ousting+of+President+Hosni+Mubarak+after+nearly+30+years+in+power.%22%3Bi%3A12%3Bs%3A50%3A%22Presidential+elections+are+scheduled+for+May+2012.%22%3Bi%3A13%3Bs%3A190%3A%22The+English+name+Egypt+was+borrowed+from+Middle+French+Egypte%2C+from+Latin+%2C+from+ancient+Greek+A%26iacute%3Bgyptos+%2C+from+earlier+Linear+B+%26%2365601%3B%26%2365555%3B%26%2365568%3B%26%2365588%3B%26%2365549%3B+a-ku-pi-ti-yo.%22%3Bi%3A14%3Bs%3A277%3A%22The+adjective+aig%26yacute%3Bpti-%2C+aig%26yacute%3Bptios+was+borrowed+into+Coptic+as+%26%2311397%3B%26%2311433%3B%26%2311425%3B%26%231007%3B%26%2311411%3B%26%2311423%3B%26%2311429%3B%2F%26%2311413%3B%26%2311433%3B%26%2311425%3B%26%231007%3B%26%2311411%3B%26%2311423%3B%26%2311429%3B+gyptios%2C+kyptios%2C+and+from+there+into+Arabic+as+%2C+back+formed+into+%2C+whence+English+Copt.%22%3Bi%3A15%3Bs%3A209%3A%22The+Greek+forms+were+borrowed+from+Late+Egyptian++Hikuptah+%22Memphis%22%2C+a+corruption+of+the+earlier+Egyptian+name+Hwt-ka-Ptah+%2C+meaning+%22home+of+the+ka++of+Ptah%22%2C+the+name+of+a+temple+to+the+god+Ptah+at+Memphis.%22%3Bi%3A16%3Bs%3A130%3A%22Strabo+attributed+the+word+to+a+folk+etymology+in+which+A%26iacute%3Bgyptos++evolved+as+a+compound+from++%2C+meaning+%22below+the+Aegean%22.%22%3Bi%3A17%3Bs%3A288%3A%22%2C+the+Arabic+and+modern+official+name+of+Egypt+%2C+is+of+Semitic+origin%2C+directly+cognate+with+other+Semitic+words+for+Egypt+such+as+the+Hebrew+%26lrm%3B+%2C+literally+meaning+%22the+two+straits%22+.+The+word+originally+connoted+%22metropolis%22+or+%22civilization%22+and+means+%22country%22%2C+or+%22frontier-land%22.%22%3Bi%3A18%3Bs%3A232%3A%22The+ancient+Egyptian+name+of+the+country+is+Kemet++%5B%26%2378222%3B%26%2378163%3B%26%2378799%3B%26%2378486%3B%5D%2C+which+means+%22black+land%22%2C+referring+to+the+fertile+black+soils+of+the+Nile+flood+plains%2C+distinct+from+the+deshret+%2C+or+%22red+land%22+of+the+desert.%22%3Bi%3A19%3Bs%3A152%3A%22The+name+is+realized+as++and++in+the+Coptic+stage+of+the+Egy
ptian+language%2C+and+appeared+in+early+Greek+as++.+Another+name+was++%22land+of+the+riverbank%22.%22%3Bi%3A20%3Bs%3A105%3A%22The+names+of+Upper+and+Lower+Egypt+were+Ta-Sheme%27aw++%22sedgeland%22+and+Ta-Mehew++%22northland%22%2C+respectively.%22%3Bi%3A21%3Bs%3A79%3A%22There+is+evidence+of+rock+carvings+along+the+Nile+terraces+and+in+desert+oases.%22%3Bi%3A22%3Bs%3A103%3A%22In+the+10th+millennium+BC%2C+a+culture+of+hunter-gatherers+and+fishers+replaced+a+grain-grinding+culture.%22%3Bi%3A23%3Bs%3A117%3A%22Climate+changes+and%2For+overgrazing+around+8000+BC+began+to+desiccate+the+pastoral+lands+of+Egypt%2C+forming+the+Sahara.%22%3Bi%3A24%3Bs%3A129%3A%22Early+tribal+peoples+migrated+to+the+Nile+River+where+they+developed+a+settled+agricultural+economy+and+more+centralized+society.%22%3Bi%3A25%3Bs%3A63%3A%22By+about+6000+BC+a+Neolithic+culture+rooted+in+the+Nile+Valley.%22%3Bi%3A26%3Bs%3A104%3A%22During+the+Neolithic+era%2C+several+predynastic+cultures+developed+independently+in+Upper+and+Lower+Egypt.%22%3Bi%3A27%3Bs%3A108%3A%22The+Badarian+culture+and+the+successor+Naqada+series+are+generally+regarded+as+precursors+to+dynastic+Egypt.%22%3Bi%3A28%3Bs%3A100%3A%22The+earliest+known+Lower+Egyptian+site%2C+Merimda%2C+predates+the+Badarian+by+about+seven+hundred+years.%22%3Bi%3A29%3Bs%3A198%3A%22Contemporaneous+Lower+Egyptian+communities+coexisted+with+their+southern+counterparts+for+more+than+two+thousand+years%2C+remaining+culturally+distinct%2C+but+maintaining+frequent+contact+through+trade.%22%3B%7D"));

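$sentences = $sentences_working; // switch to $sentences_notworking to test the larger array
foreach ($sentences as $sentence_key => $sentence)
{
    // Match any byte outside the printable ASCII range (space through tilde).
    if (preg_match('/[^\x20-\x7E]/', $sentence))
    {
        // Drop the entry; the remaining keys are not renumbered.
        unset($sentences[$sentence_key]);
    }
}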

echo "<pre>";
print_r($sentences);
echo "</pre>";

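To test the two arrays (the smaller and the larger one), just change line 12 (the `$sentences = $sentences_working;` assignment). What is going on?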

1 Answer:

Answer 0: (score: 0)

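The long string you provided is not only urlencoded and PHP-serialized, it is also HTML-entity-encoded. Entities such as &iacute; or &#65601; consist entirely of printable ASCII characters, so the regex never sees a byte outside \x20-\x7E and nothing gets removed.

html_entity_decode should fix it, but I would also think again about whether triple encoding is really what you want.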
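As a minimal sketch of that fix, the loop from the question can decode each sentence before testing it. The two sample sentences here are hypothetical stand-ins for the large array, and the UTF-8 target encoding is an assumption about the data:

```php
<?php
// Hypothetical sample data: the second entry is entity-encoded,
// like the problem entries in the large array.
$sentences = array(
    'In later years, the dynasty became a British puppet.',
    'from ancient Greek A&iacute;gyptos, from earlier Linear B &#65601; a-ku-pi-ti-yo.',
);

foreach ($sentences as $key => $sentence) {
    // Turn entities such as &iacute; back into real characters
    // (assumes the text should end up as UTF-8).
    $decoded = html_entity_decode($sentence, ENT_QUOTES, 'UTF-8');

    // Same pattern as in the question, applied to the decoded text:
    // any byte outside printable ASCII marks the entry for removal.
    if (preg_match('/[^\x20-\x7E]/', $decoded)) {
        unset($sentences[$key]);
    }
}

print_r($sentences); // only the plain-ASCII first sentence is left
```

The test runs on the decoded copy only, so the entries that survive are stored exactly as they came in.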

Edit: just saw your comment:

"I have a feeling the character codes are within the allowed range for some reason."

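Something is going on here, and it is easy to find out what: just do a hexdump. If you had run the script in a terminal, you would have spotted it even sooner. And when you look at the output of a PHP script that deals with encodings, at least check 'View Source' instead of the rendered page, because the browser turns entities back into characters and hides the problem.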
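One way to get such a dump without leaving PHP is bin2hex. This is only a sketch with a hypothetical sample string, not the data from the question:

```php
<?php
// Hypothetical sample: the same word before and after entity decoding.
$encoded = 'Greek A&iacute;gyptos';
$decoded = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');

// bin2hex prints the raw bytes, so it shows exactly what preg_match sees.
// In $encoded the entity is spelled out in ASCII: 26 ('&') ... 3b (';').
echo bin2hex($encoded), "\n";
// In $decoded the same character is the two UTF-8 bytes c3 ad.
echo bin2hex($decoded), "\n";

// The pattern from the question only fires on the decoded text.
var_dump(preg_match('/[^\x20-\x7E]/', $encoded)); // int(0)
var_dump(preg_match('/[^\x20-\x7E]/', $decoded)); // int(1)
```

Run from a terminal, this shows the entity as literal text, where a browser would have rendered it as the character.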