Wikipedia infobox - trouble matching pattern

Time: 2014-02-15 11:11:37

Tags: php regex wiki

Here is an example of the infobox content:

|conventional_long_name = Italian Republic
|native_name = {{lang|it|''Repubblica italiana<!--italiana is without uppercase; see Italian wiki-->''}}
|common_name = Italy
|nickname(s) = Il Belpaese
|image_flag = Flag of Italy.svg
|image_coat = Italy-Emblem.svg
|symbol_type = Emblem
|image_map = EU-Italy.svg
|map_caption = {{map caption |location_color=dark green |region=Europe |region_color=dark grey |subregion=the [[European Union]] |subregion_color=green |legend=EU-Italy.svg}}
|national_anthem = {{native name|it|[[Il Canto degli Italiani]]}}<br/>{{small|''The Song of the Italians''}} [[File:Inno di Mameli instrumental.ogg|center]]
|official_languages = [[Italian language|Italian]]<sup>a</sup>
|Religion= [[Roman Catholic]]
|capital = {{Coat of arms|Rome}}
|latd=41 |latm=54 |latNS=N |longd=12 |longm=29 |longEW=E
|largest_city = capital
|largest_metropolitan area = {{hlist |[[Milan]] |[[Naples]]}}
|demonym = [[Italians|Italian]]
|government_type = [[Unitary state|Unitary]] [[parliamentary system|parliamentary]] [[constitutional republic]]
|leader_title1 = [[President of Italy|President]]
|leader_name1 = [[Giorgio Napolitano]]
|leader_title2 = [[Prime Minister of Italy|Prime Minister]]
|leader_name2 = [[Enrico Letta]]
|leader_title3 = [[List of Presidents of the Senate of Italy|President of the Senate]]
|leader_name3 = [[Pietro Grasso]]
|leader_title4 = [[List of Presidents of the Italian Chamber of Deputies|President of the Chamber of Deputies]]
|leader_name4 = [[Laura Boldrini]]
|legislature = [[Parliament of Italy|Parliament]]
|upper_house = [[Italian Senate|Senate of the Republic]]
|lower_house = [[Italian Chamber of Deputies|Chamber of Deputies]]
|accessionEUdate = 25 March 1957 (founding member)
|EUseats = 78
|area_rank = 72nd
|area_magnitude = 1 E11
|area_km2 = 301,338
|area_sq_mi = 116,347 <!--Do not remove per [[WP:MOSNUM]]-->
|percent_water = 2.4
|population_census = 59,433,744<ref name="Istat">{{cite web |url=http://www.istat.it/it/files/2012/12/volume_popolazione-legale_XV_censimento_popolazione.pdf|title=Census 2011 - final results |publisher=[[National Institute of Statistics (Italy)|ISTAT]] |accessdate=19 December 2012}}</ref>
|population_census_year = 2011
|population_census_rank = 23rd
|population_estimate = 59,685,227<ref>{{cite web |url=http://www.istat.it/en/archive/94537|title=Resident population and population change|publisher=[[National Institute of Statistics (Italy)|ISTAT]] |accessdate=25 June 2013}}</ref>
|population_estimate_year = 2012
|population_estimate_rank = 23rd
|population_density_rank = 63rd
|population_density_km2 = 197.7
|population_density_sq_mi = 511.6 <!--Do not remove per [[WP:MOSNUM]]-->
|GDP_PPP = $1.848 trillion<ref name=autogenerated1 >{{cite web |url=http://www.imf.org/external/pubs/ft/weo/2013/02/weodata/weorept.aspx?pr.x=25&pr.y=1&sy=2013&ey=2013&scsm=1&ssd=1&sort=country&ds=.&br=1&c=136&s=NGDPD%2CNGDPDPC%2CPPPGDP%2CPPPPC&grp=0&a= |title=Italy |publisher=International Monetary Fund |accessdate=17 October 2013}}</ref>
|GDP_PPP_rank = 11th
|GDP_PPP_year = 2014
|GDP_PPP_per_capita = $30,218<ref name=autogenerated1/>
|GDP_PPP_per_capita_rank = 34th
|GDP_nominal = $2.148 trillion<ref name=autogenerated1/>
|GDP_nominal_rank = 9th
|GDP_nominal_year = 2014
|GDP_nominal_per_capita = $35,123<ref name=autogenerated1/>
|GDP_nominal_per_capita_rank = 27th
|sovereignty_type = [[History of Italy|Formation]]
|established_event1 = [[Italian unification|Unification]]
|established_date1 = 17 March 1861
|established_event2 = [[Italian constitutional referendum, 1946|Republic]]
|established_date2 = 2 June 1946
|Gini_year = 2011
|Gini_change =  <!--increase/decrease/steady-->
|Gini = 31.9 <!--number only-->
|Gini_ref = <ref name=eurogini>{{cite web|title=Gini coefficient of equivalised disposable income (source: SILC)|url=http://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=ilc_di12|publisher=Eurostat Data Explorer|accessdate=13 August 2013}}</ref>
|Gini_rank =
|HDI_year = 2013
|HDI_change = increase <!--increase/decrease/steady-->
|HDI = 0.881 <!--number only-->
|HDI_ref = <ref name="HDI">{{cite web |url=http://hdr.undp.org/en/media/HDR_2011_EN_Table1.pdf |title=Human Development Report 2011 |year=2011 |publisher=United Nations |accessdate=5 November 2011}}</ref>
|HDI_rank = 25th
|currency = Euro ([[Euro sign|€]])<sup>b</sup>
|currency_code = EUR
|country_code =
|time_zone = [[Central European Time|CET]]
|utc_offset = +1
|time_zone_DST = [[Central European Summer Time|CEST]]
|utc_offset_DST = +2
|drives_on = right
|calling_code = [[Telephone numbers in Italy|39]]<sup>c</sup>
|cctld = [[.it]]<sup>d</sup>
|footnote_a = <span style="font-size:100%;">French is co-official in the [[Aosta Valley]]; [[Slovene language|Slovene]] is co-official in the [[province of Trieste]] and the [[province of Gorizia]]; German and [[Ladin language|Ladin]] are co-official in [[South Tyrol]].</span>

|footnote_b = <span style="font-size:100%;">Before 2002, the [[Italian lira|Italian Lira]]. The euro is accepted in [[Campione d'Italia]], but the official currency there is the [[Swiss Franc]].<ref>{{cite web |url=http://www.comune.campione-d-italia.co.it/ |title=Comune di Campione d'Italia |publisher=Comune.campione-d-italia.co.it |date=14 July 2010 |accessdate=30 October 2010}}</ref></span>
|footnote_c = <span style="font-size:100%;">To call [[Campione d'Italia]], it is necessary to use the Swiss code [[+41]].</span>
|footnote_d = <span style="font-size:100%;">The [[.eu]] domain is also used, as it is shared with other [[European Union]] member states.</span>

I want to match all key/value pairs. For example:

"conventional_long_name" => "Italian Republic"

The regex I am using is:

preg_match_all("/\|(?:\s+)?([^=]+)(?:\s+)?=(?:\s+)?(.+)/", $line, $matches);

This works fine, but it fails to parse this line:

|latd=41 |latm=54 |latNS=N |longd=12 |longm=29 |longEW=E

It matches latd as the key and 41 |latm=54 |latNS=N |longd=12 |longm=29 |longEW=E as the value, instead of six separate pairs.

What am I doing wrong? Any ideas?
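For reference, a minimal reproduction of the behaviour described above. It uses the same pattern on that one line; the variable names are only illustrative. The trailing (.+) is greedy and a dot also matches |, so the first value swallows the rest of the line.

```php
<?php
// The single problematic line from the infobox.
$line = '|latd=41 |latm=54 |latNS=N |longd=12 |longm=29 |longEW=E';

// Same pattern as in the question.
preg_match_all("/\|(?:\s+)?([^=]+)(?:\s+)?=(?:\s+)?(.+)/", $line, $matches);

// Only one key is found; its value runs to the end of the line.
var_dump($matches[1]); // the captured keys
var_dump($matches[2]); // the captured values
```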

1 answer:

Answer 0: (score: 4)

Maybe you could use something like this?

\|\s*([^=]+?)\s*=\s*((?:<[^<>]*>|\[\[(?:(?!\]\]).)*\]\]|{{(?:(?!}}).)*}}|[^|{}\[\]<>]+)+)

regex101 demo

It basically grabs everything (even a |) as long as it is inside {{ ... }}, [[ ... ]] or < ... >, but it stops at a lone | that is not inside one of those brackets.
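A rough sketch of how the pattern above might be wired into PHP. The assumptions here are mine, not part of the answer: the ~ delimiter, the escaped literal braces (which should be equivalent in PCRE), the trim() calls, the variable names, and the shortened sample text. Note that a {{ ... }} block is matched only up to the first }}, so templates nested inside other templates would probably need a different approach.

```php
<?php
// Shortened sample based on lines from the infobox in the question.
$infobox = <<<'WIKI'
|common_name = Italy
|map_caption = {{map caption |region=Europe |subregion=the [[European Union]]}}
|latd=41 |latm=54 |latNS=N
WIKI;

// The pattern from the answer, with the literal braces escaped.
$pattern = '~\|\s*([^=]+?)\s*=\s*((?:<[^<>]*>|\[\[(?:(?!\]\]).)*\]\]|\{\{(?:(?!\}\}).)*\}\}|[^|{}\[\]<>]+)+)~';

preg_match_all($pattern, $infobox, $matches, PREG_SET_ORDER);

// Build a key => value map; values can carry trailing whitespace
// or a newline, so both sides are trimmed.
$pairs = [];
foreach ($matches as $m) {
    $pairs[trim($m[1])] = trim($m[2]);
}

print_r($pairs);
```

With this sample, latd, latm and latNS come out as separate keys, while the | characters inside the map_caption template stay part of that single value.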