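From the string in $codes, I want to put all the languages into a languages array, all the codes into a codes array, and finally all the families into a families array. How can I do this in PHP? I have already tried using DOM, but any other approach would also be appreciated. Thanks in advance.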
<?php
$codes = '<pre>
LANGUAGE CODE LANGUAGE FAMILY
AFAR AA HAMITIC
ABKHAZIAN AB IBERO-CAUCASIAN
AFRIKAANS AF GERMANIC
AMHARIC AM SEMITIC
ARABIC AR SEMITIC
ASSAMESE AS INDIAN
AYMARA AY AMERINDIAN
AZERBAIJANI AZ TURKIC/ALTAIC
BASHKIR BA TURKIC/ALTAIC
BYELORUSSIAN BE SLAVIC
BULGARIAN BG SLAVIC
BIHARI BH INDIAN
BISLAMA BI [not given]
BENGALI;BANGLA BN INDIAN
TIBETAN BO ASIAN
BRETON BR CELTIC
CATALAN CA ROMANCE
CORSICAN CO ROMANCE
CZECH CS SLAVIC
WELSH CY CELTIC
DANISH DA GERMANIC
GERMAN DE GERMANIC
BHUTANI DZ ASIAN
GREEK EL LATIN/GREEK
ENGLISH EN GERMANIC
ESPERANTO EO INTERNATIONAL AUX.
SPANISH ES ROMANCE
ESTONIAN ET FINNO-UGRIC
BASQUE EU BASQUE
PERSIAN (farsi) FA IRANIAN
FINNISH FI FINNO-UGRIC
FIJI FJ OCEANIC/INDONESIAN
FAROESE FO GERMANIC
FRENCH FR ROMANCE
FRISIAN FY GERMANIC
IRISH GA CELTIC
SCOTS GAELIC GD CELTIC
GALICIAN GL ROMANCE
GUARANI GN AMERINDIAN
GUJARATI GU INDIAN
HAUSA HA NEGRO-AFRICAN
HEBREW HE SEMITIC [*Changed 1989 from original ISO 639:1988, IW]
HINDI HI INDIAN
CROATIAN HR SLAVIC
HUNGARIAN HU FINNO-UGRIC
ARMENIAN HY INDO-EUROPEAN (OTHER)
INTERLINGUA IA INTERNATIONAL AUX.
INTERLINGUE IE INTERNATIONAL AUX.
INUPIAK IK ESKIMO
INDONESIAN ID OCEANIC/INDONESIAN [*Changed 1989 from original ISO 639:1988, IN]
ICELANDIC IS GERMANIC
ITALIAN IT ROMANCE
INUKTITUT IU [ ]
JAPANESE JA ASIAN
JAVANESE JV OCEANIC/INDONESIAN
GEORGIAN KA IBERO-CAUCASIAN
KAZAKH KK TURKIC/ALTAIC
GREENLANDIC KL ESKIMO
CAMBODIAN KM ASIAN
KANNADA KN DRAVIDIAN
KOREAN KO ASIAN
KASHMIRI KS INDIAN
KURDISH KU IRANIAN
KIRGHIZ KY TURKIC/ALTAIC
LATIN LA LATIN/GREEK
LINGALA LN NEGRO-AFRICAN
LAOTHIAN LO ASIAN
LITHUANIAN LT BALTIC
LATVIAN;LETTISH LV BALTIC
MALAGASY MG OCEANIC/INDONESIAN
MAORI MI OCEANIC/INDONESIAN
MACEDONIAN MK SLAVIC
MALAYALAM ML DRAVIDIAN
MONGOLIAN MN [not given]
MOLDAVIAN MO ROMANCE
MARATHI MR INDIAN
MALAY MS OCEANIC/INDONESIAN
MALTESE MT SEMITIC
BURMESE MY ASIAN
NAURU NA [not given]
NEPALI NE INDIAN
DUTCH NL GERMANIC
NORWEGIAN NO GERMANIC
OCCITAN OC ROMANCE
AFAN (OROMO) OM HAMITIC
ORIYA OR INDIAN
PUNJABI PA INDIAN
POLISH PL SLAVIC
PASHTO;PUSHTO PS IRANIAN
PORTUGUESE PT ROMANCE
QUECHUA QU AMERINDIAN
RHAETO-ROMANCE RM ROMANCE
KURUNDI RN NEGRO-AFRICAN
ROMANIAN RO ROMANCE
RUSSIAN RU SLAVIC
KINYARWANDA RW NEGRO-AFRICAN
SANSKRIT SA INDIAN
SINDHI SD INDIAN
SANGHO SG NEGRO-AFRICAN
SERBO-CROATIAN SH SLAVIC
SINGHALESE SI INDIAN
SLOVAK SK SLAVIC
SLOVENIAN SL SLAVIC
SAMOAN SM OCEANIC/INDONESIAN
SHONA SN NEGRO-AFRICAN
SOMALI SO HAMITIC
ALBANIAN SQ INDO-EUROPEAN (OTHER)
SERBIAN SR SLAVIC
SISWATI SS NEGRO-AFRICAN
SESOTHO ST NEGRO-AFRICAN
SUNDANESE SU OCEANIC/INDONESIAN
SWEDISH SV GERMANIC
SWAHILI SW NEGRO-AFRICAN
TAMIL TA DRAVIDIAN
TELUGU TE DRAVIDIAN
TAJIK TG IRANIAN
THAI TH ASIAN
TIGRINYA TI SEMITIC
TURKMEN TK TURKIC/ALTAIC
TAGALOG TL OCEANIC/INDONESIAN
SETSWANA TN NEGRO-AFRICAN
TONGA TO OCEANIC/INDONESIAN
TURKISH TR TURKIC/ALTAIC
TSONGA TS NEGRO-AFRICAN
TATAR TT TURKIC/ALTAIC
TWI TW NEGRO-AFRICAN
UIGUR UG [ ]
UKRAINIAN UK SLAVIC
URDU UR INDIAN
UZBEK UZ TURKIC/ALTAIC
VIETNAMESE VI ASIAN
VOLAPUK VO INTERNATIONAL AUX.
WOLOF WO NEGRO-AFRICAN
XHOSA XH NEGRO-AFRICAN
YIDDISH YI GERMANIC [*Changed 1989 from original ISO 639:1988, JI]
YORUBA YO NEGRO-AFRICAN
ZHUANG ZA [ ]
CHINESE ZH ASIAN
ZULU ZU NEGRO-AFRICAN
</pre>';
$doc= new DOMDocument();
$doc->loadHTML($codes);
$xmlL = simplexml_import_dom($doc);
$pathL = $xmlL->xpath('//pre');
print_r($pathL);
?>
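The DOM attempt above stops at locating the <pre> element. Below is a minimal sketch of taking it one step further: it reads the element's textContent (the text with all markup stripped) and splits it into lines, which can then be split into fields as in the answers below. It assumes the markup holds a single <pre> whose first line is the column header. The helper name preLines is hypothetical.

```php
<?php
// Hypothetical helper: returns the data lines of the first <pre> element,
// assuming its first line is the "LANGUAGE CODE LANGUAGE FAMILY" header.
function preLines(string $html): array
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);

    $pre = $doc->getElementsByTagName('pre')->item(0);
    if ($pre === null) {
        return []; // no <pre> element found
    }

    // textContent is the element's text with all tags stripped;
    // \R matches any line-break sequence (\n, \r\n, \r).
    $lines = preg_split('/\R/', trim($pre->textContent));
    array_shift($lines); // drop the header line
    return $lines;
}
```

For the question's input, preLines($codes) would return one string per language line, such as "AFAR AA HAMITIC".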
Answer 0 (score: 1)
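I think you should look at PHP's explode() function.
With it you can first split on the "\n" character (separating the lines) to get an array of lines. Then, for each line, you can explode on \t (assuming tabs separate the fields) to get an array of 3 separate entries, and push all of those arrays into the array you want.
Something like: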
$codes_array = array();
// Split the text into lines, then split each line into its tab-separated fields
foreach (explode("\n", $codes) as $line) {
    $codes_array[] = explode("\t", $line);
}
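A minimal sketch of the explode() approach carried through to the three arrays the question asks for. It assumes each data line really is "LANGUAGE\tCODE\tFAMILY" with tab separators; the sample in the question appears to be space-separated, in which case nothing is collected and the regex answer below is the better fit. The function name splitByTabs is hypothetical.

```php
<?php
// Hypothetical helper: splits tab-separated "LANGUAGE\tCODE\tFAMILY" lines
// into three parallel arrays. Assumes tabs, not spaces, separate the fields.
function splitByTabs(string $codes): array
{
    $languages = [];
    $codeList = [];
    $families = [];

    foreach (explode("\n", $codes) as $line) {
        // trim() removes a trailing "\r" and stray edge whitespace; inner tabs stay.
        $fields = explode("\t", trim($line));

        // Skip lines without exactly three fields ("<pre>", blank lines) and
        // the header, whose second field is "CODE" rather than a 2-letter code.
        if (count($fields) !== 3 || strlen($fields[1]) !== 2) {
            continue;
        }

        $languages[] = $fields[0];
        $codeList[] = $fields[1];
        $families[] = $fields[2];
    }

    return [$languages, $codeList, $families];
}
```

Calling [$langs, $codes2, $families] = splitByTabs($text); then gives the three arrays directly.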
Answer 1 (score: 1)
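$langs_ar = array();
$codes_ar = array();
$families_ar = array();
// Split into lines; lines that don't look like "LANGUAGE CODE FAMILY" are skipped
foreach(preg_split('/[\r\n]+/', $codes) as $line)
{
    // Group 1: language (one or two words), group 2: two-character code, group 3: family (rest of the line)
    if (preg_match('/^(\S+\s*\S+)\s+(\S{2})\s+(\S.*\S)\s*$/', $line, $matches))
    {
        $langs_ar[] = $matches[1];
        $codes_ar[] = $matches[2];
        $families_ar[] = $matches[3];
    }
}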
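Oh, and instead of 3 arrays, I'd suggest a single array holding a hash (associative array) of the 3 fields for each line, or creating your own object with the 3 properties lang, code and family.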
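A minimal sketch of that suggestion, reusing the answer's regex: each line becomes one associative array with the keys lang, code and family, and the records can optionally be turned into instances of a small class. The names parseRecords, toObjects and Language are hypothetical, not part of any library.

```php
<?php
// Hypothetical value class with the three properties suggested above.
final class Language
{
    public $lang;
    public $code;
    public $family;

    public function __construct(string $lang, string $code, string $family)
    {
        $this->lang = $lang;
        $this->code = $code;
        $this->family = $family;
    }
}

// One record per matching line instead of three parallel arrays.
function parseRecords(string $codes): array
{
    $records = [];
    foreach (preg_split('/[\r\n]+/', $codes) as $line) {
        if (preg_match('/^(\S+\s*\S+)\s+(\S{2})\s+(\S.*\S)\s*$/', $line, $m)) {
            $records[] = ['lang' => $m[1], 'code' => $m[2], 'family' => $m[3]];
        }
    }
    return $records;
}

// Optional: wrap each record in a Language object.
function toObjects(array $records): array
{
    return array_map(function (array $r) {
        return new Language($r['lang'], $r['code'], $r['family']);
    }, $records);
}
```

Keeping the three fields together means a language can never drift out of step with its code or family, which is easy to do by accident with parallel arrays.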
Edit: a shorter way to do the same thing is:
preg_match_all('/^(\S+\s*\S+)\s+(\S{2})\s+(\S.*\S)\s*$/m', $codes, $matches, PREG_SET_ORDER);
var_dump($matches);
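$matches is now an array of "objects", one per matched line; with PREG_SET_ORDER each entry holds the full match at index 0 and the language, code and family at indexes 1, 2 and 3.
Just iterate over it to do what you want.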
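A minimal sketch of that iteration, producing the three arrays from the question. It uses a variant of the answer's pattern with \h (horizontal whitespace) in place of the inner \s, so that with the /m modifier a match cannot start on a single-token line such as "<pre>" and run across the line break. With the default PREG_PATTERN_ORDER no loop is needed at all, since $matches[1], $matches[2] and $matches[3] are already the columns. The function names are hypothetical.

```php
<?php
// Variant of the answer's pattern: \h keeps the language/code/family
// separators on one line; the trailing \s*$ only tidies the line end.
const LANG_PATTERN = '/^(\S+\h*\S+)\h+(\S{2})\h+(\S.*\S)\s*$/m';

// Iterate PREG_SET_ORDER results: each $m is [full match, lang, code, family].
function splitColumns(string $codes): array
{
    preg_match_all(LANG_PATTERN, $codes, $matches, PREG_SET_ORDER);

    $languages = [];
    $codeList = [];
    $families = [];
    foreach ($matches as $m) {
        $languages[] = $m[1];
        $codeList[] = $m[2];
        $families[] = $m[3];
    }
    return [$languages, $codeList, $families];
}

// Same result with the default PREG_PATTERN_ORDER: index N holds every
// capture of group N, so the columns come out directly.
function splitColumnsPatternOrder(string $codes): array
{
    preg_match_all(LANG_PATTERN, $codes, $matches);
    return [$matches[1], $matches[2], $matches[3]];
}
```

Either function can be called as [$langs, $codeList, $families] = splitColumns($codes); against the question's string.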