I'm trying to match keywords in a string by filtering out words that meet the following criteria:
Example:
$string = "Joe O'Donnell and Oscar De La Hoya went to a Pittsburgh Steelers game on Sunday, where Joe lost his iPhone 5, so he borrowed Oscar's iPad";
preg_match_all("/[A-Z][a-z]*/",$string,$match_words); // incorrect expression
// desired result for $match_words should be:
// array(Joe ODonnell, Oscar De La Hoya, Pittsburgh Steelers, Sunday, Joe, iPhone 5, Oscars, iPad)
Answer 0 (score: 3)
You can use a regular expression like this:
\b((?:[A-Z]['a-z]*\s*\d*)+)\b|\b((?:[a-z]*[A-Z]['a-z]*\s*\d*)+)\b
Working demo
Match information:
MATCH 1
1. [0-14] `Joe O'Donnell `
MATCH 2
1. [18-35] `Oscar De La Hoya `
MATCH 3
1. [45-65] `Pittsburgh Steelers `
MATCH 4
1. [73-79] `Sunday`
MATCH 5
1. [87-91] `Joe `
MATCH 6
2. [100-108] `iPhone 5`
MATCH 7
1. [125-133] `Oscar's `
MATCH 8
2. [133-137] `iPad`
The regular expression is made up of two alternative patterns:
\b((?:[A-Z]['a-z]*\s*\d*)+)\b ---> Match words like Joe O'Connels or Oscar De La Hoya
|
\b((?:[a-z]*[A-Z]['a-z]*\s*\d*)+)\b ---> Match words like iPad or iPhone
By the way, if you look at the results, some matches end with a trailing space; you can trim the results to clean them up.
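A minimal sketch of that cleanup, assuming the sample string from the question (the variable names $pattern and $keywords are only illustrative):

```php
<?php
$string = "Joe O'Donnell and Oscar De La Hoya went to a Pittsburgh Steelers game on Sunday, where Joe lost his iPhone 5, so he borrowed Oscar's iPad";

// Same pattern as above, written in single quotes so the backslashes
// reach the regex engine unchanged (\' is just an escaped apostrophe).
$pattern = '/\b((?:[A-Z][\'a-z]*\s*\d*)+)\b|\b((?:[a-z]*[A-Z][\'a-z]*\s*\d*)+)\b/';

preg_match_all($pattern, $string, $matches);

// $matches[0] holds the full matches; trim() removes the trailing space.
$keywords = array_map('trim', $matches[0]);

print_r($keywords);
```

Using $matches[0] rather than the individual capture groups avoids having to merge the results of the two alternatives by hand.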
Answer 1 (score: 3)
You can first remove all non-alphanumeric characters:
$string2 = preg_replace("/[^a-zA-Z0-9\s]/", "", $string);
Then use preg_split rather than preg_replace to split the string on runs of entirely lowercase words.
$match_words = preg_split("/ ([a-z]| )+ /", $string2);
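(If you don't mind $string being overwritten, you can replace $string2 with $string.)
This works for the example you provided, but consider how you want the program to behave with less clean input. For example, "Foo   Bar" (extra spaces; with the pattern as written, three or more) would be split into two elements, while "Foo Bar" (one space) would stay as one. If you're not worried about speed, you can use another preg_replace to replace any run of whitespace with a single space.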
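A minimal sketch of that extra normalization step, assuming a made-up input string with irregular spacing (the variable name $clean is only illustrative):

```php
<?php
// Hypothetical input with irregular spacing between words.
$string = "Foo   Bar went  to the   Baz Qux";

// Strip everything except letters, digits, and whitespace (same as above).
$clean = preg_replace('/[^a-zA-Z0-9\s]/', '', $string);

// Collapse every run of whitespace into a single space, then trim the ends.
$clean = trim(preg_replace('/\s+/', ' ', $clean));

// Split on runs of entirely lowercase words, as in the answer.
$match_words = preg_split('/ ([a-z]| )+ /', $clean);

print_r($match_words);
```

With the spacing normalized first, "Foo Bar" and "Baz Qux" each stay in one piece regardless of how many spaces separated the words in the input.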
Answer 2 (score: 2)
You can use PHP's ctype_lower function here!
<?php
$string = "Joe O'Donnell and Oscar De La Hoya went to a Pittsburgh Steelers game on Sunday, where Joe lost his iPhone 5, so he borrowed Oscar's iPad";
$words = $temp = array();
// Loop through the string after turning it into an array (by spaces)
foreach (explode(" ", $string) as $word) {
// Check if the word is lowercase and is not a number
if (ctype_lower($word) && !is_numeric($word)) {
if (empty($temp)) continue; // Don't add it if there's nothing to add
// Add the words found up until this point (from the last point) into the words array, as a string
$words[] = implode(" ", $temp);
// Reset the temp array so we can look for new words and continue
$temp = array();
continue;
}
// Add this word to the temp array (the phrase currently being built)
$temp[] = $word;
}
// Flush the last phrase, if there is one
if (!empty($temp)) $words[] = implode(" ", $temp);
// Print the words that have uppercase characters
printf("<pre>%s</pre>", print_r($words, true));
Returns:
Array
(
[0] => Joe O'Donnell
[1] => Oscar De La Hoya
[2] => Pittsburgh Steelers
[3] => Sunday,
[4] => Joe
[5] => iPhone 5,
[6] => Oscar's iPad
)
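Note that the output above keeps trailing punctuation ("Sunday," and "iPhone 5,"). One possible variation, sketched here with a hypothetical helper name extract_phrases, strips common trailing punctuation from each word before testing it. It assumes PHP 7 or later for the type declarations.

```php
<?php
// Hypothetical helper (not from the answer): groups consecutive
// non-lowercase words into phrases, stripping trailing punctuation first.
function extract_phrases(string $text): array
{
    $phrases = [];
    $current = [];
    foreach (explode(' ', $text) as $word) {
        $word = rtrim($word, ',.;:!?');
        if ($word === '') {
            continue; // skip empty tokens produced by repeated spaces
        }
        if (ctype_lower($word)) {
            // A lowercase word closes the phrase being built, if any.
            if ($current) {
                $phrases[] = implode(' ', $current);
                $current = [];
            }
            continue;
        }
        $current[] = $word;
    }
    // Flush the last phrase.
    if ($current) {
        $phrases[] = implode(' ', $current);
    }
    return $phrases;
}

$string = "Joe O'Donnell and Oscar De La Hoya went to a Pittsburgh Steelers game on Sunday, where Joe lost his iPhone 5, so he borrowed Oscar's iPad";
$phrases = extract_phrases($string);
print_r($phrases);
```

As with the original loop, adjacent capitalized words are still grouped together, so "Oscar's iPad" remains a single entry.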
Answer 3 (score: 2)
Adding to Fede's sweet answer, this would be your new PHP code:
$string = "Joe O'Donnell and Oscar De La Hoya went to a Pittsburgh Steelers game on Sunday, where Joe lost his iPhone 5, so he borrowed Oscar's iPad";
preg_match_all("/\b((?:[A-Z]['a-z]*\s*\d*)+)\b|\b((?:[a-z]*[A-Z]['a-z]*\s*\d*)+)\b/", $string, $matches);
print_r($matches[0]);
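$matches[0] will be your array of matches.
Answer 4 (score: 0)
In addition to the answers from Fede, Kelly, and Daniel, here are two alternatives for languages with accented characters.
Using preg_split: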
$capitalized_words = preg_split("/ ([a-zàèìòùáéíóúýâêîôûãñõäëïöüÿçßøåæœ]| )+ /u", $string);
Using preg_match_all:
//with 'u' flag
preg_match_all("/\b((?:[A-ZÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝÆ]['a-zàèìòùáéíóúýâêîôûãñõäëïöüÿçßøåæœ]*\s*\d*)+)\b|\b((?:[a-zàèìòùáéíóúýâêîôûãñõäëïöüÿçßøåæœ]*[A-ZÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝÆ]['a-zàèìòùáéíóúýâêîôûãñõäëïöüÿçßøåæœ]*\s*\d*)+)\b/u", $string, $capitalized_words);
Using preg_match_all with trim:
function get_capitalized_words($string){
$capitalized_words=array();
//with 'u' flag
preg_match_all("/\b((?:[A-ZÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝÆ]['a-zàèìòùáéíóúýâêîôûãñõäëïöüÿçßøåæœ]*\s*\d*)+)\b|\b((?:[a-zàèìòùáéíóúýâêîôûãñõäëïöüÿçßøåæœ]*[A-ZÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝÆ]['a-zàèìòùáéíóúýâêîôûãñõäëïöüÿçßøåæœ]*\s*\d*)+)\b/u", $string, $matches);
if(isset($matches[0])){
$capitalized_words=array_map('trim',$matches[0]);
}
return $capitalized_words;
}
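The character lists above only cover the letters they enumerate. As an alternative sketch (a swapped-in technique, not part of this answer), the preg_split variant can use PCRE's Unicode property class \p{Ll} (lowercase letter) together with the u flag. This assumes the source file and the input are UTF-8 and that accented letters are precomposed; the sample sentence is made up.

```php
<?php
// Hypothetical sample with accented capitalized names (UTF-8 source file).
$string = "José Álvarez fue a São Paulo con Émile";

// \p{Ll} matches any lowercase letter, accented or not; the 'u' flag
// makes PCRE treat the pattern and the subject as UTF-8.
$capitalized_words = preg_split('/ (?:\p{Ll}| )+ /u', $string);

print_r($capitalized_words);
```

The trade-off is that \p{Ll} accepts lowercase letters from every script, which may be broader than the explicit lists intend.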