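I'm a beginner programmer trying to build a simple application that scrapes a website and returns some values.
I'm trying to do something I thought would be simple, but after searching and experimenting I've given up and decided to just ask.
With my scraper I get back three variables: $title1, $title2 and $title3. Each $title comes from a different method of finding the article's name. Ideally I would only need one lookup and be done, but some sites store the data differently (some in meta tags, some in hidden divs, some in other elements, and so on).
I need a way to do what the following pseudocode describes: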
if $title1, $title2, $title3 != null { // don't count a string if it is null
    $title1_stringlength = string_length($title1) // find string length of the $titles
    $title2_stringlength = string_length($title2)
    $title3_stringlength = string_length($title3)
    $realtitle = $lowestvalueofstringlength; // $realtitle gets whichever $title is shortest in length, not counting any null $title's
}
Here is an example of what I need it to do:
echo $title1; //echoes "Exercise Daily"
echo $title2; //echoes "null"
echo $title3; //echoes "Exercise Daily - And More advice on SaveTheTwinkie.org"
$realtitle = $title1;//should be $title1 because it was shortest that wasn't null
//or a different example from another site
echo $title1; //echoes "Wow look at this Article Title!"
echo $title2; //echoes "null"
echo $title3; //echoes "Wow look at this Article Title! - from StupidArticles.tv"
$realtitle = $title1;//should be $title1 because it was shortest that wasn't null
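So my code should find the $title with the shortest string length (ignoring null ones) and assign its value to $realtitle.
Thanks for your help! If you need more details, just ask!
Edit
Here's my full code. It works until one of the $title values is "", at which point $realtitle becomes "":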
<?php
$sites_html = file_get_contents($url);
$html = new DOMDocument();
@$html->loadHTML($sites_html);
$title1 = null; //reset
$title2 = null; //reset
$title3 = null; //reset
//Get all meta tags and loop through them.
foreach($html->getElementsByTagName('meta') as $meta) {
    if($meta->getAttribute('property')=='og:title'){
        //Assign the value from content attribute to $title1
        $title1 = $meta->getAttribute('content');
    }
}
foreach($html->getElementsByTagName('h1') as $div) {
    if($div->getAttribute('itemprop')=='name'){
        $title2 = $div->nodeValue;
    }
}
foreach($html->getElementsByTagName('h1') as $div) {
    if($div->getAttribute('class')=='fn'){
        $title3 = $div->nodeValue;
    }
}
$realtitle = array_reduce(array($title2, $title1, $title3), function($a, $b) {
    return strlen($a) && $a != 'null' && strlen($a) < strlen($b) ? $a : $b;
}, null);
echo 'metaogtitle: '.$title1 . '<br/><br/><br/><br/><br/>';
echo 'name: '.$title2. '<br/><br/><br/><br/><br/>';
echo 'name2: '.$title3. '<br/><br/><br/><br/><br/>';
echo 'realtitle: '.$realtitle. '<br/><br/><br/><br/><br/>';
?>
Answer 0 (score: 2)
// Collect the candidate titles (variable names taken from the question)
$titles = array($title1, $title2, $title3);
// Filter invalid values
$titles = array_filter($titles, function($title) { return $title && $title != 'null'; });
// Just sort :)
usort($titles, function ($left, $right) { return strlen($left) - strlen($right); });
echo $titles[0];
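One edge case worth considering with the filter-and-sort idea: if every candidate gets filtered out, index 0 does not exist. The following is only a sketch of one way to guard against that. The function name shortest_by_sort is made up for illustration, and it assumes that null, "" and the literal string "null" are the only values that should be discarded.

```php
<?php
// Hypothetical wrapper around the filter-and-sort approach; the name is illustrative.
function shortest_by_sort(array $titles)
{
    // Drop null, empty strings, and the literal string "null".
    $titles = array_filter($titles, function ($title) {
        return $title !== null && $title !== '' && $title !== 'null';
    });

    if (count($titles) === 0) {
        return null; // nothing usable was scraped
    }

    // usort() reindexes from 0, so the shortest title ends up at index 0.
    usort($titles, function ($left, $right) {
        return strlen($left) - strlen($right);
    });

    return $titles[0];
}

$realtitle = shortest_by_sort(array(
    'Exercise Daily',
    'null',
    'Exercise Daily - And More advice on SaveTheTwinkie.org',
));
echo $realtitle; // prints "Exercise Daily"
```

The strict comparisons are a deliberate choice in this sketch: a plain truthiness check would also throw away a title consisting of the single character "0".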
Answer 1 (score: 1)
Here is a variant that works with any number of strings:
$shortest = NULL;
$shortestReduce = function ($string) use (&$shortest) {
    if ( ($string === "null") || !($len = strlen($string))) {
        return $shortest;
    }
    if (!isset($shortest) || $len < strlen($shortest)) {
        $shortest = $string;
    }
    return $shortest;
};
$shortestReduce($string1);
$shortestReduce($string2);
$shortestReduce($string3);
# ...
echo $shortest; # "Exercise Daily"
This reduce lets you apply the same function to multiple values to produce a single result, in this case the shortest string that is not "null".
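The same kind of reduction can probably also be written with PHP's built-in array_reduce, which avoids the by-reference variable. This is a rough sketch rather than a drop-in replacement: the function name shortest_title is hypothetical, and it assumes that null, "" and the literal string "null" all mean "no title found".

```php
<?php
// Hypothetical helper: returns the shortest usable title, or null if there is none.
function shortest_title(array $candidates)
{
    return array_reduce($candidates, function ($carry, $item) {
        // Skip null, empty strings, and the literal string "null".
        if ($item === null || $item === '' || $item === 'null') {
            return $carry;
        }
        // Keep the new item if nothing is stored yet or if it is shorter.
        if ($carry === null || strlen($item) < strlen($carry)) {
            return $item;
        }
        return $carry;
    }, null);
}

$realtitle = shortest_title(array(
    'Wow look at this Article Title!',
    'null',
    'Wow look at this Article Title! - from StupidArticles.tv',
));
echo $realtitle; // prints "Wow look at this Article Title!"
```

Because the callback checks the incoming item rather than the accumulated value, an empty candidate anywhere in the list is skipped instead of overwriting a good title.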
Answer 2 (score: 0)
This should work:
function real_title($titles) {
    /* Takes an array of titles.
       Returns the shortest one that isn't null, empty, or the string "null";
       returns null if no such title exists.
    */
    $real_title = null;
    foreach ($titles as $single_title) {
        // Skip unusable candidates, including an empty first element
        if ($single_title === null || $single_title === '' || $single_title === 'null') {
            continue;
        }
        if ($real_title === null || strlen($single_title) < strlen($real_title)) {
            $real_title = $single_title;
        }
    }
    return $real_title;
}