I want to read specific details from a web page into a JSON object. This is the page's HTML:
<body>
<a href="https://myanimelist.net/anime/1/Cowboy_Bebop" class="hovertitle">Cowboy Bebop (1998)</a>
<div style="margin-top: 8px; margin-bottom: 10px;">In the year 2071, humanity has colonized several of the planets and moons of the solar system leaving the now uninhabitable surface of planet Earth behind. The Inter Solar System Police attempts to ke... <a href="https://myanimelist.net/anime/1/Cowboy_Bebop">read more</a></div>
<span class="dark_text">Genres:</span> Action, Adventure, Comedy, Drama, Sci-Fi, Space<br>
<span class="dark_text">Status:</span> Finished Airing<br>
<span class="dark_text">Type:</span> TV<br>
<span class="dark_text">Episodes:</span> 26<br>
<span class="dark_text">Score:</span> 8.81 <small>(scored by 345,892 users)</small><br>
<span class="dark_text">Ranked:</span> #26<br>
<span class="dark_text">Popularity:</span> #35<br>
<span class="dark_text">Members:</span> 664,034<br>
</body>
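I am only interested in the contents of the spans with class="dark_text", and only in the first 5 of them. The finished JSON object should look like this: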
{
"genres": [ "Action", "Adventure", "Comedy", "Drama", "Sci-Fi", "Space" ], // Array
"status": "Finished Airing",
"type": "TV",
"episodes": "26",
"score": "8.81"
}
This is my approach so far:
function ParseDataIntoJson($html)
{
    $dom = new DOMDocument;
    @$dom->loadHTML($html); // suppress errors
    $xpath = new DOMXPath($dom);
    $items = $xpath->query("//span[@class='dark_text']"); // the spans themselves
    $values = $xpath->query("//span[@class='dark_text']/following-sibling::text()"); // text after the span
    $item_array = array();
    $value_array = array();
    for ($i = 0; $i < 5; $i++) // only first 5 entries
    {
        $item = strtolower(rtrim($items[$i]->textContent, ":")); // remove : at the end and convert it to lowercase string
        $item_array[$i] = $item;
        $value = rtrim(ltrim($values[$i]->textContent, " "), " "); // remove leading/trailing space
        if($i == 0 && strpos($value, ', ')) // if i = 0 (genres entry) and it contains ", "
            $value = explode(", ", $value); // split into array using ", " as delimiter
        $value_array[$i] = $value; // if $value is an array after splitting, will this still work?
    }
    // generate json from data and return it
    // return $json;
}
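As you can see, it is very hard-coded (I have only just started with PHP), and the conversion to JSON is still missing. I would be grateful if anyone could help. Thanks in advance!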
Answer 0 (score: 0)
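I have updated the code in a few places. First, I use a single XPath query instead of two; inside the main loop I use $items[$i]->nextSibling->textContent to fetch the data relative to the current item (the <span...).
Also, in the for() loop I make sure the counter never exceeds the number of elements found.
The main change is that for each entry I create two fields: $entryName is the label and $value is the content. Both are processed the same way as in your current code. They are then used to build an associative array, which is passed to json_encode() to get the result...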
function ParseDataIntoJson($html)
{
    $dom = new DOMDocument;
    $dom->loadHTML($html); // don't suppress errors
    $xpath = new DOMXPath($dom);
    $items = $xpath->query("//span[@class='dark_text']"); // the spans themselves
    $value_array = array();
    for ($i = 0; $i < 5 && $i < $items->length; $i++) // only first 5 entries
    {
        $entryName = strtolower(rtrim($items[$i]->textContent, ":")); // remove the trailing ":" and convert to lowercase
        $content = $items[$i]->nextSibling->textContent; // fetch text of the next node
        $value = trim($content); // remove leading/trailing whitespace
        if ($i == 0 && strpos($value, ', ') !== false) { // genres entry containing ", " (strpos returns false when not found)
            $value = explode(", ", $value); // split into array using ", " as delimiter
        }
        $value_array[$entryName] = $value; // build an associative array of the data
    }
    return json_encode($value_array);
}
echo ParseDataIntoJson($html);
This outputs...
{"genres":["Action","Adventure","Comedy","Drama","Sci-Fi","Space"],
"status":"Finished Airing",
"type":"TV",
"episodes":"26",
"score":"8.81"}