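I need to scrape data from http://www.hegnar.no/netfonds/aksjekurser/. Specifically, I want to extract the data from the table on that page, but the table markup sits inside div tags. I tried a PHP regex with file_get_contents, but I could not get it to work.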
<?php
$html = file_get_contents("http://www.hegnar.no/netfonds/aksjekurser");
preg_match_all(
'/<tr>
<td class="left"><a href=".*?">(.*?)<\/a><\/td>.*?
<td class="left">(.*?)<\/td>.*?
<td name=".*?">(.*?)<\/td>.*?
<td name=".*?">(.*?)<\/td>.*?
<td>(.*?)<\/td>.*?
<td class="up" name=".*?">(.*?)<\/td>.*?
<td class="up" name=".*?">(.*?)<\/td>.*?
<td>(.*?)<\/td>.*?
<td>(>*?)<\/td>.*?
<td>(.*?)<\/td>.*?
<td>(.*?)<\/td>.*?
<td name=".*?">(.*?)<\/td>
<td name=".*?">(.*?)<\/td><\/tr>/s',
$html,
$posts, // will contain the article data
PREG_SET_ORDER // formats data into an array of posts
);
foreach ($posts as $post) {
$selskap = $post[1];
$ticket = $post[2];
$siste = $post[3];
$kejop = $post[4];
$slag = $post[5];
$ending = $post[6];
$ending2 = $post[7];
$apring = $post[8];
$lav = $post[9];
$hoy = $post[10];
$forrige = $post[11];
$volume = $post[12];
$ratio = $post[13];
echo "$selskap<br>";
echo "$ticket<br>";
echo "$siste<br>";
echo "$kejop<br>";
echo "$slag<br>";
echo "$ending<br>";
echo "$ending2<br>";
echo "$apring<br>";
echo "$lav<br>";
echo "$hoy<br>";
echo "$forrige<br>";
echo "$volume<br>";
echo "$ratio<br>";
}
echo "<p>" . count($posts) . " posts found</p>";
Answer 0: (score: 1)
You can use this library: PHP Simple HTML DOM Parser
See also this question: Extract Information from HTML
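As a rough illustration of the parser-based approach, here is a minimal sketch. Note that it swaps in PHP's built-in DOMDocument and DOMXPath classes rather than the Simple HTML DOM Parser library named above, so that it runs without extra dependencies. The sample markup and the extractRows helper are hypothetical; the real page's structure may differ, and in practice $html would come from file_get_contents.

```php
<?php
// Hypothetical sample markup standing in for the downloaded page.
$html = <<<HTML
<div id="quotes">
  <table>
    <tr><td class="left"><a href="/a">Acme ASA</a></td><td class="left">ACME</td><td name="last">12.50</td></tr>
    <tr><td class="left"><a href="/b">Bolt ASA</a></td><td class="left">BOLT</td><td name="last">7.10</td></tr>
  </table>
</div>
HTML;

// Hypothetical helper: returns one array of trimmed cell texts per table row.
function extractRows(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parse warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query('//tr') as $tr) {
        $cells = [];
        // Only direct <td> children, so header rows made of <th> are skipped.
        foreach ($xpath->query('./td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

$rows = extractRows($html);
foreach ($rows as $cells) {
    echo implode(' | ', $cells), "\n";
}
```

Because the parser walks the element tree, it does not matter that the table is nested inside a div, and attribute order or extra whitespace in the markup should not break the extraction the way it can with a long regex.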
Answer 1: (score: 0)
There is at least one typo in your regex:
<td>(>*?)<\/td>.*?
which was probably meant to be:
<td>(.*?)<\/td>.*?
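To see why that one character matters, here is a small sketch comparing the two groups on a single hypothetical cell (the value 101.5 is made up for illustration). The group (>*?) can only capture a run of literal > characters, so it fails on any cell that contains ordinary text, and one failing cell is enough to make the whole row pattern fail.

```php
<?php
// Hypothetical table cell; the value is invented for this demonstration.
$cell = '<td>101.5</td>';

// Typo: the group only accepts zero or more literal ">" characters.
$typo = preg_match('/<td>(>*?)<\/td>/s', $cell, $m1);

// Intended: the group lazily accepts any characters.
$fixed = preg_match('/<td>(.*?)<\/td>/s', $cell, $m2);

echo "typo matched: $typo\n";
echo "fixed matched: $fixed\n";
echo "captured: ", $m2[1] ?? '', "\n";
```

Checking the return value of preg_match_all (and preg_last_error() when it returns false) is a cheap way to notice this kind of silent non-match early.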