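How do I get two data points out of a text file?

Date: 2012-05-25 18:45:44

Tags: php html

I copied a web page's source into a text file, but I can't get two data points out of it: the latitude and the longitude.

The PHP file I wrote to download and scan the document is: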

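<?php

// Download the page and write it to example_homepage.txt
$ch = curl_init("http://www.marinetraffic.com/ais/shipdetails.aspx?MMSI=258245000");
$fp = fopen("example_homepage.txt", "w");

curl_setopt($ch, CURLOPT_FILE, $fp);   // write the response body to $fp
curl_setopt($ch, CURLOPT_HEADER, 0);   // leave the HTTP headers out of the file

curl_exec($ch);
curl_close($ch);
fclose($fp);

// Show the output as plain text instead of rendered HTML
header('Content-Type: text/plain');

$myFile = "example_homepage.txt";
$fh = fopen($myFile, 'r');
$theData = fread($fh, 9251);   // reads at most 9251 bytes
fclose($fh);
echo $theData;

?>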

The GPS coordinates are buried in text that looks like this (from the file example_homepage.txt):

<img style="border: 1px solid #aaa" src="flags/NO.gif" />
<br/>
<b>Call Sign:</b>LAJW
<br/>
<b>IMO:</b>9386380,
<b>MMSI:</b>258245000
<br/>
<hr/>
<h2>Last Position Received</h2>
<b>Area:</b>North Sea
<br/>
<b>Latitude / Longitude:</b>
<a href='default.aspx?mmsi=258245000&centerx=5.311533&centery=60.39997&zoom=10&type_color=9'>60.39997˚ / 5.311533˚ (Map)</a>
<br/>
<b>Currently in Port:</b>
<a href='default.aspx?centerx=5.32245&centery=60.39085&zoom=14'>BERGEN</a>
<br/>
<b>Last Known Port:</b>
</b>
<a href='default.aspx?centerx=5.32245&centery=60.39085&zoom=14'>BERGEN</a>
<br/>
<b>Info Received:</b>0d 0h 20min ago
<br/>
<table>
    <tr>
        <td>&nbsp;
            <img src="shipicons/magenta0.png" />
        </td>
        <td>
            <a href='default.aspx?mmsi=258245000&centerx=5.311533&centery=60.39997&zoom=10&type_color=9'><b>Current Vessel's Track</b></a>
        </td>
    </tr>
    <tr>
        <td>
            <img src="windicons/w05_330.png" />
        </td>
        <td>
            <b>Wind:</b>5 knots, 327&deg;, 13&deg;C</td>
    </tr>
</table>
<a href='datasheet.aspx?datasource=ITINERARIES&MMSI=258245000'><b>Itineraries History</b></a>
<br/>
<hr/>
<h2>Voyage Related Info (Last Received)</h2>
<b>Draught:</b>6.8 m
<br/>
<b>Destination:</b>BERGEN HAVN
<br/>
<b>ETA:</b>2012-05-22 18:00
<br/>
<b>Info Received:</b>2012-05-23 18:43 (

The two numbers I want are:

Latitude: 60.39085 Longitude: 5.32245

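I'm not very experienced with this kind of thing, and there may well be a better way. Please let me know.

Edit: using just the last three lines of code, I can get the first 9251 characters of the text file.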
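The second argument of fread() is a maximum length, so fread($fh, 9251) returns at most 9251 bytes and anything later in the file is never read. A minimal sketch of the difference, using a hypothetical 12000-byte temporary file as a stand-in for example_homepage.txt; file_get_contents() reads the whole file regardless of its size:

```php
<?php
// Hypothetical stand-in for example_homepage.txt: a 12000-byte temporary file
$myFile = tempnam(sys_get_temp_dir(), 'page');
file_put_contents($myFile, str_repeat('x', 12000));

// fread() stops after the requested number of bytes
$fh = fopen($myFile, 'r');
$partial = fread($fh, 9251);
fclose($fh);

// file_get_contents() returns the entire file as one string
$theData = file_get_contents($myFile);

echo strlen($partial) . " " . strlen($theData) . "\n"; // prints: 9251 12000
unlink($myFile);
```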

2 Answers:

Answer 0 (score: 0)

It may be overkill, but you can try PHP's DOM extension together with parse_url and parse_str:

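<?php
$text = file_get_contents('http://example.com/path/to/file.html');
$doc = new DOMDocument('1.0');
// Real-world HTML is rarely valid; collect the parser warnings instead of printing them
libxml_use_internal_errors(true);
$doc->loadHTML($text);
libxml_clear_errors();

$desired_values = array();
// The coordinates sit in the query strings of the <a href="..."> links
foreach ($doc->getElementsByTagName('a') as $link) {
    // Keep only the query part, e.g. "centerx=5.32245&centery=60.39085&zoom=14"
    $query = parse_url($link->getAttribute('href'), PHP_URL_QUERY);
    if (!is_string($query)) {
        continue; // no usable query string in this link
    }
    // parse_str() fills the array given as its second argument and returns nothing
    parse_str($query, $query_values);
    if (isset($query_values['centerx'], $query_values['centery'])) {
        $desired_values[] = array(
            $query_values['centerx'],  // x = longitude
            $query_values['centery']   // y = latitude
        );
    }
}
print_r($desired_values);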
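parse_url() and parse_str() can be tried on a single link before wiring up the DOM loop. A small check, using the href of the "Currently in Port" link copied verbatim from the question's HTML (nothing is downloaded):

```php
<?php
// href copied from the "Currently in Port" link in the question's HTML
$href = 'default.aspx?centerx=5.32245&centery=60.39085&zoom=14';

// With PHP_URL_QUERY, parse_url() returns only the part after "?"
$query = parse_url($href, PHP_URL_QUERY);

// parse_str() splits the query string into an associative array
parse_str($query, $params);

// centerx is the map's x coordinate (longitude), centery the y coordinate (latitude)
$lat = floatval($params['centery']);
$lon = floatval($params['centerx']);

echo "Latitude: $lat Longitude: $lon\n"; // prints: Latitude: 60.39085 Longitude: 5.32245
```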

Answer 1 (score: 0)

Here is what I did to get the result I wanted (it prints -70.19347 42.02112):

<?php
// Download the web page and save it to a text file
$ch = curl_init("http://photos.marinetraffic.com/ais/lightdetails.aspx?light_id=1000019773");
$fp = fopen("example_homepage.txt", "w");
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_exec($ch);
curl_close($ch);
fclose($fp);

// Send plain text so the browser does not render the output as HTML
header('Content-Type: text/plain');

// Read the whole text file into a string (no fixed byte count)
$theData = file_get_contents("example_homepage.txt");

// Find the last "&centerx=" (where the GPS data begins) and the "&centery=" after it
$xpos = strrpos($theData, "&centerx=");
$ypos = ($xpos === false) ? false : strpos($theData, "&centery=", $xpos);
if ($xpos === false || $ypos === false) {
    // note: three equals signs, because 0 is a valid position
    echo "not found";
    exit;
}

// Skip the 9-character parameter names. centerx is the longitude and centery
// the latitude; floatval() stops at the first character that is not part of
// the number (such as the next "&"), so no fixed lengths are needed.
$lon = floatval(substr($theData, $xpos + 9));
$lat = floatval(substr($theData, $ypos + 9));

// Print longitude, then latitude
echo $lon . " " . $lat;

?>

Hope this helps someone.
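A regular expression is another way to pull out the coordinate pairs without a full HTML parser. A minimal sketch, assuming the coordinates always appear as centerx=<number>&centery=<number> query parameters, as they do in the question's sample (the live site may differ). The excerpt below is pasted inline from the question instead of being downloaded, so the example runs on its own:

```php
<?php
// Excerpt of the page source from the question (example_homepage.txt there)
$html = <<<'HTML'
<b>Latitude / Longitude:</b>
<a href='default.aspx?mmsi=258245000&centerx=5.311533&centery=60.39997&zoom=10&type_color=9'>60.39997˚ / 5.311533˚ (Map)</a>
<br/>
<b>Currently in Port:</b>
<a href='default.aspx?centerx=5.32245&centery=60.39085&zoom=14'>BERGEN</a>
HTML;

// Match every "centerx=<number>&centery=<number>" pair, whether it follows "?" or "&".
// The number pattern allows an optional minus sign (e.g. -70.19347).
preg_match_all(
    '/[?&]centerx=(-?[0-9.]+)&centery=(-?[0-9.]+)/',
    $html,
    $matches,
    PREG_SET_ORDER
);

$coords = array();
foreach ($matches as $m) {
    // $m[1] is centerx (longitude), $m[2] is centery (latitude)
    $coords[] = array('lat' => floatval($m[2]), 'lon' => floatval($m[1]));
}

// In this excerpt the last pair comes from the "Currently in Port" link
$port = end($coords);
echo "Latitude: {$port['lat']} Longitude: {$port['lon']}\n"; // prints: Latitude: 60.39085 Longitude: 5.32245
```

In this sample the first pair is the vessel's last reported position (60.39997, 5.311533) and the second is the port position the question asks for, so picking the right pair still depends on the page layout.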