我正在开发一个php脚本来从wowhead中提取任务数据,特别是开始和结束任务的内容,无论是项目还是npc,以及它的id或名称分别是什么。这是整个脚本的相关部分,其余部分涉及数据库插入。如果有人感兴趣的话,这是我提出的完整代码片段。此外,看到这将运行大约15,000次,这是获取/存储数据的最佳方法吗?
<?php
$quests = array();
//$questlimit = 14987;
$questlimit = 5;
$currentquest = 1;
$questsprocessed = 0;
while($questsprocessed != $questlimit)
{
echo "<br>";
echo " Start of iteration: ".$questsprocessed." ";
echo "<br>";
echo " Attempting to process quest: ".$currentquest." ";
echo "<br>";
$quests[$currentquest] = array();
$baseurl = 'http://wowhead.com/quest=';
$fullurl = $baseurl.$currentquest;
$data = drupal_http_request($fullurl);
$queststartloc1 = strpos($data->data, 'quest_start');
$queststartloc2 = strpos($data->data, 'quest_end');
if($queststartloc1==false)
{$currentquest++; echo "No data for this quest"; echo "<br>"; continue;}
$questendloc1 = strpos($data->data, 'quest_end');
$questendloc2 = strpos($data->data, 'x5DDifficulty');
$startcaptureLength = $queststartloc2 - $queststartloc1;
$endcaptureLength = $questendloc2 - $questendloc1;
$quest_start_raw = substr($data->data,$queststartloc1, $startcaptureLength);
$quest_end_raw = substr($data->data, $questendloc1, $endcaptureLength);
$startDecoded = preg_replace('~\\\\x([A-Fa-f0-9]{2})~e', 'chr("0x$1")', $quest_start_raw);
$endDecoded = preg_replace('~\\\\x([A-Fa-f0-9]{2})~e', 'chr("0x$1")', $quest_end_raw);
$quests[$currentquest]['Start'] = array();
$quests[$currentquest]['End'] = array();
if(strstr($startDecoded, 'npc'))
{
$quests[$currentquest]['Start']['Type'] = "npc";
preg_match('~npc=(\d+)~', $startDecoded, $startmatch);
}
else
{
$quests[$currentquest]['Start']['Type'] = "item";
preg_match('~item=(\d+)~', $startDecoded, $startmatch);
}
$quests[$currentquest]['Start']['ID'] = $startmatch[1];
if(strstr($endDecoded, 'npc'))
{
$quests[$currentquest]['End']['Type'] = "npc";
preg_match('~npc=(\d+)~', $endDecoded, $endmatch);
}
else
{
$quests[$currentquest]['End']['Type'] = "item";
preg_match('~item=(\d+)~', $endDecoded, $endmatch);
}
$quests[$currentquest]['End']['ID'] = $endmatch[1];
//var_dump($quests[$currentquest]);
echo " End of iteration: ".$questsprocessed." ";
echo "<br>";
echo " Processed quest: ".$currentquest." ";
echo "<br>";
$currentquest++;
$questsprocessed++;
}
?>
答案 0 :(得分:3)
这些被称为&#34;转义序列&#34;。通常,它们用于表示不可打印的字符,但可以编码任何字符。在php中,你可以像这样解码它们:
$text = '
quest_start\\x5DStart\\x3A\\x20\\x5Bitem\\x3D16305\\x5D\\x5B\\x2Ficon\\x5D\\x5B\\x2Fli\\x5D\\x5Bli\\x5D\\x5Bicon\\x20name\\x3Dquest_end\\x5DEnd\\x3A\\x20\\x5Burl\\x3D\\x2Fnpc\\x3D12696\\x5DSenani\\x20Thunderheart\\x5B\\x2Furl\\x5D\\x5B\\x2Ficon\\x5D\\x5B\\x2Fli\\x5D\\x5Bli\\x5DNot\\x20sharable\\x5B\\x2Fli\\x5D\\x5Bli
';
$decoded = preg_replace('~\\\\x([A-Fa-f0-9]{2})~e', 'chr("0x$1")', $text);
它为您提供了与此类似的字符串:
quest_start]Start: [item=16305][/icon][/li][li][icon name=quest_end]End: [url=/npc=12696]Senani Thunderheart[/url][/icon][/li][li]Not sharable[/li][li
(显然,某种BB代码)。要删除所有bbcodes,必须进行一次替换:
$clean = preg_replace('~(\[.+?\])+~', ' ', $decoded);