I have a txt file full of HTML code. I'm trying to create a PHP page that searches the code and pulls out the "username" for me.
Here is a small sample of the page:
<div class="search-result-details">
<div class="employee-name">This is my name!</div>
<ul class="employee-details">
<li><span class="label">Login</span>username</li>
<li><span class="label">Employee ID</span>####</li>
<li><span class="label">Barcode ID</span>###</li>
<li><span class="label">Status</span>Active</li>
</ul>
<ul class="org-details">
<li><span class="label">Location</span>SAT1 (755)</li>
<li><span class="label">Shift</span>AAAA</li>
<li><span class="label">Department</span>1231</li>
<li><span class="label">Area</span>26</li>
<li><span class="label">Crew</span>0</li>
<li><span class="label">Supervisor</span>manager name</li>
</ul>
</div>
</a></li>
</ol>
</div>
I need to get the username from the following line:
<li><span class="label">Login</span>username</li>
This is what I have so far; it at least grabs the lines I need:
<?php
$file = 'log.txt';
$searchfor = '<ul class="employee-details">
<li><span class="label">Login</span>';
// the following line prevents the browser from parsing this as HTML.
header('Content-Type: text/plain');
// get the file contents, assuming the file to be readable (and exist)
$contents = file_get_contents($file);
// escape special characters in the query
$pattern = preg_quote($searchfor, '/');
// finalise the regular expression, matching the whole line
$pattern = "/^.*$pattern.*\$/m";
// search, and store all matching occurrences in $matches
if(preg_match_all($pattern, $contents, $matches)){
echo "Found matches:\n";
echo implode("\n", $matches[0]);
}
else{
echo "No matches found";
}
?>
Current output:
<ul class="employee-details">
<li><span class="label">Login</span>username</li>
Any help is greatly appreciated. Thanks.
Answer 0 (score: 0)
It's a bit hacky, but here is one way you could do it.
$contents = file_get_contents($file);
preg_match("/(Login<\/span>)([a-zA-Z0-9]*)(<\/li>)/", $contents, $matches);
if (is_array($matches) && isset($matches[2])) {
$username = trim($matches[2]);
}
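Of course, the middle capture group needs to support whatever characters a username may contain.
Also note that this will break if the HTML structure changes.
Finally, if the file can contain more than one username, you can use preg_match_all, in which case $matches[2] will be an array of usernames.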
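The preg_match_all variant described above might look like the following minimal sketch. The sample input and the usernames in it are hypothetical stand-ins for the contents of log.txt, and the character class [^<]* (anything up to the next tag) is an assumption about what a username may contain, not something the question specifies.

```php
<?php
// Hypothetical sample input standing in for the contents of log.txt.
$contents = <<<HTML
<ul class="employee-details">
<li><span class="label">Login</span>jdoe</li>
<li><span class="label">Employee ID</span>1</li>
</ul>
<ul class="employee-details">
<li><span class="label">Login</span>a.smith-2</li>
</ul>
HTML;

// Same three capture groups as the answer above, but the middle group
// accepts any character except "<" (an assumption about valid usernames).
preg_match_all('/(Login<\/span>)([^<]*)(<\/li>)/', $contents, $matches);

// With preg_match_all, $matches[2] is an array holding the middle group
// of every match, i.e. one entry per username found.
$usernames = array_map('trim', $matches[2]);

print_r($usernames);
```

Because the pattern is anchored on the literal text Login</span>, it still depends on the exact markup, as noted above.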
Answer 1 (score: 0)
Using DOMDocument:
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML('<div class="search-result-details">
<div class="employee-name">This is my name!</div>
<ul class="employee-details">
<li><span class="label">Login</span>username</li>
<li><span class="label">Employee ID</span>####</li>
<li><span class="label">Barcode ID</span>###</li>
<li><span class="label">Status</span>Active</li>
</ul>
<ul class="org-details">
<li><span class="label">Location</span>SAT1 (755)</li>
<li><span class="label">Shift</span>AAAA</li>
<li><span class="label">Department</span>1231</li>
<li><span class="label">Area</span>26</li>
<li><span class="label">Crew</span>0</li>
<li><span class="label">Supervisor</span>manager name</li>
</ul>
</div>
</a></li>
</ol>
</div>');
libxml_use_internal_errors(false);
$html = new DOMXPath($doc);
$result = '';
foreach ($html->query("//*[@class='label']") as $value) {
if ($value->textContent == 'Login') {
$result = $value->nextSibling->textContent;
break;
}
}
echo $result;
Output:
username
libxml_use_internal_errors is there to suppress the validation errors listed in this answer.
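If the file holds several search results, the same DOMDocument approach can collect every login in one pass. The sketch below is an illustration under assumptions: the inline sample and its usernames are hypothetical, and in the question's setup the string would presumably come from file_get_contents('log.txt'). It also swaps the loop-and-compare step for a single XPath expression that selects the text node right after each "Login" label.

```php
<?php
// Hypothetical sample; in the question this would be the contents of log.txt.
$source = '<ul class="employee-details">
<li><span class="label">Login</span>jdoe</li>
<li><span class="label">Status</span>Active</li>
</ul>
<ul class="employee-details">
<li><span class="label">Login</span>asmith</li>
</ul>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($source);
libxml_clear_errors();              // discard the collected warnings
libxml_use_internal_errors(false);

$xpath = new DOMXPath($doc);
$usernames = [];

// For each span with class "label" whose text is "Login",
// take the first text node that follows it inside the same <li>.
$query = "//span[@class='label'][.='Login']/following-sibling::text()[1]";
foreach ($xpath->query($query) as $node) {
    $usernames[] = trim($node->nodeValue);
}

print_r($usernames);
```

Unlike the regular expression, this does not depend on whitespace or attribute order in the markup, only on the label span being directly followed by the username text.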