Question

I have the source code of a page stored in a variable.

<!DOCTYPE html><html><head><title>Intro</title></head><body><a href='/name=t1.304.log'>Test</a>.  </body></html>

I want to extract t1.304.log from the line above. I am using print log_name.split(".log",1)[0], but it returns the first part of the string instead.

Answer 1

Why not parse the HTML with an HTML parser?

>>> from bs4 import BeautifulSoup
>>> data = "<!DOCTYPE html><html><head><title>Intro</title></head><body><a href='/name=t1.304.log'>Test</a>.  </body></html>"
>>> # Name the parser explicitly; .a is the first <a> tag in the document
>>> BeautifulSoup(data, "html.parser").a["href"].split("=")[-1]
't1.304.log'
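If installing BeautifulSoup is not an option, roughly the same idea can be sketched with the standard library's html.parser module. This is only a sketch under assumptions: HrefCollector and log_names are hypothetical names made up for illustration, and it assumes every link target has the name=... shape shown in the question.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def log_names(page_source):
    """Return the part after the last '=' of each link target."""
    collector = HrefCollector()
    collector.feed(page_source)
    collector.close()
    return [href.split("=")[-1] for href in collector.hrefs]


data = "<!DOCTYPE html><html><head><title>Intro</title></head><body><a href='/name=t1.304.log'>Test</a>.  </body></html>"
print(log_names(data))
```

Unlike the one-liner above, this returns a list, so it also covers pages with more than one link.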

Answer 2

If you just want something quick, you can use the split() function (see the Python documentation for str.split):

log_name.split("'")[1].split("=")[1]

However, to do this in a reusable way, take a look at tools such as BeautifulSoup.

Edited to add

Based on your comment, you could do this:

print(log_name.split(".log",1)[0].rsplit("=",1)[1] + ".log")
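To see why the original attempt returned the first part: split(".log", 1) cuts the string at the first ".log" and returns the text before and after it, so index 0 is everything up to and including the base name. A small sketch that wraps the one-liner above, where extract_log_name is a hypothetical helper name and the link shape from the question is assumed:

```python
def extract_log_name(page_source, suffix=".log"):
    """Return the text between the last '=' before `suffix` and the suffix itself.

    Hypothetical helper; assumes the name=... link shape from the question.
    """
    # partition splits at the first occurrence of suffix, like split(suffix, 1)
    head, sep, _tail = page_source.partition(suffix)
    if not sep or "=" not in head:
        return None  # no ".log" found, or nothing that looks like name=...
    return head.rsplit("=", 1)[1] + suffix


log_name = "<!DOCTYPE html><html><head><title>Intro</title></head><body><a href='/name=t1.304.log'>Test</a>.  </body></html>"

parts = log_name.split(".log", 1)
print(parts[0])                    # everything before ".log", ending in name=t1.304
print(extract_log_name(log_name))
```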

Answer 3

import re

st = " <!DOCTYPE html><html><head><title>Intro</title></head><body><a href='/name=t1.304.log'>Test</a>.  </body></html>"

# Match a "t", then any non-whitespace characters, ending in "log"
mo = re.search(r'(t\S*log)', st)

print(mo.group())

Output

t1.304.log

Answer 4

You can use a regular expression (with the re module), assuming your string variable is page_source.

This will give you a list of all matching "*.log" substrings.
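A minimal sketch of this approach, assuming re.findall and a pattern chosen purely for illustration (the pattern is an assumption, not necessarily the one originally intended, and may need adjusting for other file names):

```python
import re

page_source = "<!DOCTYPE html><html><head><title>Intro</title></head><body><a href='/name=t1.304.log'>Test</a>.  </body></html>"

# Assumed pattern: a run of characters that are not whitespace, quotes,
# '=', '/', '<' or '>', followed by a literal ".log"
pattern = r"[^\s'\"=/<>]+\.log"

# findall returns every non-overlapping match as a list of strings
matches = re.findall(pattern, page_source)
print(matches)
```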

Note, however, that using regular expressions to parse HTML is generally discouraged; see this discussion.

In fact, don't do this; use alecxe's answer.

Getting the value of a log file in Python from a variable

4 answers: