How do I search a text file for an href using a Python regex?

Date: 2019-04-13 08:45:59

Tags: python-3.6 python-regex

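I run some CLI utilities and get a bunch of output, with a web URL at the end of the file. I need to find that link with a Python regex and print it. Below are the three lines of code I wrote for this.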

file = str('/root/PycharmProjects/rest_project/sponge_link')

with open(file, 'r') as fo:
    fo.read().__str__()
    urls = re.findall('https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+', fo)
    print(urls)

Here are the contents of the file:

INFO: Streaming results to http://abc/56659bf3-a66d-482b-80e8-6484cafc650d
INFO: Analyzed target <path/path/path> (73 packages loaded, 10521 targets configured).
INFO: Found 1 target...
Target <path>/dence up-to-date:
 utility-<path>/dence_0.0-5_amd64.deb
 utility-<path>/dence_0.4-5_amd64.changes
INFO: Elapsed time: 23.669s, Critical Path: 0.47s, Remote (0.00% of the time): [queue: 0.00%, setup: 0.00%, process: 0.00%]
INFO: Build Event Protocol files produced successfully.
INFO: Build completed successfully, 1 total action
INFO: Still uploading to http://abc/56659bf3-a66d-482b-80e8-6484cafc650d

However, when I run the program, I get the following error:

Traceback (most recent call last):
  File "/root/PycharmProjects/rest_project/sel.py", line 24, in <module>
    urls = re.findall('https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+', fo)
  File "/usr/lib/python3.6/re.py", line 222, in findall
    return _compile(pattern, flags).findall(string)
TypeError: expected string or bytes-like object

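It complains that the data type should be a string. So I wrapped the file path in str(), but that did not help either.

Can someone help me understand my mistake?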

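1 answer:

Answer 0: (score: 1)

You are passing the file object fo to re.findall, not a string. Calling fo.read().__str__() does read the file, but the result is discarded because it is never assigned to anything, and wrapping the file path in str() only affects the path, not the file's contents. You need to assign the result of reading the file to a variable and then pass that variable to re.findall.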

  1. fo.read().__str__() should be something like lines = fo.read()
  2. urls = re.findall('https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+', fo) should be urls = re.findall('https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+', lines)
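
Putting the two changes together, here is a minimal sketch of how the corrected code might look. The helper names find_urls and find_urls_in_file, the sample string, and FULL_URL_PATTERN are assumptions made for illustration and are not part of the original post. Note that the pattern from the question does not include "/" in its character class, so on this input it matches only up to the host (http://abc). If the whole link is wanted, something like the broader https?://\S+ pattern shown here could be one option.

```python
import re

# Pattern from the question, written as a raw string so that \w and \d
# reach the regex engine instead of being treated as string escapes.
HOST_PATTERN = r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+'

# Hypothetical broader pattern (not from the original post): match
# everything after the scheme up to the next whitespace character.
FULL_URL_PATTERN = r'https?://\S+'


def find_urls(text, pattern=HOST_PATTERN):
    """Return every match of pattern in text (text must be a str)."""
    return re.findall(pattern, text)


def find_urls_in_file(path, pattern=HOST_PATTERN):
    """Read the whole file into a string, then search that string."""
    with open(path, 'r') as fo:
        lines = fo.read()  # keep the result; this is the string to search
    return find_urls(lines, pattern)


# Two lines taken from the file contents shown in the question.
sample = (
    "INFO: Streaming results to http://abc/56659bf3-a66d-482b-80e8-6484cafc650d\n"
    "INFO: Still uploading to http://abc/56659bf3-a66d-482b-80e8-6484cafc650d\n"
)

print(find_urls(sample))
# ['http://abc', 'http://abc']
print(find_urls(sample, FULL_URL_PATTERN))
# ['http://abc/56659bf3-a66d-482b-80e8-6484cafc650d',
#  'http://abc/56659bf3-a66d-482b-80e8-6484cafc650d']
```

With the file from the question, the call would be find_urls_in_file('/root/PycharmProjects/rest_project/sponge_link'). Since the same link appears twice in the output, taking the last element of the returned list is one way to get just the final URL.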