Question

I'm trying to build a proxy scraper. Here is my code:

import bs4
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup as soup
import lxml
from contextlib import redirect_stdout

meh=[]

pathf = '/home/user/tests.txt'

url = Request('https://www.path.to/table', headers={'User-Agent': 'Mozilla/5.0'})

page_html = urlopen(url).read()

page_soup = soup(page_html, features="xml")

final = page_soup.tbody

meh.append(final)

with open(pathf, 'w') as f:
    with redirect_stdout(f):
        print(meh[0].text.strip())

Now I'd like the text to be written in a more readable way, because at the moment it looks like this:

12.183.20.3615893USUnited StatesSocks5AnonymousYes11 seconds ago220.133.97.7445657TWTaiwanSocks5AnonymousYes11 seconds ago

How can I turn this text into a more readable file? Something like:

12.183.20.36 15893 United States Socks5 Anonymous Yes 11 seconds ago (new line) ......

This is the actual output without '.text.strip()'. Running it through jsbeautifier may help if you want it formatted: https://ghostbin.com/paste/g56qe

Answer 1

Instead of extracting the whole table body, you can extract all the td elements as a list:

final_list = page_soup.find_all('td')

Then get the list of text nodes:

list_of_text_nodes = [td.text.strip() for td in final_list]

Output:

[u'182.235.38.81', u'40748', u'TW', u'Taiwan', u'Socks5', u'Anonymous'...]
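A flat list like this no longer records where one table row ends and the next begins. One way to recover the rows is to slice the list into fixed-size groups. The sketch below assumes every row has exactly eight cells, which is inferred from the sample output in the question and may not hold for the real page. The names cells, COLUMNS and rows_from_cells are illustrative, and cells stands in for list_of_text_nodes.

```python
# Sample cell texts standing in for list_of_text_nodes
# (values copied from the output shown in the question).
cells = [
    '12.183.20.36', '15893', 'US', 'United States',
    'Socks5', 'Anonymous', 'Yes', '11 seconds ago',
    '220.133.97.74', '45657', 'TW', 'Taiwan',
    'Socks5', 'Anonymous', 'Yes', '11 seconds ago',
]

COLUMNS = 8  # assumption: every table row has exactly eight cells


def rows_from_cells(cells, columns=COLUMNS):
    """Regroup a flat list of cell texts into one space-joined line per row."""
    return [
        " ".join(cells[i:i + columns])
        for i in range(0, len(cells), columns)
    ]


lines = rows_from_cells(cells)
print("\n".join(lines))
```

If the column count varies between rows, this grouping will drift out of alignment, so it is only appropriate when the table layout is known to be fixed.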

Or get all the text nodes as a single string:

complete_text = " ".join([i.text.strip() for i in final_list])

Output:

'182.235.38.81 40748 TW Taiwan Socks5 Anonymous Yes 14 seconds ago ...'
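Since the question asks for one proxy per line, it may be simpler to walk the table row by row rather than collecting every td at once. The following is a minimal sketch of that idea, not the answerer's code. The sample_html string is a made-up stand-in for the downloaded page, table_to_lines is a hypothetical helper name, and the built-in "html.parser" is used here so the example does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for page_html; the real page layout may differ.
sample_html = """
<table><tbody>
<tr><td>12.183.20.36</td><td>15893</td><td>US</td><td>United States</td>
    <td>Socks5</td><td>Anonymous</td><td>Yes</td><td>11 seconds ago</td></tr>
<tr><td>220.133.97.74</td><td>45657</td><td>TW</td><td>Taiwan</td>
    <td>Socks5</td><td>Anonymous</td><td>Yes</td><td>11 seconds ago</td></tr>
</tbody></table>
"""


def table_to_lines(html):
    """Return one space-separated line of cell text per table row."""
    page_soup = BeautifulSoup(html, "html.parser")
    lines = []
    for row in page_soup.find_all("tr"):
        # Collect the stripped text of each cell in this row.
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if cells:  # skip rows without any td cells, such as header rows
            lines.append(" ".join(cells))
    return lines


proxy_lines = table_to_lines(sample_html)
print("\n".join(proxy_lines))
```

To save the result, the joined string can be written directly to the file from the question, for example with f.write("\n".join(proxy_lines)) inside the with open(pathf, 'w') block, which avoids the need for redirect_stdout.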
