I'm trying to build a proxy scraper. Here is my code:
import bs4
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup as soup
import lxml
from contextlib import redirect_stdout
meh=[]
pathf = '/home/user/tests.txt'
url = Request('https://www.path.to/table', headers={'User-Agent': 'Mozilla/5.0'})
page_html = urlopen(url).read()
page_soup = soup(page_html, features="xml")
final = page_soup.tbody
meh.append(final)
with open(pathf, 'w') as f:
    with redirect_stdout(f):
        print(meh[0].text.strip())
Now I'd like the text to be written in a more readable form, because at the moment it looks like this:
12.183.20.3615893USUnited StatesSocks5AnonymousYes11 seconds ago220.133.97.7445657TWTaiwanSocks5AnonymousYes11 seconds ago
How can I turn this text into a more readable file? Something like:
12.183.20.36 15893 US United States Socks5 Anonymous Yes 11 seconds ago (newline) ......
This is the actual output without '.text.strip()', formatted with jsbeautifier in case that helps: https://ghostbin.com/paste/g56qe
Answer 0 (score: 0)
Instead of extracting the whole table body, you can extract all the td elements as a list:
final_list = page_soup.find_all('td')
Then get the list of text nodes:
list_of_text_nodes = [td.text.strip() for td in final_list]
Output:
[u'182.235.38.81', u'40748', u'TW', u'Taiwan', u'Socks5', u'Anonymous'...]
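Since the flat list loses the row boundaries, it can be regrouped into one line per proxy when every row has the same number of cells. The sketch below assumes 8 cells per row, which is only inferred from the sample output in the question and should be checked against the real table. `group_into_rows`, `COLUMNS_PER_ROW`, and the sample data are hypothetical names and values used for illustration.

```python
# Hypothetical sample standing in for list_of_text_nodes.
list_of_text_nodes = [
    '12.183.20.36', '15893', 'US', 'United States',
    'Socks5', 'Anonymous', 'Yes', '11 seconds ago',
    '220.133.97.74', '45657', 'TW', 'Taiwan',
    'Socks5', 'Anonymous', 'Yes', '11 seconds ago',
]

# Assumption: each table row holds exactly 8 cells.
COLUMNS_PER_ROW = 8


def group_into_rows(cells, width):
    """Split a flat list of cell texts into consecutive rows of `width` cells."""
    return [cells[i:i + width] for i in range(0, len(cells), width)]


rows = group_into_rows(list_of_text_nodes, COLUMNS_PER_ROW)

# One space-separated line per proxy.
lines = [" ".join(row) for row in rows]
print("\n".join(lines))
```

If the table ever has rows with a different number of cells, this grouping drifts out of alignment, so it only suits tables with a fixed layout.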
Or get all the text nodes as a single string:
complete_text = " ".join([i.text.strip() for i in final_list])
Output:
'182.235.38.81 40748 TW Taiwan Socks5 Anonymous Yes 14 seconds ago ...'
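To get one proxy per line, as the question asks, a possible variation is to walk the table row by row and join the cells of each tr separately. This is a minimal sketch, not tested against the real page: `sample_html` is a made-up stand-in for the downloaded page, `rows_as_lines` is a hypothetical helper, and the built-in "html.parser" is used here only so that the example has no extra dependency.

```python
from bs4 import BeautifulSoup

# Made-up markup imitating the structure implied by the question's output.
sample_html = """
<table><tbody>
<tr><td>12.183.20.36</td><td>15893</td><td>US</td><td>United States</td>
    <td>Socks5</td><td>Anonymous</td><td>Yes</td><td>11 seconds ago</td></tr>
<tr><td>220.133.97.74</td><td>45657</td><td>TW</td><td>Taiwan</td>
    <td>Socks5</td><td>Anonymous</td><td>Yes</td><td>11 seconds ago</td></tr>
</tbody></table>
"""


def rows_as_lines(html):
    """Return one space-separated string per table row that contains td cells."""
    page_soup = BeautifulSoup(html, "html.parser")
    lines = []
    for tr in page_soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # skip rows without td cells, such as header rows
            lines.append(" ".join(cells))
    return lines


lines = rows_as_lines(sample_html)
print("\n".join(lines))
```

The resulting lines can then be written to the file from the question with something like f.write("\n".join(lines)) inside the existing with open(pathf, 'w') block, which also removes the need for redirect_stdout.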