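I am parsing the HTML tags of an HTML page with BeautifulSoup and extracting the text content of those tags.
My output contains empty/blank lines, and I cannot find the cause. I then tried formatting the output with PrettyTable. It looks nice, but it does not get me any further, because I want to import the file into MS Access later. The output could be a .csv file.
I hope you can help me.
Part of my code: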
import urllib.request
from bs4 import BeautifulSoup
from prettytable import PrettyTable

text_file = open("Output3.txt", "w")
table = PrettyTable(['Names', 'Addresses'])
# Parse the local test page
with urllib.request.urlopen("file:///C:/Users/x/Desktop/test.html") as url:
    soup = BeautifulSoup(url, "html.parser")
# get_text() returns the raw text of each div, including its newlines
names = [name.get_text() for name in soup.find_all("div", {"class": "name m08_name"})]
addresses = [address.get_text() for address in soup.find_all("div", {"class": "adresse m08_adresse"})]
# One table row per (name, address) pair
for line in zip(names, addresses):
    table.add_row(line)
text_file.write(str(table))
text_file.close()
print("Prozess finish")
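My current output:
The output comes from a publicly available phone directory. The addresses belong to doctors in this city. It contains no private data or information.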
+-------------------------------------------------------------------------------------------------------------------------+-------------------------------+
| Namen | Adressen |
+-------------------------------------------------------------------------------------------------------------------------+-------------------------------+
| | |
| | |
| Augenarzt Dr.med. Wolf Eckard Weingärtner | |
| | Königstr. 70 |
| | |
| | 70173 Stuttgart-Mitte |
| | |
| | |
| | |
| | |
| | |
| Baumann Achim Dr. med., Facharzt für Hautkrankheiten | |
| | Königstr. 66 |
| | |
| | 70173 Stuttgart-Mitte |
| | |
| | |
| | |
| | |
| | |
| Ärztehaus am Diakonie-Klinikum Praxis für Mund-, Kiefer- und Plastische | |
| | Falkertstr. 46 |
| | |
| | 70176 Stuttgart-West |
| | |
| | |
| | |
| | |
| | |
| Ihr Hautarzt Dr. Malte-Christian Thode, Dr. Sabrina Germann-Samara und Kollegen | |
| | Wilhelmstr. 40 |
| | |
| | 71638 Ludwigsburg-Mitte |
| | |
| | |
| | |
| | |
| | |
| Ärztehaus-West Dr.med.Angela Faller, Dr.med. Claudia Lerschmacher Faller, Lerschmacher, Vogt | |
| | Kornbergstr. 29 |
| | |
| | 70176 Stuttgart-West |
| | |
| | |
| | |
| | |
| | |
| Richter Constanze Dr.med., Fachärztin für Innere Medizin | |
| | Seelbergstr. 11 |
| | |
| | 70372 Stuttgart-Bad Cannstatt |
| | |
| | |
| | |
| | |
| | |
| Dr.med. Ulrich Schreiber, Ihre Frauenarztpraxis in Stuttgart | |
| | Hirschstr. 31 |
| | |
| | 70173 Stuttgart-Mitte |
| | |
| | |
| | |
| | |
| | |
| Günther Eck Dr.med., FA für HNO | |
| | Marienstr. 5 |
| | |
| | 70178 Stuttgart-Mitte |
| | |
| | |
| | |
| | |
| | |
| Ambulante Gastroenterologie Dres.med. Karl M. Teubner, Albrecht G. Maier, Diet | |
| | Industriestr. 4 |
| | |
| | 70565 Stuttgart-Vaihingen |
| | |
| | |
| | |
| | |
| | |
| Bergener Malte Dr.med., Reuter Matthias Dr.med., Ungemach Gerd Dr.med. | |
| | Wilhelmsplatz 11 |
| | |
| | 70182 Stuttgart-Mitte |
| | |
| | |
| | |
| | |
| | |
| Dr. med. Holger Lange Facharzt für Urologie - Belegarzt | |
| | Hirschstr. 31 |
| | |
| | 70173 Stuttgart-Mitte |
| | |
| | |
| | |
| | |
| | |
| Abel Theophil Dr.med., Arzt für Orthopädie | |
| | Schwabstr. 91 |
| | |
| | 70193 Stuttgart-West |
| | |
| | |
| | |
| | |
| | |
| Ambulante Pneumologie mit Allergiezentrum (BAG) - Dr.med.Frank Heimann Dr.med.Rainer Ehmann, Dr.med.K. Seyfahrt-Jürgens | |
| | Rotebühlplatz 19 |
| | |
| | 70178 Stuttgart-Mitte |
| | |
| | |
| | |
| | |
| | |
| Aerzte am HautTherapieZentrum Stuttgart | |
| | Calwer Str. 11 |
| | |
| | 70173 Stuttgart-Mitte |
| | |
| | |
| | |
| | |
| | |
| auge im fokus Dr.med. Pervanidis, Dr.med. Wagner, Dr.med. Stergiou | |
| | Rotebühlplatz 17 |
| | |
| | 70178 Stuttgart-Mitte |
| | |
| | |
| | |
+-------------------------------------------------------------------------------------------------------------------------+-------------------------------+
My expected output:
names; adresses;
name1; adress1
name2; adress2
.. ; ..
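To make the target format concrete, here is a minimal sketch of how such a file might be produced with the standard `csv` module instead of PrettyTable. The helper names `clean` and `write_rows` and the sample strings are hypothetical; the sample strings only imitate the multi-line cells shown in the table above.

```python
import csv
import io


def clean(text):
    """Collapse every run of whitespace (including newlines) into one space."""
    return " ".join(text.split())


def write_rows(names, addresses, out):
    """Write name/address pairs as semicolon-separated rows to a file-like object."""
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    writer.writerow(["names", "addresses"])
    for name, address in zip(names, addresses):
        writer.writerow([clean(name), clean(address)])


# Hypothetical raw strings, shaped like the multi-line cells in the table above.
raw_names = ["\n\nBaumann Achim Dr. med., Facharzt für Hautkrankheiten\n"]
raw_addresses = ["\n\n\nKönigstr. 66\n\n70173 Stuttgart-Mitte\n\n\n"]

buffer = io.StringIO()
write_rows(raw_names, raw_addresses, buffer)
print(buffer.getvalue())
```

To write a real file, the `StringIO` buffer could be swapped for something like `open("Output3.csv", "w", newline="", encoding="utf-8")`; that file name and encoding are assumptions, not part of my current script.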
Shortened excerpt from test.html:
<div class="adresse m08_adresse" data-role="adresse" itemprop="address" itemscope itemtype="https://schema.org/PostalAddress">
<address>
<a
href="https://adresse.gelbeseiten.de/1124105191531/baumann-achim-dr-med-facharzt-fuer-hautkrankheiten/stuttgart/mitte#originIndex=1;origin=/arzt/stuttgart"
data-wipe='{"listener":"click","name":"Trefferliste Adresse zur Detailseite","id":"1124105191531", "synchron": true}'
>
<span itemprop="streetAddress">Königstr. 66</span>
<br />
<span itemprop="postalCode">70173</span> <span itemprop="addressLocality">Stuttgart-Mitte</span>
</a>
</address>
</div>
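The blank lines probably come from the line breaks between the nested tags in this excerpt, since `get_text()` returns them as part of the text. The sketch below runs on a trimmed copy of the excerpt (the long `href` and `data-wipe` values are replaced by a placeholder, which is an assumption for brevity) and compares plain `get_text()` with `get_text(" ", strip=True)`, which strips each text fragment and joins the non-empty ones with a space.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the excerpt above; href="#" is a placeholder, not the real link.
html = """
<div class="adresse m08_adresse" data-role="adresse" itemprop="address">
<address>
<a href="#">
<span itemprop="streetAddress">Königstr. 66</span>
<br />
<span itemprop="postalCode">70173</span> <span itemprop="addressLocality">Stuttgart-Mitte</span>
</a>
</address>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", {"class": "adresse m08_adresse"})

# Plain get_text() keeps the newlines that sit between the tags.
raw = div.get_text()
# A separator plus strip=True drops whitespace-only fragments and joins the rest.
tidy = div.get_text(" ", strip=True)

print(repr(raw))
print(tidy)
```

If this is indeed the cause, the same call could be used in both list comprehensions of my script, so that each name and each address ends up on a single line.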