在通过python美丽汤后用空格逗号替换html中的br标签

时间:2017-03-09 16:49:24

标签: python html beautifulsoup

br /> (例如,没有。)每行之间的标记 29 BOSWALL PARKWAY
EDINBURGH
EH5 2BR。当我在这个文本编辑器中写作时,它们的标签会消失。

当我通过美丽的汤输出看起来像29 BOSWALL PARKWAYEDINBURGHEH5 2BR

有没有人知道生成逗号空间的好方法是标签是什么?非常感谢提前。

我已经根据其他示例尝试了以下内容

第一种方法在扫描字符串文字错误时生成SyntaxError:EOL 第二种方法AttributeError:' unicode'对象没有属性' replaceAll'

import requests
from bs4 import BeautifulSoup as soup

url = 'https://www.saa.gov.uk/search/?SEARCHED=1&ST=&SEARCH_TERM=city+of+edinburgh%2C+BOSWALL+PARKWAY%2C+EDINBURGH&ASSESSOR_ID=&SEARCH_TABLE=valuation_roll_cpsplit&DISPLAY_COUNT=10&TYPE_FLAG=CP&ORDER_BY=PROPERTY_ADDRESS&H_ORDER_BY=SET+DESC&DRILL_SEARCH_TERM=BOSWALL+PARKWAY%2C+EDINBURGH&DD_TOWN=EDINBURGH&DD_STREET=BOSWALL+PARKWAY&UARN=110B60329&PPRN=000000000001745&ASSESSOR_IDX=10&DISPLAY_MODE=FULL#results'

baseurl = 'https://www.saa.gov.uk'

session = requests.session()

response = session.get(url)

# content of search page in soup 
html = soup(response.content,"lxml")

Address = LeftBlockData[3].get_text().strip()
#Address_1 = Address.replace("<br>,",");
Address_1 = Address.replaceAll("[\\t\\n\\r]"," ");

print (Address1)

1 个答案:

答案 0 :(得分:1)

<br>是换行符,我们通常不会在数据中包含这样的控制字符,但您可以使用列表来包含所有行:

In [72]: table = html.find('div', class_='table')


In [77]: for row in table.find_all('div', class_='row table-row'):
    ...:     print(row.get_text(strip=True, separator='|').split('|'))
    ...:     
    ...:     
    ...:     
    ...:     
['Ref No.', '110B60329']
['Office', 'LOTHIAN VJB']
['Description', 'STORE']
['Property Address', '29 BOSWALL PARKWAY', 'EDINBURGH', 'EH5 2BR']
['Proprietor', 'SCOTTISH MIDLAND CO-OP SOCIETY LTD.']
['Tenant', 'PROPRIETOR']
['Occupier']