Selecting text data with Beautiful Soup

Date: 2016-05-12 18:55:46

Tags: python beautifulsoup

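OK, I'm trying to use Python and Beautiful Soup to select the text data from the HTML below, but I'm having trouble. Each <p> starts with a label inside <b>, but I want the data that comes after it. For example, the first label is "Assessment Type", but I only want "Capacity curve". Here is what I have so far: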

modelinginfo = soup.find( "div", {"id":"genInfo"} ) # this is my raw data
rows=modelinginfo.findChildren(['p']) # this is the data displayed below
for row in rows:
    print(row)
    print('\n')
    cells = row.findChildren('p')
    for cell in cells:
         value = cell.string
         print("The value in this cell is %s" % value)


[<p><b>Assessment Type: </b>Capacity curve</p>,
 <p><b>Name: </b>Borzi et al (2008) - Capacity-Xdir 4Storeys InfilledFrame NonSismicallyDesigned</p>,
 <p><b>Category: </b>Structure specific - Building</p>,
 <p><b>Taxonomy: </b>CR/LFINF+DNO/HEX:4 (GEM)</p>,
 <p><b>Reference: </b>The influence of infill panels on vulnerability curves for RC buildings (Borzi B., Crowley H., Pinho R., 2008) - Proceedings of the 14th World Conference on Earthquake Engineering, Beijing, China</p>,
 <p><b>Web Link: </b><a href="http://www.iitk.ac.in/nicee/wcee/article/14_09-01-0111.PDF" style="color:blue" target="_blank"> http://www.iitk.ac.in/nicee/wcee/article/14_09-01-0111.PDF</a></p>,
 <p><b>Methodology: </b>Analytical</p>,
 <p><b>General Comments: </b>Sample Data: A 4-storey building designed according to the 1992 Italian design code (DM, 1992), considering gravity loads only, and the Decreto Ministeriale 1996 (DM, 1996) when considering seismic action (the seismically designed building has been designed assuming a lateral force equal to 10% of the seismic weight, c=10%, and with a triangular distribution shape).

 The Y axis in the capacity curve represent the collapse multiplier: Base shear resistance over seismic weight.</p>,
 <p><b>Geographical Applicability: </b> Italy</p>]

1 answer:

Answer 0 (score: 1)

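1.) You can iterate over the children of each p and print everything except the b tag (assuming cells holds the p elements, i.e. the rows list from the question):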

for cell in cells:
    for element in cell.children:
        if element.name != 'b':
            print("The value in this cell is %s" % element)
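
For reference, the first approach can be sketched as a self-contained script. This is only an illustration under stated assumptions: the HTML is a shortened copy of the markup in the question, the `html.parser` backend is assumed, and `value_without_label` is a hypothetical helper name, not something from the original answer.

```python
from bs4 import BeautifulSoup, Tag

# Shortened sample of the markup from the question (assumption: same structure).
html = """
<div id="genInfo">
<p><b>Assessment Type: </b>Capacity curve</p>
<p><b>Category: </b>Structure specific - Building</p>
<p><b>Web Link: </b><a href="http://www.iitk.ac.in/nicee/wcee/article/14_09-01-0111.PDF"> http://www.iitk.ac.in/nicee/wcee/article/14_09-01-0111.PDF</a></p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
info = soup.find("div", {"id": "genInfo"})


def value_without_label(p):
    """Join the text of every child of <p> except the <b> label (hypothetical helper)."""
    parts = []
    for element in p.children:
        if isinstance(element, Tag):
            if element.name == "b":
                continue  # skip the label
            parts.append(element.get_text())  # e.g. the <a> link text
        else:
            parts.append(str(element))  # plain text node
    return "".join(parts).strip()


values = [value_without_label(p) for p in info.find_all("p")]
for value in values:
    print("The value in this cell is %s" % value)
```

Collecting the text instead of printing each child directly avoids printing markup such as the `<a>` tag in the "Web Link" row.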

2.) You can use the extract() method to remove the unwanted b tag:

for cell in cells:
    if cell.b:
        # remove "b" tag
        cell.b.extract()
    print("The value in this cell is %s" % cell)
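
A possible variation on the second approach, again only as a sketch: `extract()` returns the tag it removes, so the label can be kept as a dictionary key while `get_text(strip=True)` yields the remaining text without the surrounding `<p>` markup. The sample HTML and the `fields` name are assumptions made for illustration. Note that `extract()` modifies the parsed tree in place.

```python
from bs4 import BeautifulSoup

# Shortened sample of the markup from the question (assumption: same structure).
html = """
<div id="genInfo">
<p><b>Assessment Type: </b>Capacity curve</p>
<p><b>Methodology: </b>Analytical</p>
<p><b>Geographical Applicability: </b> Italy</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
info = soup.find("div", {"id": "genInfo"})

fields = {}  # hypothetical name: maps each label to its value
for p in info.find_all("p"):
    if p.b is None:
        continue  # no label in this paragraph, nothing to split
    # extract() removes <b> from the tree and returns it
    label_tag = p.b.extract()
    label = label_tag.get_text(strip=True).rstrip(":")
    # With <b> gone, the remaining text of <p> is just the value
    fields[label] = p.get_text(strip=True)

for label, value in fields.items():
    print("%s -> %s" % (label, value))
```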