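I have a list of BeautifulSoup objects and I am trying to parse the contents of the table cells further. My output is a list of lists, each with 3 items, because the table has 3 columns.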
file = <html><p><center><h1> Interference Report </h1></center><p>
<b> Interference Report Project File: </b>C:\Users\ksobon\Documents\test_project_03_ksobon.rvt <br> <b> Created: </b> Monday, May 26, 2014 7:52:32 PM <br> <b> Last Update: </b> <br>
<p><table border=on> <tr> <td></td> <td ALIGN="center">A</td> <td ALIGN="center">B</td> </tr>
<tr> <td> 1 </td> <td> Workset1 : Walls : Basic Wall : E103-CON 100mm : id 469021 </td> <td> Workset1 : Furniture : FUR_BoardroomTable10Chairs_gm : Board Room Layout : id 482259 </td> </tr>
<tr> <td> 2 </td> <td> Workset1 : Walls : Basic Wall : E103-CON 100mm : id 469021 </td> <td> Workset1 : Walls : Basic Wall : E103-CON 100mm : id 483442 </td> </tr>
<tr> <td> 3 </td> <td> Workset1 : Walls : Basic Wall : E103-CON 100mm : id 469060 </td> <td> Workset1 : Furniture : FUR_Sofa_gm : 2100mm : id 475041 </td> </tr>
<tr> <td> 4 </td> <td> Workset1 : Walls : Basic Wall : E103-CON 100mm : id 469109 </td> <td> Workset1 : Furniture : FUR_Sofa_gm : 2100mm : id 475273 </td> </tr>
<tr> <td> 5 </td> <td> Workset1 : Walls : Basic Wall : E103-CON 100mm : id 469178 </td> <td> Workset1 : Furniture : FUR_Sofa_gm : 2100mm : id 475510 </td> </tr>
<tr> <td> 6 </td> <td> Workset1 : Walls : Basic Wall : E103-CON 100mm : id 469178 </td> <td> Workset1 : Furniture : FUR_Sofa_gm : 2100mm : id 482306 </td> </tr>
<tr> <td> 7 </td> <td> whatever : Doors : DOR_Single_gm : 800w, 2100h (720Leaf) - Mark 102B : id 472052 </td> <td> Workset1 : Windows : WIN-ConceptWindowFixed_gm : 1200 H x 1200 W - Mark 102B : id 472822 </td> </tr>
<tr> <td> 8 </td> <td> whatever : Doors : DOR_Single_gm : 800w, 2100h (720Leaf) - Mark 101A : id 472376 </td> <td> Workset1 : Windows : WIN-ConceptWindowFixed_gm : 1200 H x 1200 W - Mark 101C : id 472720 </td> </tr>
<tr> <td> 9 </td> <td> Workset1 : Windows : WIN-ConceptWindowFixed_gm : 1800 H x 1200 W 2 - Mark 101B : id 472688 </td> <td> Workset1 : Furniture : FUR_Sofa_gm : 2100mm : id 482306 </td> </tr>
</table>
<p><b> End of Interference Report </b>
</html>
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(file)
tag = soup.findAll('tr')
txt = []
for i in tag:
    txt.append(i.findAll('td'))
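Now I want to convert each sublist element to text, so I tried:
txt1 = [i.text for x in txt for i in x]
However, my output for txt1 is a flat list rather than a list of lists. What am I doing wrong?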
Answer 0 (score: 1)
Put i.text in a list:
txt1 = [[i.text] for x in txt for i in x]
Your list comprehension is flattening the list: it extracts every element into a single list.
l = [[1,2],[2,3],[5,6]]
flatten_l = [x for y in l for x in y]
print (flatten_l)
[1, 2, 2, 3, 5, 6]
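For contrast, here is a minimal sketch of the two comprehension shapes on the same toy data. Two `for` clauses in one comprehension flatten the rows, while a comprehension nested inside another keeps one sublist per row. The names `flat` and `nested` are illustrative only and do not come from the question.

```python
l = [[1, 2], [2, 3], [5, 6]]

# Two `for` clauses in one comprehension: all rows are merged into a single list.
flat = [x for y in l for x in y]

# A comprehension inside a comprehension: the inner one builds each row,
# the outer one collects the rows, so the nesting is preserved.
nested = [[str(x) for x in y] for y in l]

print(flat)    # [1, 2, 2, 3, 5, 6]
print(nested)  # [['1', '2'], ['2', '3'], ['5', '6']]
```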
Maybe you need map:
l=[[1,2,4],[2,3,5],[5,6,7]]
print [map(str, s) for s in l]
[['1', '2', '4'], ['2', '3', '5'], ['5', '6', '7']]
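The `print` statement above is Python 2 syntax, where `map` returns a list. Assuming Python 3 instead, `map` returns a lazy iterator, so a rough equivalent would wrap each call in `list`. The name `as_text` is made up for this sketch.

```python
l = [[1, 2, 4], [2, 3, 5], [5, 6, 7]]

# In Python 3, map() is lazy; list() materialises each converted row.
as_text = [list(map(str, s)) for s in l]

print(as_text)  # [['1', '2', '4'], ['2', '3', '5'], ['5', '6', '7']]
```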
With your code, the following calls i.text on each element while preserving the structure:
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(file)
tag = soup.findAll('tr')
# One sublist of <td> tags per table row
txt = [i.findAll('td') for i in tag]
# One empty result list per row
final = [[] for x in range(len(txt))]
for j, k in enumerate(txt):
    for i in k:
        final[j].append(i.text)
print final
[[u'', u'A', u'B'], [u'1', u'Workset1 : Walls : Basic Wall : E103-CON 100mm : id 469021', u'Workset1 : Furniture : FUR_BoardroomTable10Chairs_gm : Board Room Layout......
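The loop above could probably also be condensed into a nested comprehension. The sketch below uses a hypothetical `Cell` stand-in with a `.text` attribute in place of real `td` tags, so it runs without BeautifulSoup installed, and the sample cell values are shortened placeholders. With the real `txt` built by `findAll`, the same comprehension should apply unchanged.

```python
from collections import namedtuple

# Hypothetical stand-in for a parsed <td> tag; only the .text attribute matters here.
Cell = namedtuple("Cell", ["text"])

# Shaped like txt above: one sublist of cells per table row (placeholder values).
txt = [
    [Cell(""), Cell("A"), Cell("B")],
    [Cell("1"), Cell("Workset1 : Walls"), Cell("Workset1 : Furniture")],
]

# The inner comprehension converts the cells of one row;
# the outer comprehension keeps one list per row.
final = [[cell.text for cell in row] for row in txt]

print(final)
```

Because the row is the unit of the outer comprehension, `final` always has as many sublists as `txt`, with no need to pre-allocate empty lists.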