Question

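I am trying to clean up some data that I have to prepare so that it can be moved into a database. I'm not a programmer by trade, so this is all new to me. I've been searching sites for example code. I found this discussion and, as you can see, I tried to adapt it to my situation: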

<html>
<head>
<title>This the title.</title>
</head>
<body>
<center>
<br />
<br />
<h2>Test Case 1</h2>
</center>
<table align="center" border="1" cellpadding="0" cellspacing="1" width="650">
<tr>
<td>
<font size="1"> Cell Title 1</font>
<br /> </td>
<td>
<font size="1"> Cell Title 2</font>
<br /> </td>
<td>
<font size="1"> Cell Title 3</font>
<br /> 
<font size="2">Value</font></td>
<td>
<font size="1"> Cell Title4</font>
<br /> 
<font size="2">Value</font></td>

This is the error:

Traceback (most recent call last):
  File "ExtractTest2.py", line 35
    print soup.find("td", {"size": "2"}).find_parent('table')
AttributeError: 'NoneType' object has no attribute 'find_parent'

My end goal is to print out

Cell Title X : Value

Wherever there is a value, there will be a Cell Title.

Where am I going wrong?

Answer 1

soup.find("td", {"size":"2"})

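The main problem is here: no td element in the HTML has a size attribute, so find() returns None, and calling find_parent on None raises the AttributeError.

Instead, check the size attribute on the font elements. Sample code: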

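# Each cell title is a <font size="1">; its value, if present, is a sibling <font size="2">.
for title in soup.find_all("font", {"size": "1"}):
    value = title.find_next_sibling("font", {"size": "2"})

    # Python 2 print statement, matching the code in the question
    print title.text, " | ", value.text if value else "No Value"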

Prints:

 Cell Title 1  |  No Value
 Cell Title 2  |  No Value
 Cell Title 3  |  Value
 Cell Title4  |  Value
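
For the "Cell Title X : Value" output the question asks for, the same idea can be sketched in Python 3 roughly as follows. This assumes the bs4 package is installed and uses the standard-library "html.parser" backend; the extract_pairs helper and the trimmed two-cell HTML sample are illustrative names and data, not part of the original post.

```python
from bs4 import BeautifulSoup

# Trimmed two-cell sample modeled on the question's table (illustrative only).
HTML = """
<table>
<tr>
<td>
<font size="1"> Cell Title 2</font>
<br /> </td>
<td>
<font size="1"> Cell Title 3</font>
<br />
<font size="2">Value</font></td>
</tr>
</table>
"""


def extract_pairs(html):
    """Return (title, value) tuples; value is None when the cell has no size-2 font."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for title in soup.find_all("font", {"size": "1"}):
        # find_next_sibling returns None when no matching sibling exists,
        # so check the result before reading text from it.
        value = title.find_next_sibling("font", {"size": "2"})
        pairs.append((
            title.get_text(strip=True),
            value.get_text(strip=True) if value else None,
        ))
    return pairs


pairs = extract_pairs(HTML)
for title, value in pairs:
    # Only cells that actually carry a value are printed.
    if value is not None:
        print(f"{title} : {value}")
```

Returning None for empty cells, rather than a placeholder string, leaves it to the caller to decide whether to skip those cells or store them as NULL when loading the database.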

How to extract table data from an HTML file using BeautifulSoup?
