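I have multiple tables, as in the MySQL data dump below, where each table element represents one row in the database. I want to extract the following information so that I can migrate it to a different database.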
<table name="dashboard">
<column name="id">1</column>
<column name="timestamp">2009-10-09 15:10:30</column>
<column name="config_offline">1</column>
<column name="item1">0.00</column>
<column name="item2">0.00</column>
</table>
<table name="orders">
<column name="id">1</column>
<column name="timestamp">2016-08-04 08:39:13</column>
<column name="item">1</column>
<column name="payment">Check</column>
<column name="cost">175.00</column>
<column name="paid">175.00</column>
<column name="cancel">0</column>
<column name="received">1</column>
</table>
Here is what I have tried so far:
from bs4 import BeautifulSoup

# First attempt: prints every column value, but not which table it belongs to.
with open("test.xml", "r") as markup:
    soup = BeautifulSoup(markup, "xml")
for row in soup.find_all('column'):
    print(row.text)

# I also tried this, but it doesn't work either.
with open("test.xml", "r") as markup:
    soup = BeautifulSoup(markup, "xml")
for row in soup.find_all('table'):
    for c in row.find_all('column'):
        print(c.text)
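The problem with this approach is that I can't tell the two table names apart. Is there a way to extract the information from each of the two tables separately?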
Answer 0 (score: 1)
You can find a specific table by a particular attribute:
import bs4
div_test="""
<table name="dashboard">
<column name="id">1</column>
<column name="timestamp">2009-10-09 15:10:30</column>
<column name="config_offline">1</column>
<column name="item1">0.00</column>
<column name="item2">0.00</column>
</table>
<table name="orders">
<column name="id">1</column>
<column name="timestamp">2016-08-04 08:39:13</column>
<column name="item">1</column>
<column name="payment">Check</column>
<column name="cost">175.00</column>
<column name="paid">175.00</column>
<column name="cancel">0</column>
<column name="received">1</column>
</table>
"""
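# Name the parser explicitly; "html.parser" ships with Python.
soup = bs4.BeautifulSoup(div_test, "html.parser")
# Select each table by its "name" attribute.
table_dashboard = soup.find('table', {'name': "dashboard"})
table_orders = soup.find('table', {'name': "orders"})
print(table_dashboard)
print('\n')
print(table_orders)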
The output gives you table_dashboard and table_orders:
<table name="dashboard">
<column name="id">1</column>
<column name="timestamp">2009-10-09 15:10:30</column>
<column name="config_offline">1</column>
<column name="item1">0.00</column>
<column name="item2">0.00</column>
</table>
<table name="orders">
<column name="id">1</column>
<column name="timestamp">2016-08-04 08:39:13</column>
<column name="item">1</column>
<column name="payment">Check</column>
<column name="cost">175.00</column>
<column name="paid">175.00</column>
<column name="cancel">0</column>
<column name="received">1</column>
</table>
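Once a table has been located this way, its columns can be turned into a dictionary keyed by each column's name attribute, which is usually a convenient shape for writing into another database. This is only a minimal sketch: it assumes the built-in "html.parser" backend and a shortened version of the sample data, and table_to_dict is a hypothetical helper name, not part of bs4.

```python
import bs4

# Shortened sample data, assumed to follow the same layout as the dump above.
div_test = """
<table name="dashboard">
<column name="id">1</column>
<column name="item1">0.00</column>
</table>
<table name="orders">
<column name="id">1</column>
<column name="payment">Check</column>
</table>
"""


def table_to_dict(table):
    # Hypothetical helper: map each column's "name" attribute to its text.
    return {col["name"]: col.get_text() for col in table.find_all("column")}


soup = bs4.BeautifulSoup(div_test, "html.parser")
dashboard = table_to_dict(soup.find("table", {"name": "dashboard"}))
orders = table_to_dict(soup.find("table", {"name": "orders"}))
print(dashboard)
print(orders)
```

All values come back as strings, so any conversion to numbers or timestamps would still have to be done before inserting them elsewhere.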
Answer 1 (score: 0)
Seems obvious... iterate over the "table" tags first, then, for each "table" tag, iterate over its "column" tags.
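The nested iteration described above might look like the following sketch. It assumes the built-in "html.parser" backend and a shortened inline copy of the sample data in place of test.xml; the variable names markup, rows and values are illustrative only.

```python
from bs4 import BeautifulSoup

# Shortened inline copy of the sample data (assumption: same layout as test.xml).
markup = """
<table name="dashboard">
<column name="id">1</column>
<column name="config_offline">1</column>
</table>
<table name="orders">
<column name="id">1</column>
<column name="payment">Check</column>
<column name="cost">175.00</column>
</table>
"""

soup = BeautifulSoup(markup, "html.parser")

rows = []  # one (table_name, {column_name: value}) pair per <table> element
for table in soup.find_all("table"):
    values = {}
    # Searching from the table tag only returns columns nested inside it.
    for column in table.find_all("column"):
        values[column["name"]] = column.text
    rows.append((table["name"], values))

for table_name, values in rows:
    print(table_name, values)
```

Reading table["name"] in the outer loop is what keeps the two tables apart: each group of column values stays attached to the table it came from.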