Writing the child tags of each table row to a CSV file with BeautifulSoup

Posted: 2016-09-02 10:54:11

Tags: csv tags beautifulsoup

from bs4 import BeautifulSoup
p = """
<thead>
<tr>
<th>Company Name</th>
<th>Symbol</th>
<th>Market</th>
<th>Price</th>
<th>Shares</th>
<th>Offer Amount</th>
<th>Date Priced</th>
</tr>
</thead>
<tr>
<td><a href="http://www.nasdaq.com" id="two">EXFO INC.</a></td>
<td><a href="http://www.nasdaq.com" id="two">EXFO</a></td>
<td><a href="http://www.nasdaq.com" id="two">NASDAQ</a></td>
<td>$26</td>
<td>7,000,000</td>
<td>$182,000,000</td>
<td>6/30/2000</td>
</tr>
<tr>
<td><a href="http://www.nasdaq.com">IGO, INC.</a></td>
<td><a href="http://www.nasdaq.com" id="two">MOBE</a></td>
<td><a href="http://www.nasdaq.com" id="two">NASDAQ</a></td>
<td>$12</td>
<td>4,000,000</td>
<td>$48,000,000</td>
<td>6/30/2000</td>
</tr>"""
soup = BeautifulSoup(p, 'html.parser')
for ana in soup.find_all('td'):
    if ana.parent.name == 'tr':
        print(ana.string)

Hi! I'm trying to write some data from a site to a CSV file. Ideally the result would be a CSV file containing:
EXFO INC.,EXFO,NASDAQ,$26,7,000,000,$182,000,000,6/30/2000
IGO, INC.,MOBE,NASDAQ,$12,4,000,000,$48,000,000,6/30/2000

What I've managed so far is printing the following:

EXFO INC.
EXFO
NASDAQ
$26
7,000,000
$182,000,000
6/30/2000
IGO, INC.
MOBE
NASDAQ
$12
4,000,000
$48,000,000
6/30/2000

Any ideas how to do this? I just don't know how to put it all into a loop that, for each "tr" tag, extracts all the "td" tags.

1 Answer

Answer 0 (score: 0)

Select the table, find the th tags inside the thead and write them, then extract all the other rows and write out their td text:

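from bs4 import BeautifulSoup
from csv import writer

# html is the page source, assumed to contain a <table> element
soup = BeautifulSoup(html, "html.parser")
table = soup.select_one("table")
with open("out.csv", "w", newline="") as f:  # newline="" avoids blank lines between rows on Windows
    wr = writer(f)
    # header row: the <th> cells inside <thead>
    wr.writerow([th.text for th in table.select("thead th")])
    for row in table.find_all("tr"):
        data = [td.text for td in row.find_all("td")]
        if data:  # skip rows without <td> cells, e.g. the header row
            wr.writerow(data)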

Which will give you:

Company Name,Symbol,Market,Price,Shares,Offer Amount,Date Priced
EXFO INC.,EXFO,NASDAQ,$26,"7,000,000","$182,000,000",6/30/2000
"IGO, INC.",MOBE,NASDAQ,$12,"4,000,000","$48,000,000",6/30/2000

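The quotes around "7,000,000", "$182,000,000" and "IGO, INC." are added by csv.writer because those fields contain the delimiter (the default quoting mode is csv.QUOTE_MINIMAL). That quoting is what makes the file readable again; the comma-joined "ideal" line from the question cannot be split back into its seven fields. A small stdlib-only sketch illustrating this, using one row from the question:

```python
import csv
from io import StringIO

row = ["IGO, INC.", "MOBE", "NASDAQ", "$12", "4,000,000", "$48,000,000", "6/30/2000"]

# Write one row the way the answer does (csv.writer, default dialect).
buf = StringIO()
csv.writer(buf).writerow(row)
line = buf.getvalue()
print(line, end="")  # "IGO, INC.",MOBE,NASDAQ,$12,"4,000,000","$48,000,000",6/30/2000

# Reading the line back recovers exactly the seven original fields.
parsed = next(csv.reader(StringIO(line)))
print(len(parsed))  # 7

# Joining with bare commas loses the field boundaries: it splits into 12 pieces.
naive = ",".join(row)
print(len(naive.split(",")))  # 12
```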
Another approach is to find all the tr tags and index/slice the result:

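from bs4 import BeautifulSoup
from csv import writer

# html is the page source, assumed to contain a <table> whose first <tr> holds the <th> headers
soup = BeautifulSoup(html, "html.parser")
rows = soup.select("table tr")
with open("out.csv", "w", newline="") as f:
    wr = writer(f)
    wr.writerow([th.text for th in rows[0].find_all("th")])  # header row
    for row in rows[1:]:  # every remaining row holds data cells
        data = [td.text for td in row.find_all("td")]
        wr.writerow(data)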
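Note that the question's snippet has no <table> wrapper, so select("table tr") (and select_one("table")) would find nothing in it. A minimal sketch of the slicing idea that works on a shortened copy of that snippet: it takes every <tr> with find_all("tr") instead, then zips the header names with each row's cells so csv.DictWriter can write them. The use of DictWriter and get_text(strip=True) is my substitution, not part of the answer:

```python
from csv import DictWriter
from io import StringIO

from bs4 import BeautifulSoup

# Shortened copy of the question's snippet (no <table> wrapper).
p = """
<thead><tr><th>Company Name</th><th>Symbol</th><th>Shares</th></tr></thead>
<tr><td><a href="http://www.nasdaq.com">EXFO INC.</a></td><td>EXFO</td><td>7,000,000</td></tr>
<tr><td><a href="http://www.nasdaq.com">IGO, INC.</a></td><td>MOBE</td><td>4,000,000</td></tr>
"""

soup = BeautifulSoup(p, "html.parser")
# select("table tr") would return [] here, so take every <tr> directly.
rows = soup.find_all("tr")

# As in the slicing approach: rows[0] holds the headers, rows[1:] the data.
headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
records = [
    dict(zip(headers, (td.get_text(strip=True) for td in row.find_all("td"))))
    for row in rows[1:]
]

out = StringIO()  # stands in for open("out.csv", "w", newline="")
wr = DictWriter(out, fieldnames=headers)
wr.writeheader()
wr.writerows(records)
print(out.getvalue())
print(records[1]["Symbol"])  # MOBE
```

If a row has fewer cells than headers, zip stops early and DictWriter fills the missing columns with its restval (an empty string by default).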

Whichever approach you take, you need to go through every tr tag and extract all the td tags inside each one, so that the data is grouped into rows.
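If you would rather keep the question's original loop over every td, one option (my own variation, not from the answer) is to group consecutive cells by their enclosing tr with itertools.groupby. The key is id() of the parent rather than the Tag itself, because BeautifulSoup compares Tags by content, so two different rows with identical cells would otherwise merge. This assumes there are no nested tables, so each row's cells appear consecutively:

```python
from itertools import groupby

from bs4 import BeautifulSoup

p = """
<tr><td><a href="http://www.nasdaq.com">EXFO INC.</a></td><td>EXFO</td><td>$26</td></tr>
<tr><td><a href="http://www.nasdaq.com">IGO, INC.</a></td><td>MOBE</td><td>$12</td></tr>
"""
soup = BeautifulSoup(p, "html.parser")

# Same traversal as the question (every <td> in document order), but
# consecutive cells sharing the same parent <tr> object form one row.
rows = [
    [td.get_text(strip=True) for td in cells]
    for _, cells in groupby(soup.find_all("td"), key=lambda td: id(td.find_parent("tr")))
]
print(rows)  # [['EXFO INC.', 'EXFO', '$26'], ['IGO, INC.', 'MOBE', '$12']]
```

The resulting list of lists can be passed straight to csv.writer(f).writerows(rows).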