Python CSV export contains b'' byte prefixes (how do I remove them?)

Time: 2017-07-24 08:47:59

Tags: python html csv parsing scrape

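I am trying to export a test table to CSV. The code below runs, but when I open test1.csv, some rows contain a b prefix (it looks like a binary flag). Even if I remove encode('utf8'), I still get the b prefix. How can I remove these b prefixes and end up with a clean CSV file?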

Here is the full code:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
import csv
my_url = 'http://www.igobychad.com/test_table.html'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
page_soup.find("table", { "id" : "Emp_sum" })
table = page_soup.find("table", { "id" : "Emp_sum" })
for row in table.findAll("tr"):
    cells = row.findAll("td")
headers = [header.text for header in table.find_all('th')]
rows = []
for row in table.find_all('tr'):
    rows.append([val.text.encode('utf8') for val in row.find_all('td')])
with open('test1.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerow(headers)
    writer.writerows(row for row in rows if row)

The result looks like this:

Category,June2016,Apr.2017,May2017,June2017,Change from:May2017-June2017,Estatus,CN pop,Clf,Prate,Em,Ep ratio,Unem,Un rate
b''
"b'253,397'","b'254,588'","b'254,767'","b'254,957'",b'190'
"b'158,889'","b'160,213'","b'159,784'","b'160,145'",b'361'

1 answer:

Answer 0 (score: 0)

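I took the liberty of changing your code so that the table layout fits CSV output. The b prefixes come from .encode('utf8'): in Python 3 it returns bytes, and csv.writer writes the str() of each bytes cell, which includes the b'...' wrapper, so the version below keeps the cell text as str. Because csv.writer writes one row at a time, I had to compose each CSV row by hand to match your table layout. It is not complete, but you can adjust it to suit your format.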
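
As a minimal sketch of where the b'...' text comes from: csv.writer converts non-string cells with str(), and str() of a bytes object includes the b prefix. The example writes to an in-memory buffer instead of test1.csv; the helper name write_rows is only for illustration, and the two sample values are copied from the output in the question.

```python
import csv
import io

def write_rows(rows):
    """Write rows with csv.writer into an in-memory buffer and return the text."""
    buf = io.StringIO(newline='')
    csv.writer(buf).writerows(rows)
    return buf.getvalue()

values = ['253,397', '190']

# Encoding each cell produces bytes; csv.writer calls str() on them,
# so the b'...' repr ends up in the output.
with_bytes = write_rows([[v.encode('utf8') for v in values]])

# Leaving the cells as str writes the text itself.
with_str = write_rows([values])

print(with_bytes.strip())  # "b'253,397'",b'190'
print(with_str.strip())    # "253,397",190
```

The first printed line has the same shape as the rows in the question's output, which suggests the encode call was still in effect when that file was written.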

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
import csv
my_url = 'http://www.igobychad.com/test_table.html'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
table = page_soup.find("table", { "id" : "Emp_sum" })
headers = [header.text for header in table.find_all('th')]
rows = []
for row in table.find_all('tr'):
    # Keep the cell text as str; encoding it to bytes is what produced the b'...' prefixes
    rows.append([val.text for val in row.find_all('td')])

# newline='' is what the csv module recommends for files passed to csv.writer
with open('test1.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    # Only the first 6 entries of "headers" are column headers, so write those first
    writer.writerow(headers[:6])
    # The remaining rows have to be composed by hand before writing them
    index = 1
    for txt in headers[7:]:
        index += 1
        # headers[6] ('Estatus') labels the empty row seen in the question's output,
        # so the row labels that carry data start at headers[7]
        out_row = [txt]
        # Append the values from the matching entry in rows.
        # The commas are treated as thousands separators (254,957 - 254,767 = 190
        # in the sample output), so they are removed before casting to float
        out_row.extend([float(x.replace(',', '')) for x in rows[index]])
        writer.writerow(out_row)
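
The float cast above fails with ValueError on an empty cell. One possible way to make it more tolerant is a small helper, sketched here under the assumption that the commas in the table are thousands separators. The name parse_number is hypothetical, and the sample cells other than '253,397' and '190' are made up for illustration.

```python
def parse_number(text):
    """Convert a scraped cell such as '253,397' or '62.7' to float.

    Returns None for cells that are empty after stripping whitespace.
    """
    cleaned = text.strip().replace(',', '')
    if not cleaned:
        return None
    return float(cleaned)

# '253,397' and '190' appear in the question's output; the rest are hypothetical
cells = ['253,397', '62.7', '-0.1', ' ', '190']
parsed = [parse_number(c) for c in cells]
print(parsed)  # [253397.0, 62.7, -0.1, None, 190.0]
```

csv.writer writes None as an empty field, so rows built with this helper keep their column alignment even when a cell is blank.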