Question

I'm using Beautiful Soup to scrape a specific table from several pages whose URLs are listed in url.csv.

Code:

import csv
import urllib.request
from bs4 import BeautifulSoup

def parse_csv(content, delimiter = ';'):
  csv_data = []
  for line in content.split('\n'):
    csv_data.append( [x.strip() for x in line.split( delimiter )] ) # strips spaces also
  return csv_data



list_url=parse_csv(open('url.csv','r').read())  # the 'rU' mode was removed in Python 3.11
f = csv.writer(open("raw.csv", "w",encoding='utf8',newline=''))
# Write column headers as the first line


for i in range (0,len(list_url)):
    url=str(list_url[i][0]) ## read URL from an array coming from an Url-CSV
    page=urllib.request.urlopen(url)
    soup = BeautifulSoup(page.read(),"html.parser")
    restricted_webpage= soup.find( "div", {"id":"ingredients"} )
    readable_restricted=str(restricted_webpage)

    soup2=BeautifulSoup(readable_restricted,"html.parser")


    links = soup2.find_all('td')
    print(len(links))


    for link in links:
        i = link.find_next_sibling('td')
        if getattr(i, 'name', None):
            a, i = link.string, i.string
            f.writerow([a, i])

My CSV looks like this:

"
                Cendres brutes (%)
        ","
                7.4
        " "
                Cellulose brute (%)
        ","
                1.6
        " "
                Fibres alimentaires (%)
        ","
                6.6
        " "
                Matière grasse (%)
        ","
                16.0

I want it to look like this:

Cendres brutes(%);7.4
Cellulose brute (%);1.6
Fibres Alimentaires(%);6.6
Matière grasse (%);16.0

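I need it to look like this for two reasons: 1. A CSV like this looks great when I open it in Excel. 2. I can use my CSV parser (parse_csv, defined in the first lines) and work with the array generated from my CSV as if it were cells in Excel: cell[X][Y]. That is very handy.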

How can I achieve this? In other words, how do I get that kind of CSV?

Answer 1

csv_writer = csv.writer(outfile, delimiter=';')

This replaces the commas with semicolons. The resulting file can be read by Excel with European (EU) regional settings.
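Changing the delimiter alone probably won't remove the line breaks and indentation shown in the question's output, since those appear to come from the text of the table cells themselves. A minimal sketch, assuming the cell strings look like that sample: collapse the whitespace in each cell before writing it with `delimiter=';'`. The names `clean_cell` and `write_rows` and the sample data are hypothetical, used only for illustration.

```python
import csv
import io


def clean_cell(text):
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return " ".join(text.split())


def write_rows(rows, outfile, delimiter=";"):
    """Write (label, value) pairs as one delimited line per pair."""
    writer = csv.writer(outfile, delimiter=delimiter)
    for label, value in rows:
        writer.writerow([clean_cell(label), clean_cell(value)])


# Sample cell text shaped like the output in the question (assumed, not scraped).
raw_rows = [
    ("\n                Cendres brutes (%)\n        ", "\n                7.4\n        "),
    ("\n                Cellulose brute (%)\n        ", "\n                1.6\n        "),
]

# An in-memory buffer stands in for the real output file.
buffer = io.StringIO(newline="")
write_rows(raw_rows, buffer)
csv_text = buffer.getvalue()

# Reading back with the same delimiter gives cells addressable as cells[row][col].
cells = list(csv.reader(csv_text.splitlines(), delimiter=";"))
print(csv_text)
```

In the scraping loop from the question, the same cleaning could be applied to `link.string` and its sibling before calling `writerow`. Beautiful Soup's `get_text(strip=True)` is another way to get trimmed cell text.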
