I'm new to Python, trying to build my first script. I want to scrape a list of URLs and export the results to a CSV file.
My script runs fine, but when I open the CSV file only a few rows of data have been written. When I print the lists I'm trying to write (sharelist and sharelist1), the printout is complete, but the CSV file is not.
Here is part of my code:
for url in urllist[10:1000]:
    # query the website and return the html to the variable 'page'
    try:
        page = urllib2.urlopen(url)
    except urllib2.HTTPError as e:
        if e.getcode() == 404: # check the return code
            continue
    soup = BeautifulSoup(page, 'html.parser')
    # Take out the <div> of name and get its value
    name_box = soup.find(attrs={'class': 'nb-shares'})
    if name_box is None:
        continue
    share = name_box.text.strip() # strip() is used to remove starting and trailing whitespace
    # save the data in tuple
    sharelist.append(url)
    sharelist1.append(share)
    # open a file for writing.
    csv_out = open('mycsv.csv', 'wb')
    # create the csv writer object.
    mywriter = csv.writer(csv_out)
    # writerow - one row of data at a time.
    for row in zip(sharelist, sharelist1):
        mywriter.writerow(row)
    # always make sure that you close the file.
    # otherwise you might find that it is empty.
    csv_out.close()
I'm not sure which part of my code I should share here. Please tell me if this isn't enough!
Answer 0 (score: 3)
The problem is that you open the file on every pass through the loop, which overwrites the file's previous contents each time.
# open a file for writing.
csv_out = open('mycsv.csv', 'wb')
# create the csv writer object.
mywriter = csv.writer(csv_out)
# writerow - one row of data at a time.
for row in zip(sharelist, sharelist1):
    mywriter.writerow(row)
# always make sure that you close the file.
# otherwise you might find that it is empty.
csv_out.close()
Open the file before the loop, or open it in append mode.
Here is option one (note the indentation):
# open a file for writing.
csv_out = open('mycsv.csv', 'wb')
# create the csv writer object.
mywriter = csv.writer(csv_out)

for url in urllist[10:1000]:
    try:
        page = urllib2.urlopen(url)
    except urllib2.HTTPError as e:
        if e.getcode() == 404: # check the return code
            continue
    soup = BeautifulSoup(page, 'html.parser')
    name_box = soup.find(attrs={'class': 'nb-shares'})
    if name_box is None:
        continue
    share = name_box.text.strip()
    # save the data in tuple
    sharelist.append(url)
    sharelist1.append(share)

# writerow - one row of data at a time.
for row in zip(sharelist, sharelist1):
    mywriter.writerow(row)
# always make sure that you close the file.
# otherwise you might find that it is empty.
csv_out.close()
Here is option two:
for url in urllist[10:1000]:
    # query the website and return the html to the variable 'page'
    try:
        page = urllib2.urlopen(url)
    except urllib2.HTTPError as e:
        if e.getcode() == 404: # check the return code
            continue
    soup = BeautifulSoup(page, 'html.parser')
    # Take out the <div> of name and get its value
    name_box = soup.find(attrs={'class': 'nb-shares'})
    if name_box is None:
        continue
    share = name_box.text.strip() # strip() is used to remove starting and trailing whitespace
    # save the data in tuple
    sharelist.append(url)
    sharelist1.append(share)
    # open the file in append mode.
    csv_out = open('mycsv.csv', 'ab')
    # create the csv writer object.
    mywriter = csv.writer(csv_out)
    # writerow - one row of data at a time.
    for row in zip(sharelist, sharelist1):
        mywriter.writerow(row)
    # always make sure that you close the file.
    # otherwise you might find that it is empty.
    csv_out.close()
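One caveat with option two as written (a concern the last answer below also raises): each iteration re-writes the entire accumulated zip(sharelist, sharelist1) in append mode, so earlier rows get duplicated on every pass. A sketch of a fix, writing only the row produced in the current iteration:

# inside the loop, append just the newest pair instead of the whole list
csv_out = open('mycsv.csv', 'ab')
mywriter = csv.writer(csv_out)
mywriter.writerow((url, share))
csv_out.close()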
Answer 1 (score: 1)
Now that the problem has been found: for files, the best solution is to use the with keyword, which closes the file automatically:
with open('mycsv.csv', 'wb') as csv_out:
    mywriter = csv.writer(csv_out)
    for url in urllist[10:1000]:
        try:
            page = urllib2.urlopen(url)
        except urllib2.HTTPError as e:
            if e.getcode() == 404:
                continue
        soup = BeautifulSoup(page, 'html.parser')
        name_box = soup.find(attrs={'class': 'nb-shares'})
        if name_box is None:
            continue
        share = name_box.text.strip()
        # save the data in tuple
        sharelist.append(url)
        sharelist1.append(share)
    for row in zip(sharelist, sharelist1):
        mywriter.writerow(row)
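For context, a with block is roughly equivalent to a try/finally pair, which is why the explicit close() can be dropped; a minimal sketch of what the first line above expands to:

csv_out = open('mycsv.csv', 'wb')
try:
    mywriter = csv.writer(csv_out)
    # ... rest of the block ...
finally:
    csv_out.close()  # runs even if an exception escapes the block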
Answer 2 (score: 0)
Open the file for writing with a context manager, so there's no need to close it explicitly.
with open('mycsv.csv', 'w') as file_obj:
    mywriter = csv.writer(file_obj)
    for url in urllist[10:1000]:
        try:
            page = urllib2.urlopen(url)
        except urllib2.HTTPError as e:
            if e.getcode() == 404: # check the return code
                continue
        soup = BeautifulSoup(page, 'html.parser')
        name_box = soup.find(attrs={'class': 'nb-shares'})
        if name_box is None:
            continue
        share = name_box.text.strip()
        # no need to use zip, and append in 2 lists as they're really expensive calls,
        # and by the looks of it, I think it'll create duplicate rows in your file
        mywriter.writerow((url, share))
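One wrinkle shared by every snippet here: if urlopen raises an HTTPError whose code is not 404, the except block falls through without assigning page, and the next line raises a NameError (or silently reuses the previous iteration's page). A defensive variant of the fetch, re-raising unexpected errors:

try:
    page = urllib2.urlopen(url)
except urllib2.HTTPError as e:
    if e.getcode() == 404:  # skip missing pages
        continue
    raise  # don't fall through on other HTTP errors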