Question

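I'm new to Python and am taking a class. I think I'm close to meeting the requirements, but I'm still stuck on saving the data to a csv file. The file is always empty. I've tried several approaches in the write part of the code but still can't figure it out. Any guidance would be appreciated.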

import requests
from bs4 import BeautifulSoup
import csv
import os.path

url = "https://www.census.gov/programs-surveys/popest.html"
response = requests.get(url)
# parse html
page = str(BeautifulSoup(response.content))

def getURL(page):
    start_link = page.find("a href")
    if start_link == -1:
        return None, 0
    start_quote = page.find('"', start_link)
    end_quote = page.find('"', start_quote + 1)
    url = page[start_quote + 1: end_quote]
    return url, end_quote

while True:
    url, n = getURL(page)
    page = page[n:]
    if url:
        print url
    else:
        break

userhome = os.path.expanduser('~')
myfile = os.path.join(userhome, 'Desktop', 'data.csv')

f=open(myfile,"w")
f.write(getURL)
f.close()

Answer 1

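One thing I can see is that you're not actually calling getURL:

f.write(getURL(page)[0])

Also, you only try to write to the file later, once the data is already gone.

The while loop finds all the URLs in the page you're scraping and trims page as it goes, so you need to write to the file inside that loop.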

userhome = os.path.expanduser('~')
myfile = os.path.join(userhome, 'Desktop', 'data.csv')

with open(myfile, "w") as f:
    while True:
        url, n = getURL(page)
        page = page[n:]
        if url:
            print(url)
            f.write("%s\n" % url)  # write the URL just found, not the next one
        else:
            break
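
To see why a write placed after the loop has nothing left to save, here is a minimal sketch. It reuses getURL from the question, but runs it on a made-up HTML string (an assumption for illustration, not the census page) and collects the URLs in a list instead of printing them:

```python
def getURL(page):
    """Return the first quoted link target in page and the index where it ends."""
    start_link = page.find("a href")
    if start_link == -1:
        return None, 0
    start_quote = page.find('"', start_link)
    end_quote = page.find('"', start_quote + 1)
    url = page[start_quote + 1: end_quote]
    return url, end_quote

# Made-up HTML standing in for the downloaded page.
page = '<a href="/one">1</a> <a href="/two">2</a>'

found = []
while True:
    url, n = getURL(page)
    page = page[n:]  # everything up to the link just found is discarded
    if url:
        found.append(url)
    else:
        break

# This is all that a write placed after the loop would get.
leftover = getURL(page)

print(found)     # ['/one', '/two']
print(leftover)  # (None, 0)
```

By the time the loop exits, page no longer contains any links, so the only place each URL is available is inside the loop body.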

Answer 2

f.write() expects a str, and you're giving it a function (getURL). You have to give it a string. f.write(getURL(page)[0]) should do what you need.
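
A small sketch of that difference, written for Python 3 (the question's code looks like Python 2, where the same call also fails with a TypeError). The sample HTML and the temporary directory are assumptions so it can run anywhere. It also suggests why the file ends up empty: open(..., "w") creates the file, and the failed write never puts anything in it.

```python
import os
import tempfile

def getURL(page):
    """Return the first quoted link target in page and the index where it ends."""
    start_link = page.find("a href")
    if start_link == -1:
        return None, 0
    start_quote = page.find('"', start_link)
    end_quote = page.find('"', start_quote + 1)
    return page[start_quote + 1: end_quote], end_quote

sample_page = '<a href="https://example.com/a">A</a>'  # made-up HTML
path = os.path.join(tempfile.mkdtemp(), "data.csv")    # stand-in for ~/Desktop/data.csv

# Passing the function object itself: write() rejects it.
error = None
f = open(path, "w")
try:
    f.write(getURL)
except TypeError as exc:
    error = exc
finally:
    f.close()
size_after_failed_write = os.path.getsize(path)  # file exists but holds nothing

# Calling the function and passing the string it returns works.
with open(path, "w") as f:
    f.write(getURL(sample_page)[0] + "\n")

with open(path) as f:
    content = f.read()

print(type(error).__name__, size_after_failed_write)
print(content)
```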

Answer 3

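Are you using Python 2 or 3? I notice you're calling print without parentheses.

Your main problem is that you're only passing the function (getURL) to f.write; you need to pass the actual value you want to save. In your case, I assume the url variable you're printing is what you want to save.

I'm not sure this is the format you want, but with the following changes I got each URL on its own line in my data.csv file:

Move the following code to just before the while loop: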

userhome = os.path.expanduser('~')
myfile = os.path.join(userhome, 'Desktop', 'data.csv')
f = open(myfile, "w")

Inside the while loop, add this line before or after the print statement:

f.write(url + "\n")

Keep f.close() at the end of the script.
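
Since the target is a .csv file and the question already imports csv, one possible variation on the steps above is to let csv.writer produce the rows. This is only a sketch for Python 3, not something any of the answers proposed: the HTML string is made up, and a temporary directory stands in for the Desktop folder.

```python
import csv
import os
import tempfile

def getURL(page):
    """Return the first quoted link target in page and the index where it ends."""
    start_link = page.find("a href")
    if start_link == -1:
        return None, 0
    start_quote = page.find('"', start_link)
    end_quote = page.find('"', start_quote + 1)
    return page[start_quote + 1: end_quote], end_quote

# Made-up HTML standing in for the downloaded page.
page = '<a href="/one">1</a> <a href="/two">2</a>'

# A temporary directory replaces ~/Desktop so the sketch runs anywhere.
myfile = os.path.join(tempfile.mkdtemp(), "data.csv")

# Open the file before the loop; newline="" is what the csv docs recommend.
with open(myfile, "w", newline="") as f:
    writer = csv.writer(f)
    while True:
        url, n = getURL(page)
        page = page[n:]
        if not url:
            break
        print(url)
        writer.writerow([url])  # one URL per row, written inside the loop

# Read the file back to check what was saved.
with open(myfile, newline="") as f:
    rows = list(csv.reader(f))

print(rows)  # [['/one'], ['/two']]
```

The with block closes the file on exit, so no separate f.close() is needed in this version.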

Writing Python web-scraped data to a .csv file on a Mac
