Exporting Python scraping results to CSV

Time: 2019-02-06 22:59:16

Tags: python csv export-to-csv

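The code below produces the value of the "resultStats" ID, which I would like to save to a CSV file. Is there a neat way to put the "desired_google_queries" (i.e. the search terms) in column A of the CSV and the "resultStats" values in column B?

I have seen many threads on this topic, but I have not found a solution that fits this particular case.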

from bs4 import BeautifulSoup
import urllib.request
import csv

desired_google_queries = ['Elon Musk' , 'Tesla', 'Microsoft']

for query in desired_google_queries:

    url = 'http://google.com/search?q=' + query

    req = urllib.request.Request(url, headers={'User-Agent' : "Magic Browser"})
    response = urllib.request.urlopen( req )
    html = response.read()

    soup = BeautifulSoup(html, 'html.parser')

    resultStats = soup.find(id="resultStats").string
    print(resultStats)

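3 Answers:

Answer 0: (score: 2)

I took the liberty of rewriting this to use the Requests library instead of urllib, but it shows how to do the CSV writing, which I think is what you are more interested in: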

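The code that originally accompanied this answer is not present in this copy. As a rough sketch of the approach it describes (Requests for the download, the csv module for the output), something like the following might work. The function names `fetch_result_stats` and `write_results` and the file name `results.csv` are illustrative assumptions, not taken from the answer, and Google's markup may no longer contain an element with the id `resultStats`, so the lookup is guarded:

```python
import csv


def fetch_result_stats(query):
    """Return the text of the #resultStats element for `query`, or '' if absent."""
    # Third-party imports are kept local so the CSV helper below
    # can be used (and tested) without them installed.
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(
        "http://google.com/search",
        params={"q": query},  # requests URL-encodes the query, spaces included
        headers={"User-Agent": "Magic Browser"},
        timeout=10,
    )
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    element = soup.find(id="resultStats")
    return element.get_text() if element is not None else ""


def write_results(path, queries, fetch=fetch_result_stats):
    """Write one (query, resultStats) row per query to the CSV file at `path`."""
    with open(path, "w", newline="", encoding="utf-8") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(["query", "resultStats"])  # header: column A, column B
        for query in queries:
            writer.writerow([query, fetch(query)])


if __name__ == "__main__":
    desired_google_queries = ["Elon Musk", "Tesla", "Microsoft"]
    write_results("results.csv", desired_google_queries)
```

Passing the fetch function as a parameter is only a convenience here: it lets the CSV-writing part be exercised with a stand-in function, without any network access.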

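Answer 1: (score: 1)

Instead of writing row by row, you can collect all the results in a pandas DataFrame and write them out in one go. See the code below: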

from bs4 import BeautifulSoup
import urllib.request
import pandas as pd

data_dict = {'desired_google_queries': [],
             'resultStats': []}

desired_google_queries = ['Elon Musk' , 'Tesla', 'Microsoft']

for query in desired_google_queries:

    url = 'http://google.com/search?q=' + query

    req = urllib.request.Request(url, headers={'User-Agent' : "Magic Browser"})
    response = urllib.request.urlopen( req )
    html = response.read()

    soup = BeautifulSoup(html, 'html.parser')

    resultStats = soup.find(id="resultStats").string

    data_dict['desired_google_queries'].append(query)
    data_dict['resultStats'].append(resultStats)

df = pd.DataFrame(data=data_dict)
# index=False leaves the DataFrame's row index out of the file
df.to_csv(path_or_buf='path/where/you/want/to/save/thisfile.csv', index=False)
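If pandas is not available, the same collect-first, write-once idea can be sketched with the standard library's `csv.DictWriter`. This is a swapped-in alternative, not part of the answer above; the helper name `rows_to_csv` is hypothetical and the `resultStats` values are placeholders rather than real scraped data:

```python
import csv
import io


def rows_to_csv(rows, fieldnames, out):
    """Write a list of dicts to the file-like object `out` as CSV, header first."""
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Placeholder values standing in for scraped resultStats strings.
rows = [
    {"desired_google_queries": "Elon Musk", "resultStats": "About 100 results"},
    {"desired_google_queries": "Tesla", "resultStats": "About 200 results"},
]

buffer = io.StringIO()  # for a real file: open(path, "w", newline="")
rows_to_csv(rows, ["desired_google_queries", "resultStats"], buffer)
print(buffer.getvalue())
```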

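Answer 2: (score: 0)

Unfortunately, the original answer has been deleted. For anyone else interested in this case, the code is below. Thanks first of all to the user who posted the solution: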

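import csv

# Assumes desired_google_queries is defined as in the question.
with open('eggs.csv', 'w', newline='') as csvfile:
    # Default comma delimiter, so spreadsheet apps split the values into columns A and B
    spamwriter = csv.writer(csvfile, quoting=csv.QUOTE_MINIMAL)
    spamwriter.writerow(['query', 'resultStats'])  # header row
    for query in desired_google_queries:
        ...  # fetch and parse the page as in the question to obtain resultStats
        spamwriter.writerow([query, resultStats])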
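One detail worth checking is how values containing commas behave, since result counts are often formatted with thousands separators. The small sketch below uses a made-up `resultStats` value and an in-memory buffer instead of a file. It suggests that, with the csv module's default dialect, such a value is quoted on output and read back as a single field:

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer)  # default dialect: comma delimiter, QUOTE_MINIMAL
writer.writerow(["query", "resultStats"])
writer.writerow(["Tesla", "About 1,230,000 results"])  # made-up value

csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back recovers exactly two columns per row.
rows = list(csv.reader(io.StringIO(csv_text)))
print(rows)
```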