Unable to store scraped results in the third and fourth columns of a csv file

Date: 2017-05-25 12:13:49

Tags: python-3.x csv web-scraping

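I wrote a script that scrapes the address and phone number of certain shops based on their Name and Lid. It searches using the Name and Lid values stored in columns A and B of a csv file. After fetching the results, however, I want the parser to put them in columns C and D respectively, as shown in the second image. This is where I'm stuck: I don't know how to use the reader or writer methods to manipulate the third and fourth columns so that the data ends up there. This is what I'm trying at the moment: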

import csv
import requests
from lxml import html
Names, Lids = [], []
with open("mytu.csv", "r") as f:
    reader = csv.DictReader(f)
    for line in reader:
        Names.append(line["Name"])
        Lids.append(line["Lid"])
with open("mytu.csv", "r") as f:
    reader = csv.DictReader(f)
    for entry in reader:
        Page = "https://www.yellowpages.com/los-angeles-ca/mip/{}-{}".format(entry["Name"].replace(" ","-"), entry["Lid"])
        response = requests.get(Page)
        tree = html.fromstring(response.text)
        titles = tree.xpath('//article[contains(@class,"business-card")]')
        for title in titles:
            Address= title.xpath('.//p[@class="address"]/span/text()')[0]
            Contact = title.xpath('.//p[@class="phone"]/text()')[0]
            print(Address,Contact)

What my csv file looks like now:

(image not available)

The output I want looks like this:

(image not available)

1 Answer:

Answer 0 (score: 1)

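You can do it like this. Create a new output csv file whose header is based on the input csv, with two columns added. When you read a csv row with DictReader, it is available as a dictionary, called entry in this case. You can add new values to this dictionary from what you collected on the web, then write each newly built row to the output file.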

import csv
import requests
from lxml import html

# Read the input rows and write them, plus two new columns, to a new file.
with open("mytu.csv", "r", newline="") as f, open("new_mytu.csv", "w", newline="") as g:
    reader = csv.DictReader(f)
    # The output header is the input header followed by the two new columns.
    newfieldnames = reader.fieldnames + ["Address", "Phone"]
    # Bind the DictWriter to a local name only; do not overwrite csv.writer.
    writer = csv.DictWriter(g, fieldnames=newfieldnames)
    writer.writeheader()
    for entry in reader:
        Page = "https://www.yellowpages.com/los-angeles-ca/mip/{}-{}".format(entry["Name"].replace(" ", "-"), entry["Lid"])
        response = requests.get(Page)
        tree = html.fromstring(response.text)
        titles = tree.xpath('//article[contains(@class,"business-card")]')
        # Use only the first matching business card.
        # Note: this raises IndexError if the page has no such element.
        title = titles[0]
        Address = title.xpath('.//p[@class="address"]/span/text()')[0]
        Contact = title.xpath('.//p[@class="phone"]/text()')[0]
        print(Address, Contact)
        # entry is a dict for the current row; add the scraped values to it.
        new_row = entry
        new_row["Address"] = Address
        new_row["Phone"] = Contact
        writer.writerow(new_row)
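
To try the read-extend-write pattern without hitting the network, here is a minimal offline sketch. It is not the code from the answer: `lookup`, `first_or_empty`, `add_columns` and the sample rows are hypothetical names and data made up for illustration, and `lookup` simply stands in for the requests/lxml step. It also assumes you would rather write an empty cell than crash when a page yields no match.

```python
import csv
import io


def first_or_empty(values):
    """Return the first item of a list, or '' if the list is empty."""
    return values[0] if values else ""


def lookup(name, lid):
    """Hypothetical stand-in for the scraping step.

    Returns (addresses, phones) as two lists, like two xpath() calls would.
    """
    fake_results = {
        ("Acme Store", "101"): (["1 Main St"], ["555-0100"]),
    }
    return fake_results.get((name, lid), ([], []))


def add_columns(src, dst):
    """Copy rows from src to dst, appending Address and Phone columns."""
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames) + ["Address", "Phone"]
    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()
    for entry in reader:
        addresses, phones = lookup(entry["Name"], entry["Lid"])
        # An empty result list becomes an empty cell instead of an IndexError.
        entry["Address"] = first_or_empty(addresses)
        entry["Phone"] = first_or_empty(phones)
        writer.writerow(entry)


# In-memory files (made-up sample data) keep the sketch self-contained.
source = io.StringIO("Name,Lid\nAcme Store,101\nMissing Shop,999\n")
target = io.StringIO()
add_columns(source, target)

# Read the result back to inspect the new columns.
output_rows = list(csv.DictReader(io.StringIO(target.getvalue())))
print(output_rows)
```

With real files, you would pass the two handles opened with `open(..., newline="")` to `add_columns` in place of the `io.StringIO` objects.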