Question

So I wrote a small snippet for Scrapy that searches a website by zip code, but going through all the zip codes that don't exist seems wasteful. First, this is what I have:

def start_requests(self):
    for i in xrange(100000):
        yield self.make_requests_from_url("http://www.example.com/zipcode/%05d/search.php" % i)

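The idea is obvious, but I have downloaded a CSV that contains all US zip codes in one column. How can I easily use that as a list (or something more efficient than a list) in the example above? I have pandas, if that makes things easier.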

Answer 1

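If I understand you correctly, you have a comma-separated file formatted so that each row has one zip code in a particular column (perhaps titled 'ZipCodes').

If there is a header row and several columns, and you know the name of the column that contains the zip codes, you can do this: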

def start_requests(self, filename, columnname):
    with open(filename) as f:
        # The first line holds the column names.
        headers = f.readline().strip().split(',')
        column = headers.index(columnname)
        for line in f:
            zipcode = line.strip().split(',')[column]
            # zipcode is a string here, so format it with %s rather than %05d.
            yield self.make_requests_from_url("http://www.example.com/zipcode/%s/search.php" % zipcode)

Answer 2

Open the file, read the lines, get the zip code, yield...

for line in open('zipcodes.csv', 'r').readlines():
    # strip() drops the trailing newline, which matters when the zip code is the last column.
    zipcode = line.strip().split(',')[columnNumberOfTheZipCodesStartingFrom0]
    yield self.make_requests_from_url("http://foo.com/blah/%s/search.php" % (zipcode,))
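
Since the question also asks for something more efficient than a list, here is a minimal sketch of a lazy variant. Iterating over the file object directly reads one line at a time rather than loading the whole file into memory first. The function name iter_zip_urls, the column argument, and the sample data are assumptions for illustration; the URL pattern is taken from the question.

```python
import io

# URL pattern taken from the question; adjust it for the real site.
URL_TEMPLATE = "http://www.example.com/zipcode/%s/search.php"


def iter_zip_urls(lines, column=0):
    """Yield one search URL per row, reading the input lazily."""
    for line in lines:
        fields = line.strip().split(',')
        # Skip blank lines and rows too short to hold the column.
        if len(fields) <= column or not fields[column]:
            continue
        yield URL_TEMPLATE % fields[column]


# Sample data standing in for open('zipcodes.csv'); any iterable of lines works.
sample = io.StringIO("00501,Holtsville\n\n90210,Beverly Hills\n")
urls = list(iter_zip_urls(sample))
```

Inside a spider, start_requests could loop over iter_zip_urls(open('zipcodes.csv')) and yield one request per URL, so no full list is ever built.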

Answer 3

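To round out an already good set of suggestions, here is another. The main idea of this approach is that it needs no special library such as pandas, yet it does more than read the plain file contents, where you would have to reinvent the wheel as far as CSV parsing goes (not the hardest thing, but why bother?). If your CSV file is simple enough, it may be easier to just read the file contents, as dg99 shows.

Use Python's built-in csv library!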

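import csv

i = 0  # index of the zip code column in the CSV file
ziplist = []
# 'rb' is the Python 2 form; on Python 3 use open('zipcodes.csv', newline='') instead.
with open('zipcodes.csv', 'rb') as csvfile:
    zipreader = csv.reader(csvfile)
    for row in zipreader:
        ziplist.append(row[i])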

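Notes:

I use row[i], where i is the column index of the zip codes in the CSV file. If the file lists ZIP+4 codes, you can use row[i][:5]. Also, if you don't know which column number holds the zip codes but you do know the column header (field name), you can use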

zipreader = csv.DictReader(csvfile)
for zipDict in zipreader:
    ziplist.append(zipDict['Zip Code Column Name Here'])

According to this post, getting information from a list is as efficient as getting it from a tuple, so this seems like the way to go.
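
Putting these notes together, here is a minimal sketch under a few assumptions: the header name 'Zip Code', the helper name read_zipcodes, and the sample rows are all made up for illustration. It reads the column by header name and trims ZIP+4 codes to five characters.

```python
import csv
import io


def read_zipcodes(csvfile, column_name):
    """Return the five-character zip codes found in the named column."""
    ziplist = []
    for row in csv.DictReader(csvfile):
        value = (row.get(column_name) or "").strip()
        if value:
            # Keep only the first five characters so ZIP+4 codes such as
            # 90210-1234 collapse to 90210.
            ziplist.append(value[:5])
    return ziplist


# In-memory sample standing in for open('zipcodes.csv', newline='').
sample = io.StringIO(
    "Zip Code,City\n"
    "00501,Holtsville\n"
    "90210-1234,\"Beverly Hills, CA\"\n"
)
zipcodes = read_zipcodes(sample, "Zip Code")
```

The quoted city field in the sample contains a comma, which is the kind of row where the csv module parses correctly and a plain split(',') would not.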

Answer 4

So you want to read a CSV into a list... OK, I think this should be simple:

import pandas
colname = ['zip code','city']
zipdata = pandas.read_csv('uszipcodes.csv', names=colname)

I hope I understood you correctly!
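
To get from the DataFrame to a plain list, something like the following sketch might work. It assumes the file has no header row (as the names= argument implies) and reads every column as a string, because pandas would otherwise parse the zip codes as integers and drop leading zeros (00501 would become 501). The sample data is made up for illustration.

```python
import io

import pandas

colname = ['zip code', 'city']
# In-memory sample standing in for 'uszipcodes.csv' (no header row assumed).
sample = io.StringIO("00501,Holtsville\n90210,Beverly Hills\n")
# dtype=str keeps the zip codes as text so leading zeros survive.
zipdata = pandas.read_csv(sample, names=colname, dtype=str)
ziplist = zipdata['zip code'].tolist()
```

The resulting ziplist can then be looped over in start_requests in place of the xrange call from the question.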

Answer 5

Maybe something like this?

#!/usr/local/cpython-3.3/bin/python

import csv
import pprint

def gen_zipcodes(file_):
    reader = csv.reader(file_, delimiter='|', quotechar='"')
    for row in reader:
        yield row[0]

def main():
    with open('zipcodes_2006.txt', 'r') as file_:
        zipcodes = list(gen_zipcodes(file_))
    pprint.pprint(zipcodes[:10])

main()

Creating a list from a CSV column in Python

5 answers: