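I'm trying to scrape some web content, but I'm having trouble formatting the output. My code builds a list and then iterates over it to add more information. I get all the data I need, but when I try to save it as CSV, I get multiple lists per row. I'm not sure how to fix this.
Here is my code: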
# Python 2. Imports were not shown in the original post; lxml is assumed from the xpath() calls.
# `parser` and `getCompanies` are defined elsewhere in the script (not shown).
import csv
import itertools
import os
import urllib
from StringIO import StringIO
from lxml import etree

def getPeople(company, url, filename):
    persList = []
    category = os.path.splitext(filename)[0]
    code = urllib.urlopen(url)
    html = code.read()
    unic = unicode(html, errors='ignore')
    tree = etree.parse(StringIO(unic), parser)
    personNames = tree.xpath('//a[@class="person"]/text()')
    personUrls = tree.xpath('//a[@class="person"]/@href')
    for i, j in zip(personNames, personUrls):
        personInfo = (company, category, i, j)
        internal = list(personInfo)
        persList.append(internal)
    result = list(persList)
    return result

def tasker(filename):
    peopleList = []
    companyNames = getCompanies(filename, '//a[@class="company"]/text()')
    companyUrls = getCompanies(filename, '//a[@class="company"]/@href')
    for i, j in zip(companyNames, companyUrls):
        peopleLinks = getPeople(i, j, filename)
        internal = list(peopleLinks)
        peopleList.append(internal)
    output = csv.writer(open("test.csv", "wb"))
    for row in itertools.izip_longest(*peopleList):
        output.writerow(row)
    return peopleList
Here is a sample of the output:
[[['3M', 'USA', 'Rod Thomas', 'http://site.com/ron-thomas'], ['HP', 'USA', 'Todd Gack', 'http://site.com/todd-gack'], ['Dell', 'USA', 'Phil Watters', 'http://site.com/philwatt-1'], ['IBM', 'USA', 'Mary Sweeney', 'http://site.com/ms2105']], [['3M', 'USA', 'Tom Hill', 'http://site.com/tomhill'], None, ['Dell', 'USA', 'Howard Duck', 'http://site.com/howard-duck'], None], [['3M', 'USA', 'Neil Rallis', 'http://site.com/nrallis-4'], None, None, None]]
This produces an ugly CSV file that is hard to read. Can anyone point me in the right direction?
Edit: this is what I want the output to look like.
[['3M', 'USA', 'Rod Thomas', 'http://site.com/ron-thomas'], ['HP', 'USA', 'Todd Gack', 'http://site.com/todd-gack'], ['Dell', 'USA', 'Phil Watters', 'http://site.com/philwatt-1'], ['IBM', 'USA', 'Mary Sweeney', 'http://site.com/ms2105'], ['3M', 'USA', 'Tom Hill', 'http://site.com/tomhill'], ['Dell', 'USA', 'Howard Duck', 'http://site.com/howard-duck'], ['3M', 'USA', 'Neil Rallis', 'http://site.com/nrallis-4']]
Answer 0 (score: 4)
In your line:
peopleList.append(internal)
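You are appending one list to another. That makes the inner list a single member of peopleList, so you end up with a list of lists of rows.
Instead, you want to extend peopleList. That is how you combine two lists: each item of internal is added to peopleList individually.
So it would be: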
peopleList.extend(internal)
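
To make the difference concrete, here is a minimal sketch. It is written for Python 3 rather than the Python 2 of the question, and the `per_company` data and the in-memory `buffer` are hypothetical stand-ins for the values returned by getPeople and for the real test.csv file. It also assumes that once the list is flat, each element is already one CSV row, so the rows can be written directly without zipping them together.

```python
import csv
import io

# Hypothetical per-company results, shaped like what getPeople returns.
per_company = [
    [["3M", "USA", "Rod Thomas", "http://site.com/ron-thomas"],
     ["3M", "USA", "Tom Hill", "http://site.com/tomhill"]],
    [["HP", "USA", "Todd Gack", "http://site.com/todd-gack"]],
]

nested = []
flat = []
for people in per_company:
    nested.append(people)  # adds the whole list as ONE element
    flat.extend(people)    # adds each person row as its own element

print(len(nested))  # one entry per company
print(len(flat))    # one entry per person

# Each element of the flat list is already a row, so write them directly.
buffer = io.StringIO()  # stands in for open("test.csv", "w", newline="")
csv.writer(buffer).writerows(flat)
print(buffer.getvalue())
```

With the flat list, every CSV line holds exactly one person: company, category, name, and URL.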