How do I iterate over and encode the text in a list rather than the list itself?

Asked: 2015-10-04 21:24:46

Tags: python csv xpath encoding web-scraping

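I have been trying to encode the list price['value'] and get the error AttributeError: 'list' object has no attribute 'encode'. After realizing what the problem was, I tried many different ways of encoding the text before it is added to the list, but none of them worked. How do I use .encode('utf-8') correctly in this case, so that the text is encoded rather than the list and the results in price['value'] come out as non-unicode data?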

import mechanize
from lxml import html
import csv
import io
from time import sleep

def save_products (products, writer):

    for product in products:

        writer.writerow([ product["title"][0].encode('utf-8') ])
        for price in product['prices']:
            writer.writerow([ price["value"] ])

f_out = open('pcdResult.csv', 'wb')
writer = csv.writer(f_out)

links = ["http://purechemsdirect.com/ourprices.html/" ]

br = mechanize.Browser()
br.set_handle_robots(False)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]


for link in links:

    print(link)
    r = br.open(link)

    content = r.read()

    products = []        
    tree = html.fromstring(content)        
    product_nodes = tree.xpath('//div[@class="col-md-6 col-lg-6 col-sm-12"]')

    for product_node in product_nodes:

        product = {}
        try:
            product['title'] = product_node.xpath('.//p/strong/text()')

        except:
            product['title'] = ""

        price_nodes = product_node.xpath('.//ul')

        product['prices'] = []
        for price_node in price_nodes:

            price = {}
            try:
                price['value'] = price_node.xpath('.//li/text()')

            except:
                price['value'] = ""

            product['prices'].append(price)
        products.append(product)
    save_products(products, writer)

f_out.close() 

1 answer:

Answer 0 (score: 0)

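Try a list comprehension, since price['value'] is a list. I am assuming the values in price['value'] are strings and not further lists. If there are nested lists inside, this answer will not work.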

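def save_products(products, writer):
    for product in products:
        # product["title"] is a list of text nodes; write the first one, UTF-8 encoded
        writer.writerow([product["title"][0].encode('utf-8')])
        for price in product['prices']:
            # price['value'] is a list, so encode each string inside it
            writer.writerow([x.encode('utf-8') for x in price['value']])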
...
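
The behaviour described above can be reproduced without scraping anything. The following is a minimal sketch, not the original poster's code: the sample prices are made up to imitate the list of unicode strings that an XPath text() query returns, and encode_values is a hypothetical helper added only to illustrate the nested-list case the answer says it does not handle.

```python
# Sketch only: the sample data is invented to imitate what
# price_node.xpath('.//li/text()') returns, i.e. a list of unicode strings.

def encode_values(values, encoding='utf-8'):
    """Hypothetical helper: encode every string in a possibly nested list."""
    encoded = []
    for value in values:
        if isinstance(value, (list, tuple)):
            # Flatten nested lists, the case the plain comprehension misses.
            encoded.extend(encode_values(value, encoding))
        else:
            encoded.append(value.encode(encoding))
    return encoded

price = {'value': [u'1g \xa310', u'5g \xa340']}  # made-up sample prices

# Calling .encode on the list itself reproduces the error from the question.
try:
    price['value'].encode('utf-8')
except AttributeError as error:
    message = str(error)

# The comprehension from the answer encodes each string individually.
flat = [x.encode('utf-8') for x in price['value']]

# The helper gives the same result even when one item is itself a list.
nested = encode_values([u'1g \xa310', [u'5g \xa340']])
```

Note that the explicit encoding step matches the Python 2 csv module used in the question, which writes byte strings. Under Python 3 the csv module writes text, so the usual approach would be to open the file with open('pcdResult.csv', 'w', newline='', encoding='utf-8') and pass the strings to writerow unencoded.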