POST data should be bytes or an iterable of bytes. It cannot be of type str

Posted: 2017-11-30 02:40:41

Tags: python web-scraping beautifulsoup

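I'm just trying to take links from a .csv file as input, scrape information from each link, and then write that information into other columns of the .csv. I've been scratching my head over this for days. Can anyone see what's wrong here? The error happens at the line that creates soup: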

import csv
import urllib.request

from bs4 import BeautifulSoup


def scrape_data(csv_file):
    writer = csv.writer(csv_file)
    reader = csv.reader(csv_file)

    for row in reader:
        if row:

            # THE ERROR HAPPENS AT THE SOUP OBJECT BELOW

            soup = BeautifulSoup(urllib.request.urlopen(row[0], 'lxml'))
            post_time = soup.find('time', {'class' : 'date timeago'})
            sqfeet = (sqft.text for sqft in soup.find('span', {'class' : 'shared-line-bubble'}))
            availability = (soup.find('span', {'class' : 'data-date'}))
            attribute_group = (ag.text for ag in soup.find('p', {'class' : 'attrgroup'}))
            address = (add.text for add in soup.find('div', {'class' : 'mapaddress'}))

            for data in zip(post_time, sqfeet, availability, attribute_group, address):
                writer.writerow(row[3])
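The traceback points at the soup line, but the TypeError is raised inside urllib.request.urlopen(), not by BeautifulSoup. The sketch below isolates that one call so the problem can be reproduced without bs4. The helper name post_data_error and the URL http://example.com/ are placeholders chosen for illustration. In CPython 3, urllib rejects a str request body while it is preparing the request, so no connection should be attempted.

```python
import urllib.request


def post_data_error(url="http://example.com/"):
    """Return the TypeError message urlopen() raises for a str `data` argument.

    urlopen(url, data=None, ...) treats its second positional argument as the
    request body, so the 'lxml' string from the question lands in `data`.
    urllib rejects str bodies while preparing the request, before connecting.
    """
    try:
        urllib.request.urlopen(url, "lxml")  # same call shape as in the question
    except TypeError as exc:
        return str(exc)
    return None


print(post_data_error())
```

The printed message starts with "POST data should be bytes", which matches the error in the title. The exact wording differs slightly between Python versions.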

1 answer:

Answer 0 (score: 3)

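The 'lxml' part must be an argument to BeautifulSoup(), but you passed it as an argument to urllib.request.urlopen(). The second positional parameter of urlopen() is data, the request body, so urllib tries to send 'lxml' as POST data. Because POST data must be bytes rather than str, urllib raises this TypeError. Move the parser name to the other call: soup = BeautifulSoup(urllib.request.urlopen(row[0]), 'lxml')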
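With that fix applied, the rest of the question's code still has problems, so here is one possible corrected sketch. It rests on several assumptions. The URL is in the first column of the input CSV. Results go to a separate output file, because reading and writing the same file handle with csv.reader and csv.writer at the same time does not work reliably. The tag and class names are copied from the question and were not checked against the live pages. The names FIELDS, scrape_listing, scrape_rows and scrape_data are hypothetical. Each field is taken with a single find() call followed by get_text(), instead of iterating over the children of find()'s result as the original generator expressions did.

```python
import csv
import urllib.request

# Hypothetical field spec: (tag, class) pairs copied from the question.
FIELDS = [
    ("time", "date timeago"),        # post time
    ("span", "shared-line-bubble"),  # square footage
    ("span", "data-date"),           # availability
    ("p", "attrgroup"),              # attribute group
    ("div", "mapaddress"),           # address
]


def scrape_listing(url):
    """Fetch one page and return the text of each FIELDS entry ('' if missing)."""
    # Third-party (pip install beautifulsoup4 lxml); imported here so that
    # scrape_rows() can also be used with a different scraper without bs4.
    from bs4 import BeautifulSoup

    with urllib.request.urlopen(url) as response:  # no second argument here
        soup = BeautifulSoup(response, "lxml")     # the parser name goes here
    values = []
    for tag, css_class in FIELDS:
        element = soup.find(tag, class_=css_class)
        values.append(element.get_text(strip=True) if element else "")
    return values


def scrape_rows(rows, scrape=scrape_listing):
    """Yield each non-empty input row with the scraped values appended."""
    for row in rows:
        if row:
            yield row + scrape(row[0])


def scrape_data(in_path, out_path, scrape=scrape_listing):
    """Read URLs from the first column of in_path; write augmented rows to out_path."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        # writerows() expects whole rows (lists of fields). The question's
        # writerow(row[3]) passed one string, which writes one character per column.
        csv.writer(dst).writerows(scrape_rows(csv.reader(src), scrape))
```

Usage might look like scrape_data('links.csv', 'links_scraped.csv'), where both file names are placeholders. Keeping the row handling in scrape_rows() separate from the network call means the CSV logic can be checked with a stub scraper before any pages are fetched.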