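Output BeautifulSoup data to a CSV

Time: 2017-03-20 19:15:50

Tags: python csv beautifulsoup

I want to output my BeautifulSoup data to a CSV with 2 columns: 1. Title, 2. Description

So the Title column should hold soup.title, and the Description column should hold whatever the print statements output inside the loop that starts with `for x in courselinks`...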

# This is what I tried:
with open('newcsv.csv','wb') as f:
    writer = csv.writer(f, delimiter='\t')
    writer.writerow('Title')

for x in courselinks[0:3]:
    data = requests.get("http:" + x)
    soup = bs(data.text)
    print soup.title #This I want in the Title column
    for header in soup.find_all(text='Description'):
        nextNode = header.parent
        while True:
            nextNode = nextNode.nextSibling
            if nextNode is None:
                break
            if isinstance(nextNode, Tag):
                print (nextNode.get_text(strip=True).strip().encode('utf-8'))  # This I want in the Description column
            if isinstance(nextNode, NavigableString):
                print (nextNode.strip().encode('utf-8'))  # This I want in the Description column
            if isinstance(nextNode, Tag):
                if nextNode.name == "h2":
                    break

This is what I want: [image not preserved]

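1 answer:

Answer 0 (score: 0)

When you write a row to a CSV, you are simply writing an array or list to the file. Each value in the list becomes one value in the row. The item at index 0 goes in the first column, and each subsequent item in the list goes in the next column.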
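To illustrate the point about lists and columns, here is a minimal sketch that writes rows to an in-memory buffer instead of a file. The helper name `rows_to_text` and the sample values are made up for this illustration. It also shows why `writer.writerow('Title')` from the question does not give a single "Title" cell: a bare string is iterated character by character.

```python
import csv
import io


def rows_to_text(rows, delimiter="\t"):
    """Write rows with csv.writer into an in-memory buffer and return the text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


# A list gives one cell per item: index 0 is column 1, index 1 is column 2.
good = rows_to_text([["Title", "Description"], ["Course A", "Intro text"]])

# A bare string is treated as a sequence of characters, one cell per letter.
bad = rows_to_text(["Title"])

print(repr(good))
print(repr(bad))
```

So the header row should be passed as a list, for example `writer.writerow(['Title', 'Description'])`.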

for x in courselinks[0:3]:
    data = requests.get("http:" + x)
    soup = bs(data.text)
    current_row = [soup.title.get_text(strip=True).encode('utf-8'), '']  # index 0: Title column, index 1: Description column
    for header in soup.find_all(text='Description'):
        current_row[1] = ''
        nextNode = header.parent
        while True:
            nextNode = nextNode.nextSibling
            if nextNode is None:
                writer.writerow(current_row)
                break
            if isinstance(nextNode, Tag):
                current_row[1] += nextNode.get_text(strip=True).strip().encode('utf-8')  # Description column
            if isinstance(nextNode, NavigableString):
                current_row[1] += nextNode.strip().encode('utf-8')  # Description column
            if isinstance(nextNode, Tag):
                if nextNode.name == "h2":
                    writer.writerow(current_row)
                    break
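
Note that the loop above relies on `writer`, so it has to run inside the `with open(...)` block from the question; once that block ends, the file is closed. The question's code is also Python 2 style (`print soup.title`, mode `'wb'`). As a rough sketch of the file-writing part only, assuming Python 3: open the file in text mode with `newline=''` and keep every `writerow` call inside the `with` block. The function names and the sample data below are hypothetical stand-ins for the scraped titles and descriptions.

```python
import csv
import os
import tempfile


def write_courses(path, courses):
    """Write (title, description) pairs to a tab-delimited file with a header row."""
    # Python 3: text mode with newline='' so the csv module controls line endings.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["Title", "Description"])
        for title, description in courses:
            # One list per row: title in column 1, description in column 2.
            writer.writerow([title, description])


def read_rows(path):
    """Read the file back as a list of rows, to check what was written."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f, delimiter="\t"))


# Hypothetical sample data standing in for the scraped pages.
sample = [("Course A", "First paragraph. Second paragraph."), ("Course B", "")]
csv_path = os.path.join(tempfile.mkdtemp(), "newcsv.csv")
write_courses(csv_path, sample)
rows = read_rows(csv_path)
print(rows)
```

In the scraping loop, the same pattern would apply: build the description string for each page first, then call `writer.writerow([title, description])` once per page.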