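I want to write my BeautifulSoup data to a CSV file with two columns: 1. Title, 2. Description.
The Title column should hold soup.title, and the Description column should hold whatever the print statements output inside the loop that starts with `for x in courselinks`...
**This is what I tried:**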

    import csv
    import requests
    from bs4 import BeautifulSoup as bs, NavigableString, Tag

    # courselinks is built earlier (not shown)
    with open('newcsv.csv','wb') as f:
        writer = csv.writer(f, delimiter='\t')
        writer.writerow('Title')
        for x in courselinks[0:3]:
            data = requests.get("http:"+x)
            soup = bs(data.text)
            print soup.title  # This I want in the Title column
            for header in soup.find_all(text='Description'):
                nextNode = header.parent
                while True:
                    nextNode = nextNode.nextSibling
                    if nextNode is None:
                        break
                    if isinstance(nextNode, Tag):
                        print (nextNode.get_text(strip=True).strip().encode('utf-8'))  # This I want in the Description column
                    if isinstance(nextNode, NavigableString):
                        print (nextNode.strip().encode('utf-8'))  # This I want in the Description column
                    if isinstance(nextNode, Tag):
                        if nextNode.name == "h2":
                            break

Answer 0 (score: 0)
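When you write a row to a CSV file, you are simply writing a list (or array) to that file. Each value in the list becomes one value in the row. The item you want in the first column goes first, at index 0, and each subsequent column takes the next index in the list.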
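As a small illustration of that index-to-column mapping, here is a sketch that writes to an in-memory buffer instead of a real file (the buffer and the sample row are placeholders, not data from the question). It also shows what happens when a bare string is passed to `writerow`, as in the question's `writer.writerow('Title')`:

```python
import csv
import io

# In-memory buffer standing in for the real file (an assumption for this demo).
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter='\t')

# Each list item becomes one column: index 0 -> Title, index 1 -> Description.
writer.writerow(['Title', 'Description'])
writer.writerow(['Intro to Python', 'Covers the basics.'])  # placeholder values

# Pitfall: a bare string is iterated character by character,
# so this writes five one-letter columns rather than a single "Title" cell.
writer.writerow('Title')

output = buffer.getvalue()
print(output)
```

So the header row should be passed as a list, `['Title', 'Description']`, rather than as a plain string.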

    # Runs inside the `with open(...)` block from the question, where `writer` is defined.
    for x in courselinks[0:3]:
        data = requests.get("http:"+x)
        soup = bs(data.text)
        # Index 0 is the Title column; use the title's text rather than the <title> tag itself.
        current_row = [soup.title.get_text(strip=True).encode('utf-8'), '']
        for header in soup.find_all(text='Description'):
            current_row[1] = ''  # index 1 is the Description column; reset it for each header
            nextNode = header.parent
            while True:
                nextNode = nextNode.nextSibling
                if nextNode is None:
                    # No more siblings: write the finished row.
                    writer.writerow(current_row)
                    break
                if isinstance(nextNode, Tag):
                    current_row[1] += nextNode.get_text(strip=True).strip().encode('utf-8')
                if isinstance(nextNode, NavigableString):
                    current_row[1] += nextNode.strip().encode('utf-8')
                if isinstance(nextNode, Tag):
                    if nextNode.name == "h2":
                        # Next section reached: write the finished row.
                        writer.writerow(current_row)
                        break

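
The code above is Python 2 (`'wb'` mode, `print` statements, manual `.encode('utf-8')`). On Python 3 the `csv` module expects a text-mode file opened with `newline=''`, and the encoding is set on the file rather than on each string. A rough sketch of the same "collect the pieces, then write one row" idea follows. `write_course_rows`, `pages` and the sample data are hypothetical names standing in for the scraped results, and the pieces are joined with a space, which is a choice made here rather than something the original code does:

```python
import csv
import os
import tempfile


def write_course_rows(path, pages):
    """Write one (title, description) row per page to a tab-delimited file.

    `pages` is a hypothetical list of (title, fragments) pairs, where
    `fragments` are the text pieces collected while walking the siblings.
    """
    # Python 3: text mode with newline='' replaces the Python 2 'wb' mode.
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f, delimiter='\t')
        writer.writerow(['Title', 'Description'])
        for title, fragments in pages:
            # Drop empty pieces, then join so the description fills one cell.
            description = ' '.join(p.strip() for p in fragments if p.strip())
            writer.writerow([title, description])


# Placeholder data standing in for scraped results.
sample_pages = [
    ('Course A', ['First paragraph.', '  ', 'Second paragraph.']),
    ('Course B', []),
]
out_path = os.path.join(tempfile.mkdtemp(), 'newcsv.csv')
write_course_rows(out_path, sample_pages)

# Read the file back to check the two-column layout.
with open(out_path, newline='', encoding='utf-8') as f:
    rows = list(csv.reader(f, delimiter='\t'))
print(rows)
```

Collecting the pieces in a list and joining once at the end also avoids the repeated string concatenation in the loop.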