Question

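I have the following code. What I want to do is scrape a website and then write the data to an Excel worksheet. I am unable to read the existing data from the Excel file.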

import xlwt
import xlrd
from xlutils.copy import copy
from datetime import datetime
import urllib.request
from bs4 import BeautifulSoup
import re
import time
import os  
links= open('links.txt', encoding='utf-8')
#excel workbook
if os.path.isfile('./TestSheet.xls'):
    rbook=xlrd.open_workbook('TestSheet.xls',formatting_info=True)
    book=copy(rbook)
else:
    book = xlwt.Workbook()

try:
    book.add_sheet("wayanad")
except:
    print("sheet exists")
    sheet=book.get_sheet(1)

for line in links:
    print("Currently Scanning\n","\n=================\n",line.rstrip())
    url=str(line.rstrip())    
    req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    html = urllib.request.urlopen(req)
    soup = BeautifulSoup(html,"html.parser")
    #print(soup.prettify())
    title=soup.find('h1').get_text()    
    data=[]
    for i in soup.find_all('p'):
       data.append(i.get_text())
    quick_descr=data[1].strip()
    category=data[2].strip()
    tags=data[3].strip()
    owner=data[4].strip()
    website=data[6].strip()
    full_description=data[7]
    address=re.sub('\s+', ' ', soup.find('h3').get_text()).strip()
    city=soup.find(attrs={"itemprop": "addressRegion"}).get_text().strip()
    postcode=soup.find(attrs={"itemprop": "postalCode"}).get_text().strip()
    phone=[]
    result=soup.findAll('h4')
    for h in result:
        if h.has_attr('itemprop'):
            phone.append(re.sub("\D", "", h.get_text()))

    #writing data to excel
    row=sheet.last_used_row
    column_count=sheet.ncols()    
    book.save("Testsheet.xls")
    time.sleep(2)

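Code explanation

I have a links file containing many links, one per line. The script picks a line (a URL), goes to that URL and scrapes the data.
It opens the Excel workbook and switches to the sheet used for writing data.
It appends the data to the Excel sheet.

(Screenshot of the Excel sheet structure)

At the moment the sheet is empty, but I want to continue from the last row. I am not able to read data from the cells. The documentation says sheet.ncols can be used to count the columns, but it raises an error: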

>>>column_count=sheet.ncols()
>>>AttributeError: 'Worksheet' object has no attribute 'ncols'

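What I want is a way to count the rows and columns and to read data from the cells. Many of the tutorials are very old. I am currently using Python 3.4. I have gone through this link and many others, but no luck.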

Stack Overflow

Stack Overflow

Answer 1

Is this what you are looking for? Going through all the columns?

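import xlrd

# Open the workbook for reading; 'TestSheet.xls' is the file name from the question
xl_workbook = xlrd.open_workbook('TestSheet.xls')
xl_sheet = xl_workbook.sheet_by_index(0)  # first sheet; adjust the index as needed

num_cols = xl_sheet.ncols  # ncols and nrows are attributes, not methods
for row_idx in range(0, xl_sheet.nrows):
    for col_idx in range(0, num_cols):
        print(row_idx, col_idx, xl_sheet.cell_value(row_idx, col_idx))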
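
If you would rather collect the values than loop in place, here is a minimal sketch that gathers every cell into a list of rows. The helper names `read_all_cells` and `load_sheet` are made up for illustration. The sketch assumes the sheet object comes from xlrd, where `nrows` and `ncols` are plain attributes, and not from the xlwt copy, which is meant for writing.

```python
def read_all_cells(sheet):
    """Return the sheet contents as a list of rows.

    `sheet` is expected to be an xlrd sheet; only its nrows and ncols
    attributes and its cell_value(row, col) method are used.
    """
    return [
        [sheet.cell_value(r, c) for c in range(sheet.ncols)]
        for r in range(sheet.nrows)
    ]


def load_sheet(path, index=0):
    """Open `path` with xlrd and return the sheet at `index` (hypothetical helper)."""
    import xlrd  # third-party; imported here so read_all_cells has no hard dependency
    return xlrd.open_workbook(path).sheet_by_index(index)
```

Something like `rows = read_all_cells(load_sheet('TestSheet.xls'))` would then give you the existing data, and `len(rows)` is the number of rows already used.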

How to read from an Excel sheet using Python's xlrd module
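
For the "continue from the last row" part, one possible approach is to take the row count from the xlrd sheet before copying the workbook, and then write to the xlwt sheet starting at that index. This is only a sketch under assumptions: `append_row` and `open_for_append` are hypothetical helper names, and the sheet index 0 is a guess (the question uses index 1).

```python
def append_row(write_sheet, next_row, values):
    """Write `values` into row `next_row`, one value per column.

    `write_sheet` is expected to be an xlwt worksheet; only its
    write(row, col, value) method is used. Returns the index of the
    following row so the caller can keep appending.
    """
    for col, value in enumerate(values):
        write_sheet.write(next_row, col, value)
    return next_row + 1


def open_for_append(path, sheet_index=0):
    """Return (book, write_sheet, next_row) for an existing .xls file.

    Hypothetical helper: the row count is read from the xlrd sheet,
    because the writable copy is not meant for reading.
    """
    import xlrd                    # third-party, reads .xls files
    from xlutils.copy import copy  # third-party, xlrd book -> xlwt book

    rbook = xlrd.open_workbook(path, formatting_info=True)
    next_row = rbook.sheet_by_index(sheet_index).nrows  # first unused row
    book = copy(rbook)
    return book, book.get_sheet(sheet_index), next_row
```

In the scraping loop this could be used as `book, sheet, row = open_for_append('TestSheet.xls')` once at the start, then `row = append_row(sheet, row, [title, category, tags])` for each link, and `book.save('TestSheet.xls')` at the end.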
