TypeError: Cannot read object of type 'list'

Time: 2018-10-03 11:34:39

Tags: python pandas beautifulsoup traceback

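As far as I can tell, I have not created a list anywhere, but the code gives me:

TypeError: Cannot read object of type 'list'

Any ideas?

I'm new to Python, so please go easy on me.

Any help is appreciated.

Example URL: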

https://nclbgc.org/search/licenseDetails?licenseNumber=80479

Here is the full traceback:

Traceback (most recent call last):
  File "ncscribble.py", line 26, in <module>
    df = pd.read_html(url)[0].dropna(how='all')
  File "C:\Users\rkrouse\Desktop\Python\lib\site-packages\pandas\io\html.py", line 987, in read_html
    displayed_only=displayed_only)
  File "C:\Users\rkrouse\Desktop\Python\lib\site-packages\pandas\io\html.py", line 815, in _parse
    raise_with_traceback(retained)
  File "C:\Users\rkrouse\Desktop\Python\lib\site-packages\pandas\compat\__init__.py", line 404, in raise_with_traceback
    raise exc.with_traceback(traceback)
TypeError: Cannot read object of type 'list'

Full code:

from bs4 import BeautifulSoup as soup
from urllib.request import urlopen
import time
import csv
import pandas as pd
import os
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.base import MIMEBase
from email import encoders

def license_exists(soup):
    with open('NC_urls.csv','r') as csvf:
        urls = csv.reader(csvf)
        for url in urls:
            if soup(class_='btn btn-primary"'):
                return False
            else:
                return True


with open('NC_urls.csv','r') as csvf: # Open file in read mode
    urls = csv.reader(csvf)
    for url in urls:
        df = pd.read_html(url)[0].dropna(how='all')
        df = df.groupby(0)[1].apply(lambda x: ' '.join(x.dropna())).to_frame().rename_axis(None).T
        if not license_exists(soup(page, 'html.parser')):
            # if the license is present we don't want to parse any more urls.

            break


df.to_csv('NC_Licenses_Daily.csv', index=False)

1 Answer:

Answer 0 (score: 2)

When you hit a type error, it usually helps to print the offending value, like this:

    for url in urls:
        print(repr(url))
        df = pd.read_html(url)[0].dropna(how='all')

This prints:

['https://nclbgc.org/search/licenseDetails?licenseNumber=80479']

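This happens because each row yielded by csv.reader is itself a list, even when the file has only one column. You need to take the first element of the row and pass that to read_html: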

    for url in urls:
        df = pd.read_html(url[0])[0].dropna(how='all')
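
The row-versus-string distinction can be checked without touching the network. The following is a minimal sketch, assuming the file holds one URL per line; it uses an in-memory string as a stand-in for NC_urls.csv, and the names csv_text, rows and urls are illustrative only:

```python
import csv
import io

# Stand-in for NC_urls.csv (assumption: one URL per line, no header row).
csv_text = "https://nclbgc.org/search/licenseDetails?licenseNumber=80479\n"

# csv.reader yields one list of column values per line,
# even when the file has only a single column.
rows = list(csv.reader(io.StringIO(csv_text)))

# Take the first column of each row and skip blank lines,
# which csv.reader returns as empty lists.
urls = [row[0] for row in rows if row]

print(type(rows[0]).__name__)  # the row is a list
print(type(urls[0]).__name__)  # the extracted URL is a str
```

Building a plain list of strings up front like this means the loop body can pass each url straight to pd.read_html without indexing.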

The code in the question also uses page without ever defining it. To fetch the page data, you can use requests:

import requests
page = requests.get(url[0]).content