Problem using pandas read_html(): ValueError

Date: 2021-01-13 12:26:50

Tags: python pandas

from bs4 import BeautifulSoup
from urllib.request import urlopen
import requests
import pandas as pd

url = "https://finance.naver.com/item/sise_day.nhn?code=068270&page=1"
headers = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36'}
res = requests.get(url, verify=True, headers=headers)


with urlopen(url) as doc:
    html = BeautifulSoup(res.text, 'lxml')   # note: parses res.text, not doc
    pgrr = html.find('td', class_='pgRR')    # pager cell holding the "last page" link
    s = str(pgrr.a['href']).split('=')
    last_page = s[-1]                        # final query value is the page count


df = pd.DataFrame()
sise_url = 'http://finance.naver.com/item/sise_day.nhn?code=068270'


for page in range(1, int(last_page)+1): 
    page_url = '{}&page={}'.format(sise_url, page)  
    df = df.append(pd.read_html(page_url, encoding='euc-kr', header='0')[0])

df = df.dropna()  # drop rows with missing values
print(df)

I hit this ValueError while scraping daily stock data from Naver Finance. Fetching the URL itself works fine, but when I use read_html() I get ValueError: Table not found on the line df = df.append(pd.read_html(page_url, encoding='euc-kr', header='0')[0]). Any advice would be appreciated.
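For diagnosis: pd.read_html(page_url) downloads the page itself, without the browser User-Agent set above, and (as the answer below confirms) Naver then serves an error page that contains no table. A minimal sketch, not from the original post, assuming only that the server's response differs by User-Agent:

# Illustrative check: compare how many <table> elements the server returns
# with the default (non-browser) User-Agent versus a browser one.
import requests
from bs4 import BeautifulSoup

page_url = 'https://finance.naver.com/item/sise_day.nhn?code=068270&page=1'
browser_ua = {'user-agent': 'Mozilla/5.0'}

for label, hdrs in (('default User-Agent', {}), ('browser User-Agent', browser_ua)):
    resp = requests.get(page_url, headers=hdrs)
    tables = BeautifulSoup(resp.text, 'lxml').find_all('table')
    print(label, '->', len(tables), 'table(s) in the response')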

1 Answer:

Answer 0 (score: 0):

I can't read Korean... but pd.read_html() was getting an error page back. Fixed it by fetching each page with requests.get() and a browser User-Agent header, then passing res.text to read_html():

from bs4 import BeautifulSoup
import requests
import pandas as pd

url = "https://finance.naver.com/item/sise_day.nhn?code=068270&page=1"
headers = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36'}
res = requests.get(url, verify=True, headers=headers)


# parse the page fetched above and pull the last page number from the pager
html = BeautifulSoup(res.text, 'lxml')
pgrr = html.find('td', class_='pgRR')   # "last page" cell in the pager
s = str(pgrr.a['href']).split('=')
last_page = s[-1]                       # final query value is the page count


df = pd.DataFrame()
sise_url = 'http://finance.naver.com/item/sise_day.nhn?code=068270'

for page in range(1, int(last_page)+1):
    page_url = '{}&page={}'.format(sise_url, page)
    res = requests.get(page_url, verify=True, headers=headers)    # fetch each page with the browser User-Agent
    df = df.append(pd.read_html(res.text, encoding='euc-kr')[0])  # parse the table from the fetched HTML
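A side note for anyone running this on a current pandas: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, and since 2.1 read_html emits a FutureWarning when given a literal HTML string. A sketch of the same loop for recent pandas versions, reusing last_page, sise_url, and headers from above and concatenating once at the end:

from io import StringIO
import requests
import pandas as pd

frames = []
for page in range(1, int(last_page) + 1):
    page_url = '{}&page={}'.format(sise_url, page)
    res = requests.get(page_url, verify=True, headers=headers)
    # wrap the HTML string in StringIO to satisfy pandas >= 2.1
    frames.append(pd.read_html(StringIO(res.text))[0])

df = pd.concat(frames, ignore_index=True).dropna()
print(df)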