Scraping part of an <li> from a <ul> class?

Posted: 2019-09-12 08:04:53

Tags: python-3.x web-scraping stock

I'm trying to get the "bullets" of every li with class "mc" inside the ul with class "mc-list" on the web page.

I'm new to Python and would like to automate some checks on my stock portfolio.

I have a file (mystocks.txt) with my stock tickers, one ticker per line.
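A minimal sketch for loading such a file, assuming plain text with one ticker per line. Blank lines are skipped and surrounding whitespace (including the trailing newline) is stripped, so each entry can be compared directly against page text. The helper name load_tickers is hypothetical.

```python
from pathlib import Path


def load_tickers(path):
    """Return the tickers listed in *path*, one per line.

    Hypothetical helper: strips whitespace (including the trailing newline)
    from each line and drops blank lines.
    """
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Usage would be something like tickers = load_tickers("mystocks.txt").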

Once a day I want to check whether the SA (Seeking Alpha) site has any news about my stocks.

import requests
from bs4 import BeautifulSoup

url = 'https://seekingalpha.com/dividends/dividend-news'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')
for link in soup.find_all('li'):
    ...  # (rest of the loop omitted)
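Assuming the page really has the structure described above (a ul with class "mc-list" containing li items with class "mc", each holding a div with class "bullets"), a single CSS selector can pick out the bullet blocks instead of looping over every li. The sample HTML below is made up for illustration; the live Seeking Alpha markup may differ or change at any time, and extract_bullets is a hypothetical name.

```python
from bs4 import BeautifulSoup

# Made-up HTML mimicking the structure described in the question (assumption).
SAMPLE_HTML = """
<ul class="mc-list">
  <li class="mc">
    <div class="title">Example headline 1</div>
    <div class="bullets"><ul><li>TJX example news bullet</li></ul></div>
  </li>
  <li class="mc">
    <div class="title">Example headline 2</div>
    <div class="bullets"><ul><li>MO example news bullet</li></ul></div>
  </li>
</ul>
"""


def extract_bullets(html):
    """Return the text of every div.bullets found under ul.mc-list > li.mc."""
    soup = BeautifulSoup(html, "html.parser")  # built-in parser, no lxml needed
    return [div.get_text(" ", strip=True)
            for div in soup.select("ul.mc-list > li.mc div.bullets")]


print(extract_bullets(SAMPLE_HTML))
```

Using select() with a structural selector keeps the scraping logic in one string, which is easy to adjust if the class names on the page change.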

The expected output:

If a div.bullets contains a ticker from "mystocks.txt", a file named "<ticker>.txt" should be created containing the div.bullets text.
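The file-writing requirement could be sketched like this, assuming the bullet texts have already been extracted into a list of strings and that each run should overwrite the previous "<ticker>.txt". The function save_bullets and its out_dir parameter are hypothetical.

```python
from pathlib import Path


def save_bullets(ticker, bullets, out_dir="."):
    """Write the bullet texts that mention *ticker* to <out_dir>/<ticker>.txt.

    Returns the Path written, or None if no bullet mentions the ticker
    (in which case no file is created).
    """
    matches = [text for text in bullets if ticker in text]
    if not matches:
        return None
    path = Path(out_dir) / f"{ticker}.txt"
    # write_text overwrites any file left over from a previous run
    path.write_text("\n".join(matches) + "\n", encoding="utf-8")
    return path
```

Opening the file in append mode ("a") instead would keep a running history across days; which one fits depends on how the files are read afterwards.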

2 Answers:

Answer 0 (score: 1)

Check out the following implementation. I hope it gets you there:

import requests
from bs4 import BeautifulSoup

link = "https://seekingalpha.com/dividends/dividend-news"

# fetch the page once; it is the same for every ticker
res = requests.get(link, headers={'User-Agent': 'Mozilla/5.0'})
soup = BeautifulSoup(res.text, "lxml")

# placeholder list of tickers you might want to check against
for ticker in ['NWTUF', 'BSL', 'KRC']:
    for item in soup.select(".media-body"):
        # skip items that don't mention the ticker
        if ticker not in item.text:
            continue

        for elem in item.select(".bullets > ul > li, .bullets > ul > li > a"):
            print(elem.text)
        print("***" * 20)
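One caveat with "if ticker not in item.text": it is a plain substring test, so a short ticker such as "MO" also matches inside unrelated upper-case words like "MORE", and "BSL" would match "BSLX". A regex that requires no letter or digit directly before or after the ticker is one way to tighten this; it is a suggested alternative, not part of the answer above, and mentions_ticker is a hypothetical name.

```python
import re


def mentions_ticker(text, ticker):
    """True if *ticker* appears in *text* as a standalone token (case-sensitive)."""
    # The lookarounds reject a match when a letter or digit touches the ticker
    # on either side; punctuation such as ":" or "(" still counts as a boundary,
    # so "NASDAQ:NWFL" and "(NWFL)" are matched.
    pattern = r"(?<![A-Za-z0-9])" + re.escape(ticker) + r"(?![A-Za-z0-9])"
    return re.search(pattern, text) is not None
```

In the loop above, the condition would become "if not mentions_ticker(item.text, ticker): continue".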

Answer 1 (score: 0)

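A little progress (still learning): the file is now read line by line, but the dividend records are not printed even when the ticker is on the page: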

import requests
from bs4 import BeautifulSoup

link = "https://seekingalpha.com/dividends/dividend-news"

# the with block opens the file once and closes it automatically
with open("tickers.txt", "r") as fileHandler:
    for ticker in fileHandler:
        print(ticker.strip())
        res = requests.get(link, headers={'User-Agent': 'Mozilla/5.0'})
        soup = BeautifulSoup(res.text, "lxml")

        for item in soup.select(".media-body"):
            # if there is no match, skip this item
            if ticker not in item.text:
                continue

            for elem in item.select(".bullets > ul > li, .bullets > ul > li > a"):
                print(elem.text)
            print("***" * 20)

The output looks like this (I tried every possible name):

rpi2:~$ ./divi.py
MAIN
TJX
NASDAQ:NWFL
Altria
Ralph Lauren
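A likely cause (hedged, since the live page can't be checked here): iterating over a file yields each line with its trailing newline, and the script prints ticker.strip() but still tests the unstripped ticker against item.text. A string like "TJX\n" rarely occurs in page text, so every item is skipped. The small demonstration below uses an in-memory file as a stand-in for tickers.txt; its contents and the page text are made up.

```python
import io

# Stand-in for open("tickers.txt"); contents are hypothetical.
fake_file = io.StringIO("TJX\nMO\n")
page_text = "TJX declares quarterly dividend"

raw = list(fake_file)                        # ['TJX\n', 'MO\n']
stripped = [line.strip() for line in raw]    # ['TJX', 'MO']

# the trailing newline prevents any match
print([t for t in raw if t in page_text])
# after stripping, TJX is found
print([t for t in stripped if t in page_text])
```

So, in the loop, assigning ticker = ticker.strip() (and skipping empty lines) before the "if ticker not in item.text" test should let the matching work.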