Question

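I'm in school and have just finished my introductory Python course, so I decided to put my skills to use and try to build something useful. I want to write a script that scrapes a Steam market page and notifies me when an item is listed at or below a desired price. I'm a bit stuck and would appreciate any tips. I'm using urllib2 and BeautifulSoup.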

from bs4 import BeautifulSoup
from urllib2 import urlopen
import time



item = str(raw_input('Please enter the item you are looking for(Exact URL): '))
price = str(raw_input('Please enter the price you want to buy the item at: '))

print('Searching for item at that price....\n' + item)

market = urlopen(item)

def getPrices(market,desiredPrice):
    while True:
        soup = BeautifulSoup(market)
        prices = soup.findAll('span',{'class':'market_listing_price market_listing_price_with_fee'})

        """
        So now my logic assumed I should do something like;

        if desiredPrice in prices:
            print('found item at the desired price!')
            return link_to_item

        """

        print('Searching...')
        time.sleep(20)


getPrices(market, price)

For testing, I'm using this Steam market link: https://steamcommunity.com/market/listings/730/AK-47%20%7C%20Redline%20%28Field-Tested%29

The span that holds each item's price on the first page has class='market_listing_price market_listing_price_with_fee'.

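Bottom line:
I can't seem to get the data out of each span tag. I want to grab the prices as floats and put them in a list that I can sort, then compare them against the desired price and find anything at or below it.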

Answer 1

There is a lot of extra text in those spans. If you strip it out, you should be fine.

>>> [i.text.strip() for i in prices]
[u'Sold!', u'\xa5 33.69', u'\xa5 33.69', u'Sold!', u'\xa5 33.69', u'\xa5 33.69', u'\xa5 33.69', u'\xa5 33.69', u'\xa5 33.69', u'\xa5 33.69']

There's a yen sign in there; unless you need the currency information, you can strip that out as well.

To get just the numbers, I would do:

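prices = [i.text.strip() for i in prices]
# Keep only digits and the decimal point; entries with no digits (e.g. 'Sold!') are dropped.
prices = [float(k) for k in [''.join([j for j in i if j in '0123456789.']) for i in prices] if k]
# Guard against an empty list, since min() raises ValueError on one.
if prices and min(prices) <= desiredPrice:
    print('found item at the desired price!')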

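Remember that you need to convert with float(desiredPrice) first, and make sure you read the web data inside the loop. Right now, you are checking exactly the same data every 20 seconds!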
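To make the last two points concrete, here is a minimal sketch of how the parsing and the loop could fit together. The names parse_prices, watch, fetch_texts and max_checks are hypothetical and not part of the original code, and the regex assumes prices use a dot as the decimal separator and commas only as thousands separators. It is one possible arrangement, not the only one.

```python
import re
import time

# Assumption: a price looks like "33.69" or "1,234.50" (comma = thousands separator).
_NUMBER = re.compile(r'\d[\d,]*(?:\.\d+)?')


def parse_prices(texts):
    """Return a sorted list of floats, skipping entries with no number (e.g. 'Sold!')."""
    prices = []
    for text in texts:
        match = _NUMBER.search(text)
        if match:
            prices.append(float(match.group().replace(',', '')))
    return sorted(prices)


def watch(fetch_texts, desired_price, interval=20, max_checks=None):
    """Poll until a price at or below desired_price shows up.

    fetch_texts is a hypothetical callable that downloads the page again and
    returns the stripped text of each price span. It is called on every pass,
    so each check sees fresh data. Returns the matching prices, or an empty
    list if max_checks passes find nothing.
    """
    desired_price = float(desired_price)  # user input arrives as a string
    checks = 0
    while True:
        hits = [p for p in parse_prices(fetch_texts()) if p <= desired_price]
        if hits:
            return hits
        checks += 1
        if max_checks is not None and checks >= max_checks:
            return []
        time.sleep(interval)
```

With the code from the question, fetch_texts would be a small function that calls urlopen(item) and BeautifulSoup each time it runs and returns [i.text.strip() for i in soup.findAll(...)]. Passing the download step in as a callable also makes the loop easy to try out with canned data before pointing it at the real page.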

Python - I am trying to create a simple web scraper for the Steam market
