Question

I've been writing a generic method that can parse any URL and class. It works, but now I'd like to extract the link text instead of the title attribute, e.g. "Xiaomi 70Mai Pro".

I've tried following these two sources, but I'm still unsure:

WebScrapper-Sample

Parse HTML Table for URL, Place into List

import requests
from bs4 import BeautifulSoup

links = 'SampleLink... with table cell'


def getURLData(url):  # scrape data from the link
    try:
        page = requests.get(url)
        content = page.content
        soup = BeautifulSoup(content, "html.parser")
        return soup
    except Exception as e:
        print('Error.getURLData:', e)
        return None


inputLink = getURLData(links)


def tableCheck():  # collect the link title from each 'row1' table cell
    data = []
    for table_tag in inputLink.find_all('td', {'class': 'row1'}):
        topic_title = table_tag.find('a', href=True)
        if topic_title:
            datum = {'topic_title': topic_title['title']}
            data.append(datum)
    return data


print(tableCheck())

This is the output:

  {'topic_title': 'This topic was started: Dec 6 2018, 12:20 PM'}, 
  {'topic_title': 'This topic was started: Nov 19 2018, 10:30 AM'}, 
  {'topic_title': 'This topic was started: Nov 28 2018, 09:16 PM'},
  {'topic_title': 'This topic was started: Oct 3 2018, 11:10 AM'},

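This is the cell I want to extract data from. I tried using topic_title = table_tag.find('a', href=True).text, but I really doubt it will work. I'm still not very familiar with BeautifulSoup, and I've been wondering how to get at the data. Could I use another for loop to extract it?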

<td class="row1" valign="middle">
    <div>
        <div style="float:left">
            <a href="/topic/4667583" title="This topic was started: Oct 3 2018, 11:10 AM">
                Xiaomi 70Mai Pro
            </a>
        </div>
        <br>
    </div>
</td>

Answer 1

To add to the existing answer, the only change you need to make is to add the link text to the dictionary:

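for table_tag in inputLink.find_all('td', {'class': 'row1'}):
    topic_title = table_tag.find('a', href=True)
    if topic_title:
        # 'link_text' is an illustrative key name; any key will do
        datum = {'topic_title': topic_title['title'],
                 'link_text': topic_title.get_text(strip=True)}
        data.append(datum)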
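As a rough check against the cell shown in the question, here is a minimal self-contained sketch that parses that markup from a string instead of fetching a URL. The names SAMPLE_HTML, extract_topics and the 'link_text' key are assumptions made for illustration; the sample is wrapped in a bare table so the cell sits in valid markup.

```python
from bs4 import BeautifulSoup

# Sample cell modelled on the markup in the question (whitespace tidied).
SAMPLE_HTML = """
<table><tr>
<td class="row1" valign="middle">
  <div>
    <div style="float:left">
      <a href="/topic/4667583" title="This topic was started: Oct 3 2018, 11:10 AM">
        Xiaomi 70Mai Pro
      </a>
    </div>
    <br>
  </div>
</td>
</tr></table>
"""


def extract_topics(soup):
    """Return the title attribute and visible text of the first link in each 'row1' cell."""
    data = []
    for cell in soup.find_all('td', {'class': 'row1'}):
        link = cell.find('a', href=True)
        if link:
            data.append({
                'topic_title': link['title'],
                # get_text(strip=True) drops the newlines and indentation around the text
                'link_text': link.get_text(strip=True),
            })
    return data


topics = extract_topics(BeautifulSoup(SAMPLE_HTML, 'html.parser'))
print(topics)
```

So .text on the <a> tag does return the link text, and no second for loop is needed. Plain .text keeps the surrounding whitespace from the markup, which is why the sketch uses get_text(strip=True) instead.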
