Question

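For a project, I'm scraping player data from futbin, and I want to add the scraped data to a dict or a pandas DataFrame. I've been stuck for a few hours and would appreciate some help, if possible. My code is below. It only prints the data, and I don't know how to go on from there.

Code: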

from requests_html import HTMLSession
import requests
from bs4 import BeautifulSoup
import pandas as pd
urls = ['https://www.futbin.com/21/player/87/pele', 'https://www.futbin.com/21/player/27751/robert-lewandowski']

for url in urls:
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    info = soup.find('div', id='info_content')
    rows = info.find_all('td')
    for info in rows:
        print(info.text.strip())

Answer 1

I would do it with open() and write():

file = open("filename.txt", "w")

The "w" argument specifies the following:

"w" - Write - Opens a file for writing, creates the file if it does not exist

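Then write the text and close the file:

file.write(text_to_save)
file.close()

Import os.path as well if you need to build or check the file path (open() and write() are built-ins and work without it):

import os.path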
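
As a hedged sketch of the same open()/write() idea, a `with` block closes the file automatically, even if a write fails. The `save_lines` helper, the file name `players.txt`, and the sample values are made up for illustration; in the question's loop the list would hold the stripped cell texts.

```python
import os
import tempfile

def save_lines(lines, path):
    """Write each string in `lines` to `path`, one per line."""
    # "w" truncates an existing file or creates a new one.
    with open(path, "w", encoding="utf-8") as handle:
        for line in lines:
            handle.write(line + "\n")
    # The file is closed once the with block ends.

# Hypothetical sample values standing in for the scraped cell texts.
sample_lines = ["Name", "Robert Lewandowski", "Club", "FC Bayern"]
output_path = os.path.join(tempfile.gettempdir(), "players.txt")
save_lines(sample_lines, output_path)
```

Note that this only stores the text on disk; it does not by itself give you a dict or a DataFrame.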

Answer 2

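You've already done the hard part by locating the table you need.

1. Use read_html() to convert it to a DataFrame
2. Apply a basic transform so the attributes become columns rather than key-value pairs
3. Get the details of all the footballers you want in a list comprehension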

from io import StringIO

import requests
from bs4 import BeautifulSoup
import pandas as pd
urls = ['https://www.futbin.com/21/player/87/pele', 'https://www.futbin.com/21/player/27751/robert-lewandowski']

def myhtml(url):
    # use BS4 to get the table that holds the required data
    soup = BeautifulSoup(requests.get(url).content, 'html.parser')
    html = str(soup.find('div', id='info_content').find("table"))
    # read_html() returns a list of DataFrames; take the first one.
    # Recent pandas versions deprecate passing literal HTML, hence StringIO.
    # The first column holds the attribute names, so index on it and transpose.
    return pd.read_html(StringIO(html))[0].set_index(0).T

df = pd.concat([myhtml(u) for u in urls])
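
To see what `.set_index(0).T` does without hitting the network, here is a small sketch using hand-built frames shaped like what read_html() returns for a two-column label/value table (column 0 holds the label, column 1 the value). The helpers `fake_table` and `to_row` and the three sample attributes are illustrative only, not scraped output.

```python
import pandas as pd

def fake_table(pairs):
    """Build a two-column frame like read_html() yields for a label/value table."""
    return pd.DataFrame(pairs)  # columns are the integers 0 and 1

def to_row(table):
    """Turn label/value rows into a single row whose columns are the labels."""
    return table.set_index(0).T

pele = fake_table([["Name", "Edson Arantes Nascimento"], ["Nation", "Brazil"], ["Skills", "5"]])
lewa = fake_table([["Name", "Robert Lewandowski"], ["Nation", "Poland"], ["Skills", "4"]])

# concat stacks the one-row frames; ignore_index renumbers the rows 0..n-1
demo_df = pd.concat([to_row(pele), to_row(lewa)], ignore_index=True)
print(demo_df)
```

Without `ignore_index=True` every row keeps the index label 1, because each transposed frame takes its single row label from the value column of the original table.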

	Name	Club	Nation	League	Skills	Weak Foot	Intl. Rep	Foot	Height	Weight	Revision	Def. WR	Att. WR	Added on	Origin	R.Face	B.Type	DOB	Robert Lewandowski FIFA 21 Career Mode	Age
1	Edson Arantes Nascimento	FUT 21 ICONS	Brazil	Icons	5	4	5	Right	173cm | 5'8"	70	Icon	Med	High	2020-09-10	Prime	nan	Unique	23-10-1940	nan
1	Robert Lewandowski	FC Bayern	Poland	Bundesliga	4	4	4	Right	184cm | 6'0"	80	TOTY	Med	High	2021-01-22	TOTY	nan	Unique	nan	Robert Lewandowski FIFA 21 Career Mode
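
Since the question also asks about a dict, here is one possible sketch: collect each player's label/value pairs into a dict, then build the frame from a list of those dicts. It assumes each table row holds one `th` label and one `td` value, which should be checked against the real page. `SAMPLE_HTML` and `table_to_dict` are made-up stand-ins so the example runs offline.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Assumed markup: one <th> label and one <td> value per row (verify on the real page).
SAMPLE_HTML = """
<div id="info_content"><table>
  <tr><th>Name</th><td> Robert Lewandowski </td></tr>
  <tr><th>Club</th><td>FC Bayern</td></tr>
  <tr><th>Nation</th><td>Poland</td></tr>
</table></div>
"""

def table_to_dict(html):
    """Return {label: value} for every row of the info table."""
    soup = BeautifulSoup(html, "html.parser")
    info = soup.find("div", id="info_content")
    record = {}
    for row in info.find_all("tr"):
        label, value = row.find("th"), row.find("td")
        # Skip rows that do not have both a label and a value cell.
        if label is not None and value is not None:
            record[label.get_text(strip=True)] = value.get_text(strip=True)
    return record

records = [table_to_dict(SAMPLE_HTML)]  # with real pages: one dict per URL
df_from_dicts = pd.DataFrame(records)   # dict keys become the column names
```

With the real pages, calling `table_to_dict(requests.get(url).content)` for each URL would fill `records`, provided the markup matches the assumption above.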

How do I loop over scraped items and add them to a dict or a pandas DataFrame?
