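I wrote a scraper in Python using beautifulsoup4 that iterates over several cryptocurrency pages and returns the opening, highest, and closing values. The scraping part works fine, but I can't get all of the currencies saved to my list; only the last one is added.
Can anyone help me work out how to save all of them? I've spent hours searching and can't find a relevant answer. The code is below: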
no_space = name_15.str.replace('\s+', '-')
#lists out the pages to scrape
for n in no_space:
    page = 'https://coinmarketcap.com/currencies/' + n + '/historical-data/'
    http = lib.PoolManager()
    response = http.request('GET', page)
    soup = BeautifulSoup(response.data, "lxml")
    main_table = soup.find('tbody')
    date=[]
    open_p=[]
    high_p=[]
    low_p=[]
    close_p=[]
    table = []
    for row in main_table.find_all('td'):
        table_pull = row.find_all_previous('td') #other find methods aren't returning what I need, but this works just fine
        table = [p.text.strip() for p in table_pull]
    date = table[208:1:-7]
    open_p = table[207:1:-7]
    high_p = table[206:1:-7]
    low_p = table[205:1:-7]
    close_p = table[204:0:-7]
    df=pd.DataFrame(date,columns=['Date'])
    df['Open']=list(map(float,open_p))
    df['High']=list(map(float,high_p))
    df['Low']=list(map(float,low_p))
    df['Close']=list(map(float,close_p))
    print(df)
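Answer 0: (score: 0)
In short, it looks like you are visiting every 'td' element and then, for each one, fetching all the 'td' elements that precede it, which is unnecessary. Also, as @hoefling pointed out, you keep overwriting your variables inside the loop, which is why you only get the last element back (in other words, only the final iteration of the loop sets the variable's value; everything assigned in earlier iterations is overwritten). Apologies, I can't test this at the moment because of the firewall on my machine. Try the following: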
import pandas as pd
import urllib3 as lib  # assumption: `lib` is urllib3, whose PoolManager is used below
from bs4 import BeautifulSoup

# raw string plus regex=True so '\s+' is treated as a pattern (pandas >= 2.0 defaults to regex=False)
no_space = name_15.str.replace(r'\s+', '-', regex=True)
#lists out the pages to scrape
for n in no_space:
    page = 'https://coinmarketcap.com/currencies/' + n + '/historical-data/'
    http = lib.PoolManager()
    response = http.request('GET', page)
    soup = BeautifulSoup(response.data, "lxml")
    main_table = soup.find('tbody')
    # find_all returns the cells in document order (find_all_previous returned them reversed)
    table = [p.text.strip() for p in main_table.find_all('td')]
    #You will need to re-think these indices here to get the info you want
    date = table[208:1:-7]
    open_p = table[207:1:-7]
    high_p = table[206:1:-7]
    low_p = table[205:1:-7]
    close_p = table[204:0:-7]
    df=pd.DataFrame(date,columns=['Date'])
    df['Open']=list(map(float,open_p))
    df['High']=list(map(float,high_p))
    df['Low']=list(map(float,low_p))
    df['Close']=list(map(float,close_p))
    print(df)
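
The snippet above still rebuilds `df` on every pass, so here is a rough, untested-against-the-site sketch of one way to keep every coin: build one DataFrame per coin, append it to a list, and concatenate at the end. It also replaces the hand-tuned slice indices by cutting the flat, document-order cell list into rows. Several things here are my assumptions and not from the original code: the helper name `cells_to_frame`, the `Coin` column, the made-up sample values, and the guess that each row has seven cells with Date, Open, High, Low, Close first (the step of 7 in the slices suggests this). In the real loop, `cells` would be `[td.text.strip() for td in main_table.find_all('td')]`.

```python
import pandas as pd

COLUMNS = ["Date", "Open", "High", "Low", "Close"]
CELLS_PER_ROW = 7  # assumption: five price columns plus two trailing columns per row


def cells_to_frame(cells, coin):
    """Turn a flat, document-order list of cell strings into a DataFrame (hypothetical helper)."""
    # Cut the flat list into consecutive rows of CELLS_PER_ROW cells.
    rows = [cells[i:i + CELLS_PER_ROW] for i in range(0, len(cells), CELLS_PER_ROW)]
    # Keep the first five cells of each complete row; drop any partial trailing row.
    records = [row[:5] for row in rows if len(row) == CELLS_PER_ROW]
    df = pd.DataFrame(records, columns=COLUMNS)
    for col in COLUMNS[1:]:
        # Strip thousands separators before converting to float.
        df[col] = df[col].str.replace(",", "", regex=False).astype(float)
    df.insert(0, "Coin", coin)  # tag every row with the coin it came from
    return df


# Made-up stand-in for the scraped cell text of two coins (not real market data).
fake_pages = {
    "coin-a": ["2018-01-02", "1,000.50", "1,100.00", "990.00", "1,050.25", "5,000", "9,000,000",
               "2018-01-01", "980.00", "1,010.00", "975.00", "1,000.50", "4,000", "8,500,000"],
    "coin-b": ["2018-01-02", "2.00", "2.50", "1.90", "2.40", "300", "70,000",
               "2018-01-01", "1.80", "2.10", "1.75", "2.00", "250", "60,000"],
}

frames = []  # created once, before the loop, so nothing gets overwritten
for coin, cells in fake_pages.items():
    frames.append(cells_to_frame(cells, coin))

# One DataFrame holding the rows of every coin.
all_coins = pd.concat(frames, ignore_index=True)
print(all_coins)
```

The important part is that `frames` lives outside the loop and only gets appended to inside it. If you would rather look coins up by name, a dict keyed by coin (`frames[coin] = ...`) should work the same way.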