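I am trying to extract a table with Python, but I cannot remove the \n characters even though I have tried the replace, remove, rsplit, and lsplit functions. Please help.
Here is my code.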
from urllib.request import urlopen
from bs4 import BeautifulSoup
import requests
import pandas as pd
url = "https://shared.websol.barchart.com/quotes/quote.php?page=quote&sym=ng&x=13&y=8&domain=if&display_ice=1&enabled_ice_exchanges=&tz=0&ed=0"
res = requests.get(url)
soup = BeautifulSoup(res.text, 'lxml')
soup.prettify()
Header = soup.findAll('tr', limit=2)[1].findAll('th')
column_headers = [th.getText() for th in soup.findAll('tr', limit=2)[1].findAll('th')]
print(column_headers)
data_rows = soup.findAll('tr')[2:]
i = range(len(data_rows))
for td in data_rows:
    row = td.get_text()
    print(row)
The output of my code is below. I have copied only a few rows.
['Contract', 'Last', 'Change', 'Open', 'High', 'Low', 'Volume', 'Prev. Stl.', 'Time', 'Links']
\n Cash (NGY00)\n 2.890s\n +0.020\n 0.000\n 2.890\n 2.890\n 0\n 2.870\n 05/25/18\n Q / C / O\n
\n Jun \'18 (NGM18)\n 2.946\n +0.007\n 2.946\n 2.968\n 2.908\n 2331\n 2.939\n 19:13\n Q / C / O\n
\n Jul \'18 (NGN18)\n 2.974\n +0.011\n 2.974\n 3.000\n 2.937\n 23859\n 2.963\n 19:37\n Q / C / O\n
\n Aug \'18 (NGQ18)\n 2.989\n +0.006\n 2.983\n 3.016\n 2.957\n 4434\n 2.983\n 18:25\n Q / C / O\n
\n Sep \'18 (NGU18)\n 2.977\n +0.010\n 2.970\n 2.998\n 2.942\n 2313\n 2.967\n 18:07\n Q / C / O\n
\n Oct \'18 (NGV18)\n 2.975\n +0.005\n 2.969\n 2.999\n 2.944\n 2259\n 2.970\n 19:01\n Q / C / O\n
\n Nov \'18 (NGX18)\n 3.013\n +0.005\n 3.007\n 3.034\n 2.983\n 1774\n 3.008\n 19:18\n Q / C / O\n
\n Dec \'18 (NGZ18)\n 3.113\n +0.007\n 3.106\n 3.131\n 3.082\n 1287\n 3.106\n 17:59\n Q / C / O\n
\n Jan \'19 (NGF19)\n 3.198\n +0.011\n 3.177\n 3.212\n 3.165\n 1737\n 3.187\n 17:51\n Q / C / O\n
\n Feb \'19 (NGG19)\n 3.156\n +0.008\n 3.137\n 3.170\n 3.126\n 776\n 3.148\n 17:39\n Q / C / O\n
\n Mar \'19 (NGH19)\n 3.042\n +0.002\n 3.042\n 3.063\n 3.017\n 2891\n 3.040\n 18:27\n Q / C / O\n
\n Apr \'19 (NGJ19)\n 2.672\n +0.018\n 2.662\n 2.676\n 2.648\n 2403\n 2.654\n 11:00\n Q / C / O\n
Answer 0: (score: 0)
I saved your output into a variable named res and called res.replace("\n", ""), and it worked. Try calling it on each row.
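Here is a minimal sketch of that suggestion, assuming the row text contains real newline characters. The sample row is a shortened copy of the first row in the question's output, and the variable names are only illustrative. It also shows two common reasons replace can appear to do nothing: the result is not assigned back, or the pattern is the literal two-character sequence backslash-n rather than a newline.

```python
# Shortened sample row from the question's output, written with real newlines.
row = "\n Cash (NGY00)\n 2.890s\n +0.020\n"

# Strings are immutable: replace() returns a new string, so this line alone
# leaves row untouched because the result is thrown away.
row.replace("\n", "")
unchanged = row

# '\\n' is a backslash followed by the letter n, which does not occur in row,
# so this replacement changes nothing.
literal_attempt = row.replace("\\n", "")

# Assign the result and trim the leading and trailing spaces.
cleaned = row.replace("\n", "").strip()
print(cleaned)  # Cash (NGY00) 2.890s +0.020
```

In the loop from the question, this would mean assigning the result of replace back to row before printing it.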
Answer 1: (score: 0)
Maybe this is closer to what you are trying to accomplish:
from bs4 import BeautifulSoup
import requests
url = "https://shared.websol.barchart.com/quotes/quote.php?page=quote&sym=ng&x=13&y=8&domain=if&display_ice=1&enabled_ice_exchanges=&tz=0&ed=0"
res = requests.get(url)
soup = BeautifulSoup(res.text, 'lxml')
column_headers = [th.getText() for th in soup.findAll('tr', limit=2)[1].findAll('th')]
print(column_headers)
data_rows = soup.findAll('tr')[2:]
for td in data_rows:
    # get_text() returns real newline characters, so match '\n' rather than the literal text '\\n'
    row = td.get_text().replace('\n', '').strip()
    print(row)
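If the goal is a table rather than one long string per row, another option is to split each row's text on the newlines and pair the pieces with the headers. The sketch below is only an illustration: it uses the header list and the first row from the question's output as hard-coded sample data (assuming real newline characters), and split_row is a hypothetical helper name, not part of BeautifulSoup.

```python
# Headers and one row of text, copied from the output shown in the question.
column_headers = ['Contract', 'Last', 'Change', 'Open', 'High', 'Low',
                  'Volume', 'Prev. Stl.', 'Time', 'Links']
row_text = ("\n Cash (NGY00)\n 2.890s\n +0.020\n 0.000\n 2.890\n 2.890\n"
            " 0\n 2.870\n 05/25/18\n Q / C / O\n")


def split_row(text):
    """Split one row's text on newlines, strip each piece, drop empty ones."""
    return [cell.strip() for cell in text.split("\n") if cell.strip()]


cells = split_row(row_text)

# Pair each cell with its column header.
record = dict(zip(column_headers, cells))

print(cells)
print(record["Contract"], record["Last"])  # Cash (NGY00) 2.890s
```

A list of such records, one per row, could then be passed to pd.DataFrame, since the question already imports pandas.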