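Removing \n from an extracted table

Time: 2018-05-29 11:25:49

Tags: python

I am trying to extract a table with Python, but I cannot get rid of the \n characters even though I have tried the replace, remove, rsplit, and lsplit functions. Please help.

Below is my code.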

from urllib.request import urlopen
from bs4 import BeautifulSoup
import requests
import pandas as pd

url = "https://shared.websol.barchart.com/quotes/quote.php?page=quote&sym=ng&x=13&y=8&domain=if&display_ice=1&enabled_ice_exchanges=&tz=0&ed=0"

res = requests.get(url)

soup = BeautifulSoup(res.text, 'lxml')

soup.prettify()

Header = soup.findAll('tr', limit=2)[1].findAll('th')

column_headers = [th.getText() for th in soup.findAll('tr', limit=2)[1].findAll('th')]

print(column_headers)

data_rows = soup.findAll('tr')[2:]

i = range(len(data_rows))

for td in data_rows:
    row = td.get_text()
    print(row)

My code produces the following output. Only a few rows are copied here.

['Contract', 'Last', 'Change', 'Open', 'High', 'Low', 'Volume', 'Prev. Stl.', 'Time', 'Links']
\n    Cash (NGY00)\n    2.890s\n    +0.020\n    0.000\n    2.890\n    2.890\n    0\n    2.870\n    05/25/18\n    Q / C / O\n  
\n    Jun \'18 (NGM18)\n    2.946\n    +0.007\n    2.946\n    2.968\n    2.908\n    2331\n    2.939\n    19:13\n    Q / C / O\n  
\n    Jul \'18 (NGN18)\n    2.974\n    +0.011\n    2.974\n    3.000\n    2.937\n    23859\n    2.963\n    19:37\n    Q / C / O\n  
\n    Aug \'18 (NGQ18)\n    2.989\n    +0.006\n    2.983\n    3.016\n    2.957\n    4434\n    2.983\n    18:25\n    Q / C / O\n  
\n    Sep \'18 (NGU18)\n    2.977\n    +0.010\n    2.970\n    2.998\n    2.942\n    2313\n    2.967\n    18:07\n    Q / C / O\n  
\n    Oct \'18 (NGV18)\n    2.975\n    +0.005\n    2.969\n    2.999\n    2.944\n    2259\n    2.970\n    19:01\n    Q / C / O\n  
\n    Nov \'18 (NGX18)\n    3.013\n    +0.005\n    3.007\n    3.034\n    2.983\n    1774\n    3.008\n    19:18\n    Q / C / O\n  
\n    Dec \'18 (NGZ18)\n    3.113\n    +0.007\n    3.106\n    3.131\n    3.082\n    1287\n    3.106\n    17:59\n    Q / C / O\n  
\n    Jan \'19 (NGF19)\n    3.198\n    +0.011\n    3.177\n    3.212\n    3.165\n    1737\n    3.187\n    17:51\n    Q / C / O\n  
\n    Feb \'19 (NGG19)\n    3.156\n    +0.008\n    3.137\n    3.170\n    3.126\n    776\n    3.148\n    17:39\n    Q / C / O\n  
\n    Mar \'19 (NGH19)\n    3.042\n    +0.002\n    3.042\n    3.063\n    3.017\n    2891\n    3.040\n    18:27\n    Q / C / O\n  
\n    Apr \'19 (NGJ19)\n    2.672\n    +0.018\n    2.662\n    2.676\n    2.648\n    2403\n    2.654\n    11:00\n    Q / C / O\n 

2 answers:

Answer 0: (score: 0)

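I saved your output to a variable named res and called res.replace("\n", ""), and it worked. Try calling it on each row. Note that replace returns a new string rather than modifying the original, so you have to keep the return value.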
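As a minimal sketch of that suggestion, the snippet below uses a shortened sample string shaped like the output in the question (an assumption: the `\n` shown there are real newline characters, not a backslash followed by `n`). The names `row`, `cleaned`, and `collapsed` are illustrative only.

```python
# Sample row text shaped like the question's output (assumed to contain real newlines).
row = "\n    Cash (NGY00)\n    2.890s\n    +0.020\n  "

# Strings are immutable: replace() returns a new string and leaves `row` unchanged,
# so calling it without keeping the result has no visible effect.
row.replace("\n", "")

# Keep the return value to see the change; strip() trims the outer whitespace.
cleaned = row.replace("\n", "").strip()

# split() with no arguments splits on any run of whitespace, so joining the parts
# with a single space removes both the newlines and the indentation.
collapsed = " ".join(row.split())

print(repr(cleaned))
print(repr(collapsed))
```

Removing only the newlines still leaves the indentation between values, which is why the split-and-join form may be closer to what is wanted.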

Answer 1: (score: 0)

Maybe this is closer to what you are trying to accomplish:

from bs4 import BeautifulSoup
import requests

url = "https://shared.websol.barchart.com/quotes/quote.php?page=quote&sym=ng&x=13&y=8&domain=if&display_ice=1&enabled_ice_exchanges=&tz=0&ed=0"
res = requests.get(url)
soup = BeautifulSoup(res.text, 'lxml')

column_headers = [th.getText() for th in soup.findAll('tr', limit=2)[1].findAll('th')]
print(column_headers)

data_rows = soup.findAll('tr')[2:]
for tr in data_rows:
    # '\n' matches a real newline; '\\n' would only match a literal backslash followed by "n".
    row = tr.get_text().replace('\n', '').strip()
    print(row)
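If the goal is one value per column rather than one long string, the row text can also be split into fields. The sketch below assumes each cell sits on its own line, as in the output shown in the question; `row_to_fields` is a hypothetical helper, and the sample row is copied from that output rather than fetched from the site.

```python
def row_to_fields(row_text):
    """Split one row's text into cell values, one per non-empty line."""
    return [line.strip() for line in row_text.splitlines() if line.strip()]


# Headers as printed by the question's code.
column_headers = ['Contract', 'Last', 'Change', 'Open', 'High', 'Low',
                  'Volume', 'Prev. Stl.', 'Time', 'Links']

# One row of text, shaped like the result of get_text() on a <tr> in the question.
sample_row = ("\n    Jun '18 (NGM18)\n    2.946\n    +0.007\n    2.946\n    2.968\n"
              "    2.908\n    2331\n    2.939\n    19:13\n    Q / C / O\n  ")

fields = row_to_fields(sample_row)

# Pair each header with its value; zip() stops at the shorter of the two lists.
record = dict(zip(column_headers, fields))
print(record)
```

With Beautiful Soup itself, calling get_text(strip=True) on each cell of a row is another way to get clean per-cell values without post-processing the combined text.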