Cannot access web-scraped strings outside the for loop

Time: 2020-06-14 00:37:25

Tags: python for-loop web-scraping beautifulsoup output

    from bs4 import BeautifulSoup as soup
    import pandas as pd
    import requests   


    url = 'http://ips.alliance-pipeline.com/Ips/MainPage.aspx?siteCd=ALLUSA-IPS&contentSysCd=USA-OP-AVAIL-BY-DAY&tvPath=55/112/56'
    response = requests.get(url)
    html = soup(response.content, 'html.parser')

    loc = html.find_all('td', class_= 'ig162a1706')
    tsq = html.find_all('td', class_ = 'ig162a170e')

    for r in html.find_all('td', class_ = 'ig162a1706'):
        loc = r.text
        print(loc)

    for a in html.find_all('td', class_ = 'ig162a170e'):
        tsq = a.text
        print(tsq)

Output:

ALLIANCE/ANR
ALLIANCE/ROSHOLT
AUX SABLE
BANTRY
BORDER USA
GUARDIAN
HANKINSON
HORIZON
LYLE
MIDWESTERN GAS TRANSMISSION
MILNOR
NATURAL GAS PIPELINE COMPANY OF AMERICA
NICOR/MORRIS
PEOPLES/ELWOOD
TIOGA
VECTOR PIPELINE
729,192
2,600
245,000
141,021
1,402,129
2,158
9,030
0
8,000
0
350
114,618
236,385
34,426
111,235
152,612

Problem:

    print(loc)

Output:

'VECTOR PIPELINE' 

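Hi everyone. Basically, whenever I run print(loc) outside the for loop, it only prints 'VECTOR PIPELINE', and I'm confused about why. Shouldn't it print everything the loop printed, not just 'VECTOR PIPELINE', even when I run the command outside the loop? I'm not sure what I'm doing wrong. The output is a string.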

1 answer:

Answer 0 (score: 0)

You can use the built-in zip() function to "tie" each location name to its TSQ value.

For example:

    import requests
    from bs4 import BeautifulSoup
    from pprint import pprint


    url = 'http://ips.alliance-pipeline.com/Ips/MainPage.aspx?siteCd=ALLUSA-IPS&contentSysCd=USA-OP-AVAIL-BY-DAY&tvPath=55/112/56'
    soup = BeautifulSoup(requests.get(url).content, 'html.parser')

    # select the location-name cells and the TSQ cells by their CSS class
    col1 = soup.select('td.ig162a1706')
    col2 = soup.select('td.ig162a170e')

    # pair each name with its TSQ value and store the pair in a dict
    all_data = {}
    for name, tsq in zip(col1, col2):
        all_data[name.text] = tsq.text

    # pretty print to screen:
    pprint(all_data)

Prints:

{'ALLIANCE/ANR': '719,055',
 'ALLIANCE/ROSHOLT': '2,600',
 'AUX SABLE': '245,000',
 'BANTRY': '141,021',
 'BORDER USA': '1,402,129',
 'GUARDIAN': '0',
 'HANKINSON': '9,030',
 'HORIZON': '0',
 'LYLE': '8,000',
 'MIDWESTERN GAS TRANSMISSION': '0',
 'MILNOR': '350',
 'NATURAL GAS PIPELINE COMPANY OF AMERICA': '114,618',
 'NICOR/MORRIS': '236,949',
 'PEOPLES/ELWOOD': '34,426',
 'TIOGA': '111,235',
 'VECTOR PIPELINE': '173,570'}

You can then print the information for a single location, for example:

    # print information only for 'TIOGA'
    print(all_data['TIOGA'])

Prints:

111,235
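
As for why print(loc) outside the loop shows only 'VECTOR PIPELINE': each pass through the loop rebinds loc to the current cell's text, so once the loop finishes the name refers only to the last value. The sketch below illustrates this without any network access. The list cell_texts is a hypothetical stand-in for the scraped cells (its values are copied from the question's output), and locs is a name introduced here for illustration.

```python
# Hypothetical stand-in for the text of the scraped <td> cells;
# the values are copied from the output shown in the question.
cell_texts = ['PEOPLES/ELWOOD', 'TIOGA', 'VECTOR PIPELINE']

# Rebinding: every iteration replaces the previous value of loc,
# so after the loop only the last value remains.
for text in cell_texts:
    loc = text
print(loc)

# Accumulating: append each value to a list created before the loop,
# so all values are still available once the loop has finished.
locs = []
for text in cell_texts:
    locs.append(text)
print(locs)
```

If you want every value to be available after the loop, collect them in a list (as here) or in a dict (as in the zip() example above) instead of assigning to a single variable.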