Unable to make my scraper print and write the results accordingly

Date: 2018-04-11 10:23:05

Tags: python python-3.x list csv web-scraping

I've written a script in Python to scrape some content from a webpage. The scraper does fine when it comes to parsing the data. There are two fields to scrape, name and data, each containing a list of items. However, when I print them the results get messy, because I can't yet print them properly.

This is what I've tried so far:

import requests, csv
from bs4 import BeautifulSoup

LINK = 'http://active.boeing.com/doingbiz/d14426/geoprocess.cfm?ProcessCode=000&pageID=m20487&Country=AllLocations&State='

def get_item(url):
    res = requests.get(url).text
    soup = BeautifulSoup(res,"lxml")
    name = [item.find_next_sibling().text for item in soup.select("strong")]
    table = soup.select('table[cellspacing="1"]')[0]
    for items in table.select("tr")[1:]:
        data = [item.get_text(strip=True) for item in items.select("td")]
        print(name,data)  #this is where I need to twitch the code to get them printed like how it should be
        with open("itemresults.csv","a",newline="") as infile:
            writer = csv.writer(infile)
            writer.writerow(name,data)  #I can't write them like so but if I try like [name,data] this the results are messy

if __name__ == '__main__':
    get_item(LINK)

For clarity: the list in the name variable should be printed once, but it is printed as many times as there are lists in the data variable.

As the real results are too big to show what the expected result looks like, I'm using a demo:

"1,2,3" are within "name". I wish to get them printed like below:

1 2 3 q w e
      a s d
      c x r

They are printed like the following instead:

1 2 3 q w e
1 2 3 a s d
1 2 3 c x r

Bottom line is:

1. I wish to get them printed accordingly and
2. Write in a csv file in the right way

2 answers:

Answer 0 (score: 2)

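If I understand your question correctly, you want the names to act as the header of the csv file, and they should appear only once, both in the csv file and in the printed output.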

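Problem with your code:

You use name inside the for loop, so the names are printed on every iteration of the loop, and the same thing happens when writing to the csv file.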
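To make the point concrete with the demo values from the question, here is a minimal sketch that runs without the network. The helper format_rows is hypothetical (it is not part of the original script), and padding by the width of the space-joined names is an assumption about the layout the asker wants.

```python
def format_rows(name, rows):
    """Return printable lines with the names shown only on the first line."""
    prefix = " ".join(name)
    # Blank padding of the same width, used on every line after the first.
    padding = " " * len(prefix)
    lines = []
    for idx, row in enumerate(rows):
        lead = prefix if idx == 0 else padding
        lines.append(lead + " " + " ".join(row))
    return lines


# Demo values taken from the question.
name = ["1", "2", "3"]
rows = [["q", "w", "e"], ["a", "s", "d"], ["c", "x", "r"]]

for line in format_rows(name, rows):
    print(line)
```

The names are joined once, before the loop, and only the first line carries them.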

Fixed code

import requests, csv
from bs4 import BeautifulSoup

LINK = 'http://active.boeing.com/doingbiz/d14426/geoprocess.cfm?ProcessCode=000&pageID=m20487&Country=AllLocations&State='

def get_item(url):
    res = requests.get(url).text
    soup = BeautifulSoup(res,"lxml")
    name = [item.find_next_sibling().text for item in soup.select("strong")]
    spaces = len(" ".join(itm for itm in name))*"  "
    table = soup.select('table[cellspacing="1"]')[0]
    for idx, items in enumerate(table.select("tr")[1:]):
        data = [item.get_text(strip=True) for item in items.select("td")]
        if idx == 0:
            print(name,data)  # first row: print the names once, followed by the data
        else:
            print(spaces,data)  # later rows: print padding in place of the names
        with open("itemresults.csv","a",newline="") as infile:
            writer = csv.writer(infile)
            if idx == 0:
                writer.writerow([name,data])  # first row: names and data
            else:
                writer.writerow([spaces,data])  # later rows: padding in place of the names

if __name__ == '__main__':
    get_item(LINK)

Output

['000', '000', 'Boeing Information Only', 'Boeing Info Only', 'Boeing Information Only'] ['AUSTRIA', '', 'BE10410486', 'MAGNA STEYR']
                                                                                                                                                 ['CHINA', '', 'BE10409781', 'FESHER AVIATION COMPONENTS ZHENJIANG CO LTD']
                                                                                                                                                 ['CHINA', '', 'BE10050454', 'SHENYANG AIRCRAFT CORP']
                                                                                                                                                 ['GERMANY', '', 'BE10364235', 'AERO COATING GMBH']
                                                                                                                                                 ['GERMANY', '', 'BE10022527', 'BFG FEINGUSS NIEDERRHEIN GMBH']
                                                                                                                                                 ['GERMANY', '', 'BE10394502', 'MT AEROSPACE AG']
                                                                                                                                                 ['GERMANY', '', 'BE10341261', 'XPERION GMBH & CO KG']
                                                                                                                                                 ['GERMANY', '', 'BE10023472', 'ZOLLERN ALUMINIUMFEINGUSS SOEST GMBH & CO KG']
                                                                                                                                                 ['INDIA', '', 'BE10387428', 'ADVANCED METALLURGICAL LAB']
                                                                                                                                                 ['MEXICO', '', 'BE10404178', 'MONTERREY AEROSPACE MEXICO']
                                                                                                                                                 ['NETHERLANDS', '', 'BE10334331', 'PM AEROTEC']
                                                                                                                                                 ['UNITED STATES', 'AL', 'BE10039892', 'GENERAL DYNAMICS OTS DRI INC']
                                                                                                                                                 ['UNITED STATES', 'CA', 'BE10059366', 'CANYON COMPOSITES INC']
                                                                                                                                                 ['UNITED STATES', 'CA', 'BE10031203', 'GENERAL VENEER MFG  CO']
                                                                                                                                                 ['UNITED STATES', 'CA', 'BE10038216', 'SAI INDUSTRIES']
                                                                                                                                                 ['UNITED STATES', 'CA', 'BE10277597', 'SANTIER INC']
                                                                                                                                                 ['UNITED STATES', 'CA', 'BE10053288', 'TIODIZE CO INC']
                                                                                                                                                 ['UNITED STATES', 'CA', 'BE10273067', 'VALLEY DESIGN & MFG INC']
                                                                                                                                                 ['UNITED STATES', 'CT', 'BE10054071', 'KAMAN PRECISION PRODUCTS']
                                                                                                                                                 ['UNITED STATES', 'FL', 'BE10361256', 'BAY TECH INDS INC']
                                                                                                                                                 ['UNITED STATES', 'FL', 'BE10067537', 'TRIUMPH AEROSTRUCTURES VOUGHT AIRCRAFT DIVISION']
                                                                                                                                                 ['UNITED STATES', 'FL', 'BE10278251', 'URS LABORATORIES DIVISION']
                                                                                                                                                 ['UNITED STATES', 'GA', 'BE10055356', 'WARNER ROBINS AIR LOGISTICS COMPLEX']
                                                                                                                                                 ['UNITED STATES', 'MD', 'BE10069970', 'ALLIANT TECHSYSTEMS OPERATIONS LLC']
                                                                                                                                                 ['UNITED STATES', 'MO', 'BE10030518', 'ESSEX INDUSTRIES INC']
                                                                                                                                                 ['UNITED STATES', 'OH', 'BE10032670', 'HDI LANDING GEAR USA']
                                                                                                                                                 ['UNITED STATES', 'OH', 'BE10408922', 'ORBIT NDT BEDFORD']
                                                                                                                                                 ['UNITED STATES', 'TX', 'BE10034905', 'AERO COMPONENTS INC']
                                                                                                                                                 ['UNITED STATES', 'UT', 'BE10026661', 'OGDEN AIR LOGISTICS COMPLEX']

This code will generate the required csv file. Hope this helps.
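As a possible variation on the csv part, the sketch below opens the output once, writes the names as a single first row, and then writes each data list as its own row, so every item lands in its own cell. This is an assumption about the desired file layout, not something the answer confirms. The helper write_rows is hypothetical, and an in-memory buffer stands in for itemresults.csv so the example runs without touching disk.

```python
import csv
import io


def write_rows(fileobj, name, rows):
    """Write the names once as the first row, then one row per data list."""
    writer = csv.writer(fileobj)
    writer.writerow(name)    # written once, outside any loop
    writer.writerows(rows)   # each inner list becomes one csv row


# Demo values taken from the question.
name = ["1", "2", "3"]
rows = [["q", "w", "e"], ["a", "s", "d"], ["c", "x", "r"]]

buffer = io.StringIO(newline="")
write_rows(buffer, name, rows)
print(buffer.getvalue())
```

With a real file, the same helper could be called as write_rows(outfile, name, rows) inside with open("itemresults.csv", "w", newline="") as outfile, after collecting all data rows from the table into a list.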

Answer 1 (score: 0)

This is the solution I was expecting:

<%= foo %>

To see the results, just run it.