I wrote a script in Python to scrape some content from a web page. The scraper parses the data fine. There are two fields to scrape, "name" and "data", and each one holds a list of items. However, when I print them the output gets messy, because I can't work out how to print them properly.

This is what I've tried so far:

    import requests, csv
    from bs4 import BeautifulSoup

    LINK = 'http://active.boeing.com/doingbiz/d14426/geoprocess.cfm?ProcessCode=000&pageID=m20487&Country=AllLocations&State='

    def get_item(url):
        res = requests.get(url).text
        soup = BeautifulSoup(res,"lxml")
        name = [item.find_next_sibling().text for item in soup.select("strong")]
        table = soup.select('table[cellspacing="1"]')[0]
        for items in table.select("tr")[1:]:
            data = [item.get_text(strip=True) for item in items.select("td")]
            print(name,data)  #this is where I need to tweak the code to get them printed like how it should be
            with open("itemresults.csv","a",newline="") as infile:
                writer = csv.writer(infile)
                writer.writerow(name,data)  #I can't write them like so but if I try like [name,data] the results are messy

    if __name__ == '__main__':
        get_item(LINK)

To be clearer: the list in the "name" variable should be printed only once, but it is printed as many times as lists keep coming from the "data" variable.

The real lists are too big to show what the expected result looks like, so here is a demo:

"1,2,3" are within "name".

I wish to get them printed like below:

    1 2 3 q w e
          a s d
          c x r

They are printed like the following instead:

    1 2 3 q w e
    1 2 3 a s d
    1 2 3 c x r

The bottom line is:

1. I wish to get them printed accordingly and
2. Write them to a csv file in the right way

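Answer 0 (score: 2)

If I understand your question, you want the names to act as the header of the csv file, and they should be printed and written to the csv file only once.

The problem with your code:

"name" is used inside the for loop, so the names are printed every time the loop runs, and the same thing happens when the rows are written to the csv file.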

Fixed code

    import requests, csv
    from bs4 import BeautifulSoup

    LINK = 'http://active.boeing.com/doingbiz/d14426/geoprocess.cfm?ProcessCode=000&pageID=m20487&Country=AllLocations&State='

    def get_item(url):
        res = requests.get(url).text
        soup = BeautifulSoup(res,"lxml")
        name = [item.find_next_sibling().text for item in soup.select("strong")]
        # blank string as wide as the space-joined names, used to pad later rows
        spaces = len(" ".join(itm for itm in name))*" "
        table = soup.select('table[cellspacing="1"]')[0]
        for idx, items in enumerate(table.select("tr")[1:]):
            data = [item.get_text(strip=True) for item in items.select("td")]
            if idx == 0:
                print(name,data)  # first row: print the names together with the data
            else:
                print(spaces,data)  # later rows: print padding instead of the names
            with open("itemresults.csv","a",newline="") as infile:
                writer = csv.writer(infile)
                if idx == 0:
                    writer.writerow([name,data])  # first row: names and data
                else:
                    writer.writerow([spaces,data])  # later rows: padding and data

    if __name__ == '__main__':
        get_item(LINK)

Output
['000', '000', 'Boeing Information Only', 'Boeing Info Only', 'Boeing Information Only'] ['AUSTRIA', '', 'BE10410486', 'MAGNA STEYR']
['CHINA', '', 'BE10409781', 'FESHER AVIATION COMPONENTS ZHENJIANG CO LTD']
['CHINA', '', 'BE10050454', 'SHENYANG AIRCRAFT CORP']
['GERMANY', '', 'BE10364235', 'AERO COATING GMBH']
['GERMANY', '', 'BE10022527', 'BFG FEINGUSS NIEDERRHEIN GMBH']
['GERMANY', '', 'BE10394502', 'MT AEROSPACE AG']
['GERMANY', '', 'BE10341261', 'XPERION GMBH & CO KG']
['GERMANY', '', 'BE10023472', 'ZOLLERN ALUMINIUMFEINGUSS SOEST GMBH & CO KG']
['INDIA', '', 'BE10387428', 'ADVANCED METALLURGICAL LAB']
['MEXICO', '', 'BE10404178', 'MONTERREY AEROSPACE MEXICO']
['NETHERLANDS', '', 'BE10334331', 'PM AEROTEC']
['UNITED STATES', 'AL', 'BE10039892', 'GENERAL DYNAMICS OTS DRI INC']
['UNITED STATES', 'CA', 'BE10059366', 'CANYON COMPOSITES INC']
['UNITED STATES', 'CA', 'BE10031203', 'GENERAL VENEER MFG CO']
['UNITED STATES', 'CA', 'BE10038216', 'SAI INDUSTRIES']
['UNITED STATES', 'CA', 'BE10277597', 'SANTIER INC']
['UNITED STATES', 'CA', 'BE10053288', 'TIODIZE CO INC']
['UNITED STATES', 'CA', 'BE10273067', 'VALLEY DESIGN & MFG INC']
['UNITED STATES', 'CT', 'BE10054071', 'KAMAN PRECISION PRODUCTS']
['UNITED STATES', 'FL', 'BE10361256', 'BAY TECH INDS INC']
['UNITED STATES', 'FL', 'BE10067537', 'TRIUMPH AEROSTRUCTURES VOUGHT AIRCRAFT DIVISION']
['UNITED STATES', 'FL', 'BE10278251', 'URS LABORATORIES DIVISION']
['UNITED STATES', 'GA', 'BE10055356', 'WARNER ROBINS AIR LOGISTICS COMPLEX']
['UNITED STATES', 'MD', 'BE10069970', 'ALLIANT TECHSYSTEMS OPERATIONS LLC']
['UNITED STATES', 'MO', 'BE10030518', 'ESSEX INDUSTRIES INC']
['UNITED STATES', 'OH', 'BE10032670', 'HDI LANDING GEAR USA']
['UNITED STATES', 'OH', 'BE10408922', 'ORBIT NDT BEDFORD']
['UNITED STATES', 'TX', 'BE10034905', 'AERO COMPONENTS INC']
['UNITED STATES', 'UT', 'BE10026661', 'OGDEN AIR LOGISTICS COMPLEX']
This code will generate the desired csv file. Hope this helps.
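
If each item should land in its own csv cell rather than a whole list sharing one cell, one possible variation is to pass a single flat list to `writerow`. This is a sketch under assumptions, not the code from the answer: `write_rows` is a hypothetical helper, the sample lists are the demo values from the question, and an in-memory buffer stands in for the real file.

```python
import csv
import io


def write_rows(name, rows, fileobj):
    """Write `name` once, in the leading cells of the first row only."""
    writer = csv.writer(fileobj)
    blanks = [""] * len(name)  # empty cells keep the data columns aligned
    for idx, data in enumerate(rows):
        prefix = name if idx == 0 else blanks
        writer.writerow(prefix + data)  # one flat list, one cell per item


# Demo values from the question, standing in for the scraped lists.
name = ["1", "2", "3"]
rows = [["q", "w", "e"], ["a", "s", "d"], ["c", "x", "r"]]

# In-memory stand-in for a file opened with newline="".
buffer = io.StringIO()
write_rows(name, rows, buffer)
print(buffer.getvalue())
```

With a real file, opening it once before the loop and handing the file object to the helper avoids reopening it in append mode for every row.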

Answer 1 (score: 0)

This is the solution I was expecting:
<%= foo %>
To see the result, just run it.