I am writing some code in Python 2.7 to pull information from a website, extract the relevant data, and then reformat it into something more useful. Specifically, I want to take the text from an HTML <pre> tag, write it to a file, load that file into an array (using numpy), and then do my analysis from there. I am stuck on the "write it to a file" part. It seems that when I write it to a file, the result is read back as a 1x1 matrix or something similar, so it won't do what I hope it will. On an attempt before the code sample below, the error I got was: IndexError: index 5 is out of bounds for axis 0 with size 0
I was indexing the array only to test whether it would produce any output from what I have so far.
Here is my code so far:
#Pulling data from GFS lamps
from lxml import html
import requests
import numpy as np
ICAO = raw_input("What station would you like GFS lamps data for? ")
page = requests.get('http://www.nws.noaa.gov/cgi-bin/lamp/getlav.pl?sta=' + ICAO)
tree = html.fromstring(page.content)
Lamp = tree.xpath('//pre/text()') #stores class of //pre html element in list Lamp
gfsLamps = open('ICAO', 'w') #stores text of Lamp into a new file
gfsLamps.write(Lamp[0])
array = np.genfromtxt('ICAO') #puts file into an array
array[5]
You can use KOGD as the ICAO to test this. As is, I get ValueError: Some errors were detected
and it lists lines 2-23 (got 26 columns instead of 8). What is the first thing I am doing wrong for what I want to do? Or am I going about this all wrong?
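Answer 0 (score: 0)
The problem is not in the "put the data into a file" part, but in reading it back with genfromtxt. genfromtxt is a very strict function: unless you specify a number of options to skip columns and rows, it mostly expects complete data with the same number of columns on every line. Use this instead: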
arrays = [np.array(map(str, line.split())) for line in open('ICAO')]
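The arrays variable will hold one array per line, containing each individual whitespace-separated element of that line. For example, if a line has the following data: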
a b cdef 124
The array for this line will be:
['a','b','cdef','124']
The result will contain one such array for every line, which can be processed further as needed. So the complete code is:
from lxml import html
import requests
import numpy as np
ICAO = raw_input("What station would you like GFS lamps data for? ")
page = requests.get('http://www.nws.noaa.gov/cgi-bin/lamp/getlav.pl?sta=' + ICAO)
tree = html.fromstring(page.content)
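Lamp = tree.xpath('//pre/text()') #list of the text nodes inside each <pre> element
gfsLamps = open('ICAO', 'w') #opens a file literally named 'ICAO' for writing
gfsLamps.write(Lamp[0]) #writes the text of the first <pre> block
gfsLamps.close() #flushes the data to disk before the file is read back
arrays = [np.array(map(str, line.split())) for line in open('ICAO')] #one array of tokens per line
print arrays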
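As a possible next step, the per-line token arrays can be keyed by the first token of each line, so that a single row can be picked out and converted to numbers for analysis. The following is only a minimal sketch. The SAMPLE text is made up for illustration and is not real LAMP output, the helper name parse_rows is hypothetical, and it assumes each line starts with a label followed by whitespace-separated values. It is written to run on both Python 2.7 and Python 3.

```python
import numpy as np

# Hypothetical sample text, NOT real LAMP output: each line is a label
# followed by a varying number of whitespace-separated values.
SAMPLE = """\
UTC  01 02 03 04
TMP  61 60 59 58
WDR  18 19
"""

def parse_rows(text):
    """Return a dict mapping each line's first token to an array of the rest.

    parse_rows is an illustrative helper, not part of numpy or lxml.
    """
    rows = {}
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        rows[tokens[0]] = np.array(tokens[1:])
    return rows

rows = parse_rows(SAMPLE)

# Rows may have different lengths, which is what genfromtxt objects to;
# here each row is converted to numbers on its own, only when needed.
temps = rows["TMP"].astype(float)
```

With the scraped data, the same function could be given Lamp[0] directly instead of SAMPLE, which would also avoid the round trip through a file. Whether the first token is a usable label for every line of the real page is an assumption that should be checked against the actual output.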