Question

So I am writing some code in Python 2.7 to pull information from a website, extract the relevant data, and then format that data in a more useful way. Specifically, I want to take the text from an HTML <pre> tag, write it to a file, load that file into an array (using numpy), and then do my analysis from there. I am stuck on the "put into a file" part. It seems that when I write it to the file, it ends up as a 1x1 matrix or something, so it won't do what I hope it will. On an attempt previous to the code sample below, the error I got was:

IndexError: index 5 is out of bounds for axis 0 with size 0

I only indexed the array to test whether I would get any output from what I have so far.

Here is my code so far:

#Pulling data from GFS lamps

from lxml import html
import requests
import numpy as np

ICAO = raw_input("What station would you like GFS lamps data for? ")

page = requests.get('http://www.nws.noaa.gov/cgi-bin/lamp/getlav.pl?sta=' + ICAO)
tree = html.fromstring(page.content)
Lamp = tree.xpath('//pre/text()') #stores class of //pre html element in list Lamp
gfsLamps = open('ICAO', 'w') #stores text of Lamp into a new file
gfsLamps.write(Lamp[0])

array = np.genfromtxt('ICAO') #puts file into an array

array[5]

You can use KOGD as the ICAO to test this. As is, I get a ValueError ("Some errors were detected") that lists lines 2-23 (got 26 columns instead of 8). What is the first thing I am doing wrong for what I want to do? Or am I just going about this all wrong?

Answer 1

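The problem is not the part where you put the data into a file, but the part where you read it back out with genfromtxt. genfromtxt is a very strict function: unless you pass a lot of options to skip columns and rows, it expects complete data, with the same number of columns on every line. That is why it complains about getting 26 columns instead of 8. Use this instead: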

arrays = [np.array(map(str, line.split())) for line in open('ICAO')]

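The arrays variable will contain one array per line, holding each whitespace-separated element of that line. For example, if a line has the following data: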

a b cdef 124

the array for that line will be:

['a','b','cdef','124']
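
As a rough illustration of the difference, here is a minimal sketch that feeds the same ragged text to both approaches. The sample text is made up for the demonstration (it is not real LAMP output), and the code is written so that it should run on both Python 2.7 and Python 3:

import io
import numpy as np

# Made-up sample: the two lines have a different number of fields.
# (An assumption for illustration only, not real LAMP data.)
sample = u"a b cdef 124\n1 2 3 4 5 6\n"

# genfromtxt takes the column count from the first line and, by default,
# raises ValueError when a later line does not match it.
try:
    np.genfromtxt(io.StringIO(sample), dtype=str)
    strict_ok = True
except ValueError:
    strict_ok = False

# Splitting each line on whitespace has no such requirement.
rows = [np.array(line.split()) for line in sample.splitlines()]

Here strict_ok ends up False, while rows holds one array per line, each with its own length.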

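The resulting list will contain an array like this for every line, which you can then process further as needed. So the complete code is: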

from lxml import html
import requests
import numpy as np

ICAO = raw_input("What station would you like GFS lamps data for? ")

page = requests.get('http://www.nws.noaa.gov/cgi-bin/lamp/getlav.pl?sta=' + ICAO)
tree = html.fromstring(page.content)
Lamp = tree.xpath('//pre/text()') #list of the text nodes inside the <pre> element
gfsLamps = open('ICAO', 'w') #opens a file literally named 'ICAO' for writing
gfsLamps.write(Lamp[0])
gfsLamps.close() #close the file so the text is flushed to disk before it is read back
array = [np.array(map(str, line.split())) for line in open('ICAO')] #one array of strings per line
print array
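
If the file is not needed for anything else, the same splitting can probably be done directly on the text in memory. The following is only a sketch under that assumption: parse_pre_text is a hypothetical helper name, and the sample string is a made-up stand-in for Lamp[0], not real page content. It should run on both Python 2.7 and Python 3:

import numpy as np

def parse_pre_text(text):
    """Return one array of whitespace-separated strings per non-blank line."""
    return [np.array(line.split()) for line in text.splitlines() if line.strip()]

# Made-up stand-in for Lamp[0]; in the script above you would pass Lamp[0] instead.
sample = "header line here\n\nrowA 1 2 3\nrowB 4 5\n"

rows = parse_pre_text(sample)

# The first token of each row, e.g. to look a row up by its label later.
labels = [row.tolist()[0] for row in rows]

Skipping blank lines avoids empty arrays in the result, and keeping each row as its own array means lines of different lengths are not a problem.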
