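I am working with the XML file available here.
I want to parse it and load LON, LAT, PGA, PGV, MMI, PSA03, PSA10, PSA30, STDPGA, URAT and SVEL as the headers of a CSV file.
The grid_data element contains the values for all of these headers, separated by whitespace.
I am looking for CSV file output like the following: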
LON LAT PGA PGV MMI PSA03 PSA10 PSA30 STDPGA URAT SVEL
-99.6833 38.2891 0.04 0.04 2.04 0.09 0.02 0 0.65 1 363.294
-99.6666 38.2891 0.04 0.04 2.06 0.09 0.02 0 0.65 1 342.531
-99.6500 38.2891 0.04 0.04 2.11 0.1 0.02 0 0.65 1 303.783
-99.6333 38.2891 0.04 0.04 2.08 0.09 0.02 0 0.65 1 334.629
-99.6166 38.2891 0.04 0.05 2.15 0.09 0.02 0 0.65 1 279.535
-99.6000 38.2891 0.04 0.04 2.08 0.09 0.02 0 0.65 1 326.391
-99.5833 38.2891 0.04 0.04 2.02 0.08 0.02 0 0.65 1 390.897
-99.5666 38.2891 0.04 0.04 2.08 0.09 0.02 0 0.65 1 346.033
Later, I will use pandas for Python to find the maximum PGV value and do GIS analysis.
Here is my code so far:
import sys
import traceback
from xml.dom import minidom
import warnings

warnings.filterwarnings("ignore")

try:
    print("*" * 20 + " The Beginning " + "*" * 20)
    xml_file_location = r"C:\Users\*****\Downloads\Grids\us2000a3y4_grid.xml"
    xmldoc = minidom.parse(xml_file_location)
    # Each grid_field element names one column of grid_data
    itemlist = xmldoc.getElementsByTagName('grid_field')
    for item in itemlist:
        print(item.attributes['name'].value)
# Catch all exceptions and print to the screen
except:
    e = sys.exc_info()[0]
    print("Error: %s\n\n" % e)
# Closing script
finally:
    print("*" * 20 + " The End " + "*" * 20)
Answer 0 (score: 1)
Consider using the built-in etree to parse the grid_data node, then pass its text directly to pandas.read_table via StringIO():
import pandas as pd
import xml.etree.ElementTree as et
from io import StringIO
import requests as rq
# RETRIEVE URL OBJECT
r = rq.get('https://earthquake.usgs.gov/realtime/product/shakemap/us2000a3y4/us/1501736303313/download/grid.xml')
# BUILD TREE FROM URL CONTENT
doc = et.fromstring(r.content)
# PARSE <grid_data> TEXT WITH UNDECLARED PREFIX NAMESPACE
data = doc.find('.//{http://earthquake.usgs.gov/eqcenter/shakemap}grid_data').text
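# READ SPACE-DELIMITED STRING INTO DATAFRAME
# header=None KEEPS THE FIRST DATA ROW; header=0 TOGETHER WITH names= WOULD
# TREAT THAT ROW AS A HEADER LINE AND DISCARD IT. THE SAMPLE OUTPUT SHOWN
# BELOW WAS PRODUCED WITH header=0, SO IT STARTS AT THE SECOND GRID POINT.
df = pd.read_table(StringIO(data), sep="\\s+", header=None,
                   names=['LON','LAT','PGA', 'PGV', 'MMI','PSA03','PSA10','PSA30','STDPGA','URAT','SVEL'])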
print(df.head())
# LON LAT PGA PGV MMI PSA03 PSA10 PSA30 STDPGA URAT SVEL
# 0 -100.3997 38.1145 0.01 0.01 1.77 0.02 0.01 0.0 0.65 1.0 354.533
# 1 -100.3831 38.1145 0.01 0.02 1.82 0.02 0.01 0.0 0.65 1.0 310.786
# 2 -100.3664 38.1145 0.01 0.01 1.77 0.02 0.01 0.0 0.65 1.0 354.545
# 3 -100.3497 38.1145 0.01 0.01 1.76 0.02 0.01 0.0 0.65 1.0 362.307
# 4 -100.3331 38.1145 0.01 0.01 1.76 0.02 0.01 0.0 0.65 1.0 360.332
print(df.tail())
# LON LAT PGA PGV MMI PSA03 PSA10 PSA30 STDPGA URAT SVEL
# 105767 -94.4831 33.2425 0.01 0.01 1.78 0.02 0.01 0.0 0.65 1.0 337.237
# 105768 -94.4664 33.2425 0.01 0.02 1.89 0.03 0.01 0.0 0.65 1.0 249.221
# 105769 -94.4497 33.2425 0.01 0.02 1.83 0.02 0.01 0.0 0.65 1.0 297.622
# 105770 -94.4331 33.2425 0.01 0.01 1.63 0.02 0.01 0.0 0.65 1.0 500.368
# 105771 -94.4164 33.2425 0.01 0.01 1.77 0.02 0.01 0.0 0.65 1.0 340.302
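To get from there to the CSV and the maximum PGV the question asks about, something like the following sketch may help. It is not run against the real grid.xml: `SAMPLE_XML` is a hand-made stand-in with the same namespace, an assumed root element name, and three rows copied from the question's sample output. `grid_to_frame` is a hypothetical helper name. It also reads the column names from the `name` attribute of the grid_field elements (as the question's code does) instead of hard-coding them, on the assumption that those elements appear in column order.

```python
import io
import xml.etree.ElementTree as et

import pandas as pd

# Hand-made stand-in for grid.xml (assumption): same namespace as above,
# an assumed root element name, and three rows from the question's sample.
SAMPLE_XML = """<shakemap_grid xmlns="http://earthquake.usgs.gov/eqcenter/shakemap">
  <grid_field name="LON"/>
  <grid_field name="LAT"/>
  <grid_field name="PGA"/>
  <grid_field name="PGV"/>
  <grid_field name="MMI"/>
  <grid_field name="PSA03"/>
  <grid_field name="PSA10"/>
  <grid_field name="PSA30"/>
  <grid_field name="STDPGA"/>
  <grid_field name="URAT"/>
  <grid_field name="SVEL"/>
  <grid_data>
-99.6833 38.2891 0.04 0.04 2.04 0.09 0.02 0 0.65 1 363.294
-99.6166 38.2891 0.04 0.05 2.15 0.09 0.02 0 0.65 1 279.535
-99.5833 38.2891 0.04 0.04 2.02 0.08 0.02 0 0.65 1 390.897
</grid_data>
</shakemap_grid>"""

# Prefix-to-URI map so the XPath expressions stay readable
NS = {"sm": "http://earthquake.usgs.gov/eqcenter/shakemap"}


def grid_to_frame(xml_text):
    """Build a DataFrame from the grid_field names and the grid_data text."""
    root = et.fromstring(xml_text)
    # Column names in document order (assumed to match the column order)
    names = [f.get("name") for f in root.findall(".//sm:grid_field", NS)]
    data = root.find(".//sm:grid_data", NS).text
    # header=None: every non-blank line is data, none is a header row
    return pd.read_csv(io.StringIO(data), sep=r"\s+", header=None, names=names)


df = grid_to_frame(SAMPLE_XML)

# With no path argument, to_csv returns the CSV text instead of writing a file
csv_text = df.to_csv(index=False)

# Row holding the largest PGV value
max_row = df.loc[df["PGV"].idxmax()]
print(max_row)
```

Passing a file name, for example `df.to_csv("grid.csv", index=False)`, writes the file instead of returning a string, and `sep=" "` would produce the space-separated layout shown in the question rather than commas.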