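I want to use some data from the NOAA website. It is a CSV file containing data on all hurricanes since 1851, in the following format: Format example / README file
As you can see, although everything is in a single CSV file, each hurricane has its own table with a separate header row.
How can I remove the header rows and put their information into a "Hurricane Name" column instead? I want to combine everything into one data frame so it is easier to work with. Thanks!
Example: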
AL092011,IRENE,3,
20110821,0000 ,, TS,15.0N,59.0W,45,1006,105,0,0,45,0,0,0,0,0,0,0,0,
20110821,0600,,TS,16.0N,60.6W,45,1006,130,0,0,80,0,0,0,0,0,0,0,0,
20110821,1200 ,, TS,16.8N,62.2W,45,1005,130,0,0,70,0,0,0,0,0,0,0,0,
AL092012,ANOTHER_NAME,2,
20110821,1800,,TS,17.5N,63.7W,50,999,130,20,0,70,30,0,0,0,0,0,0,0,
20110822,0000 ,, TS,17.9N,65.0W,60,993,130,30,30,90,30,0,0,30,0,0,0,0,
I would like the header information to be placed in columns, like this:
AL092011,IRENE,20110821,0000 ,, TS,15.0N,59.0W,45,1006,105,0,0,45,0,0,0,0,0,0,0,0,
AL092011,IRENE,20110821,0600,,TS,16.0N,60.6W,45,1006,130,0,0,80,0,0,0,0,0,0,0,0,
AL092011,IRENE,20110821,1200 ,, TS,16.8N,62.2W,45,1005,130,0,0,70,0,0,0,0,0,0,0,0,
AL092012,ANOTHER_NAME,20110821,1800,,TS,17.5N,63.7W,50,999,130,20,0,70,30,0,0,0,0,0,0,0,
AL092012,ANOTHER_NAME,20110822,0000 ,, TS,17.9N,65.0W,60,993,130,30,30,90,30,0,0,30,0,0,0,0,
Answer 0 (score: 0)
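This is what I came up with. It may not be the fastest approach (I'd be interested to know what is, please!), but it does the job. I split the CSV into a separate file for each hurricane. Then I load those files one by one, turn each header into additional columns, and concatenate everything into a single data frame. If there is a more efficient way to do this, please let me know :)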
import glob

import pandas as pd

# Split the combined file into one file per hurricane.
# A line starting with 'AL' is a header row that begins a new hurricane.
partNum = 1
outHandle = None
with open("data/atlantic_1851_2017_2.csv", "r") as source:
    for line in source:
        if line.startswith('AL'):
            if outHandle is not None:
                outHandle.close()
            outHandle = open("data/part%d.csv" % (partNum,), "w")
            partNum += 1
        outHandle.write(line)
if outHandle is not None:
    outHandle.close()

# Read each per-hurricane file into a data frame
files = glob.glob('data/part*.csv')
frames = []
for path in files:
    with open(path) as f:
        first_line = f.readline().split(',')
    df = pd.read_csv(path, skiprows=[0], header=None)
    # Copy the header fields into every row of this hurricane
    df['ID'] = first_line[0].strip()
    df['Name'] = first_line[1].strip()
    frames.append(df)

# Concatenate into a single data frame
df = pd.concat(frames, ignore_index=True)
# Drop the wind-radii columns and the empty column left by the trailing comma
df = df.drop(columns=[8,9,10,11,12,13,14,15,16,17,18,19,20])
df.columns = ['Date','Time','Record_ID','Strength','Lat','Long','Max_Wind_Knots','Max_Pressure_mb','ID','Name']
print(df.head(5))
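On the question of a more efficient way: one option is to skip the intermediate files and make a single pass over the data, remembering the most recent header row and attaching its ID and name to each data row that follows. Below is a minimal sketch of that idea, not a tested replacement for the code above. The function name read_hurricanes and the in-memory sample are made up for illustration, the column names are the ones used above, and it assumes (as above) that every header row starts with 'AL' and that only the first eight fields of each data row are wanted.

```python
import csv
import io

import pandas as pd

# Names for the first eight fields of a data row (same names as used above).
COLUMNS = ['Date', 'Time', 'Record_ID', 'Strength', 'Lat', 'Long',
           'Max_Wind_Knots', 'Max_Pressure_mb']


def read_hurricanes(handle):
    """Build one data frame from a file whose header rows start with 'AL'.

    Hypothetical helper for illustration only.
    """
    rows = []
    storm_id = name = None
    for fields in csv.reader(handle):
        fields = [field.strip() for field in fields]
        if not fields or not fields[0]:
            continue  # skip blank lines
        if fields[0].startswith('AL'):
            # Header row: remember the ID and name for the rows that follow
            storm_id, name = fields[0], fields[1]
        else:
            # Data row: keep the first eight fields and append the header info
            rows.append(fields[:len(COLUMNS)] + [storm_id, name])
    df = pd.DataFrame(rows, columns=COLUMNS + ['ID', 'Name'])
    # Everything was read as text, so convert the numeric columns explicitly
    numeric = ['Max_Wind_Knots', 'Max_Pressure_mb']
    df[numeric] = df[numeric].apply(pd.to_numeric)
    return df


# Small in-memory sample taken from the question, standing in for the real file
sample = io.StringIO(
    "AL092011,IRENE,3,\n"
    "20110821,0000 ,, TS,15.0N,59.0W,45,1006,105,0,0,45,0,0,0,0,0,0,0,0,\n"
    "20110821,0600,,TS,16.0N,60.6W,45,1006,130,0,0,80,0,0,0,0,0,0,0,0,\n"
    "20110821,1200 ,, TS,16.8N,62.2W,45,1005,130,0,0,70,0,0,0,0,0,0,0,0,\n"
    "AL092012,ANOTHER_NAME,2,\n"
    "20110821,1800,,TS,17.5N,63.7W,50,999,130,20,0,70,30,0,0,0,0,0,0,0,\n"
    "20110822,0000 ,, TS,17.9N,65.0W,60,993,130,30,30,90,30,0,0,30,0,0,0,0,\n"
)
df = read_hurricanes(sample)
print(df)
```

For the real file, passing an open file handle instead of the sample should work the same way, e.g. with open("data/atlantic_1851_2017_2.csv", newline="") as f: df = read_hurricanes(f). This avoids writing and re-reading one file per hurricane, and the rows stay in their original order.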