Import a space-delimited .csv into Python 3, ignoring the text at the start?

Time: 2017-12-15 05:49:03

Tags: python python-3.x

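I want to import the following .csv data (a .txt file) into Python lists, one list per column, ignoring the text at the start. I can't change the format of the file. I'm getting an error: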

import csv
import numpy as np

file = open('Data_Bris.txt')
reader = csv.reader(file, delimiter=' ')

datelist = []
rainlist = []
evaplist = []
for row in reader:
    # row = [date, day, date2, T.Max, Smx, T.Min, Smn, Rain, Srn, Evap, Sev, Rad, Ssl, VP, Svp, maxT, minT, Span, Ssp]
    date_column = str(row[0])
    rain_column = float(row[7])
    evap_column = float(row[9])

    datelist.append([date_column])
    rainlist.append([rain_column])
    evaplist.append([evap_column])

date = np.array([datelist])
rain = np.array([rainlist])
evap = np.array([evaplist])

timeseries = np.arange(rain.size)

This is the code I'm trying to get working...

"17701231" 365 31/12/1770 -99.9 999 -99.9 999 9999.9 999 999.9 999 999.9  999 999.9 999 9999.9 9999.9 9999.9  999
""
" This file is SPACE DELIMITED for easy import into both spreadsheets and programs."
"The first line 17701231 contains dummy data and is provided to allow spreadsheets to sense the columns"
" To read into a spreadsheet select DELIMITED and SPACE."
" "
" "
"=========  The following essential information and notes should be kept in the data file =========="
" "
"The Data Drill system and data are copyright to the Queensland Government Department of Science, Information Technology and Innovation (DSITI)."
"SILO data, with the exception of Patched Point data for Queensland, are supplied to the licencee only and may not be given, lent, or sold to any other party"
" "
"Notes:"
" * Data Drill for Lat, Long: -27.5000 153.0000 (DECIMAL DEGREES), 27 30'S 153 00'E Your Ref: Data_Bris"
" * Elevation:  102m "
" * Extracted from Silo on 20171214"
" * Please read the documentation on the Data Drill at http://www.longpaddock.qld.gov.au/silo"
" "
" * As evaporation is read at 9am, it has been shifted to the day before"
"    ie The evaporation measured on 20 April is in row for 19 April"
" * The 6 Source columns Smx, Smn, Srn, Sev, Ssl, Svp indicate the source of the data to their left, namely Max temp, Min temp, Rainfall, Evaporation, Radiation and Vapour Pressure respectively "
" "
"   35 = interpolated from daily observations using anomaly interpolation method for CLIMARC data
"   25 = interpolated daily observations,     75 = interpolated long term average"
"   26 = synthetic pan evaporation "
" "
" * Relative Humidity has been calculated using 9am VP, T.Max and T.Min"
"   RHmaxT is estimated Relative Humidity at Temperature T.Max"
"   RHminT is estimated Relative Humidity at Temperature T.Min"
"   Span = a calibrated estimate of class A pan evaporation based on vapour pressure deficit and solar radiation          
" * The accuracy of the data depends on many factors including date, location, and variable."
"   For consistency data is supplied using one decimal place, however it is not accurate to that precision."
"   Further information is available from http://www.longpaddock.qld.gov.au/silo"
"===================================================================================================="
" "
Date       Day Date2      T.Max Smx T.Min Smn Rain   Srn  Evap Sev Radn   Ssl VP    Svp RHmaxT RHminT Span   Ssp    
(yyyymmdd)  () (ddmmyyyy)  (oC)  ()  (oC)  ()   (mm)  ()  (mm)  () (MJ/m2) () (hPa)  ()   (%)    (%)    (mm)  () 
18890101     1  1-01-1889  29.5  35  21.5  35    0.3  25   6.2  75  23.0   35  26.0  35   63.1  100.0    5.6  26
18890102     2  2-01-1889  32.0  35  21.5  35    0.1  25   6.2  75  23.0   35  21.0  35   44.2   81.9    6.9  26
18890103     3  3-01-1889  31.5  35  21.5  35    0.0  25   6.2  75  23.0   35  24.0  35   51.9   93.6    6.4  26
18890104     4  4-01-1889  29.5  35  21.0  35    0.0  25   6.2  75  23.0   35  22.0  35   53.4   88.5    6.1  26
18890105     5  5-01-1889  30.0  35  19.0  35    0.0  25   6.2  75  23.0   35  19.0  35   44.8   86.5    6.5  26
18890106     6  6-01-1889  28.5  35  18.5  35    0.0  25   6.2  75  23.0   35  23.0  35   59.1  100.0    5.7  26
18890107     7  7-01-1889  30.0  35  18.5  35    0.1  25   6.2  75  23.0   35  20.0  35   47.1   94.0    6.4  26
18890108     8  8-01-1889  28.0  35  18.5  35    0.0  25   6.2  75  23.0   35  21.0  35   55.6   98.7    5.8  26
18890109     9  9-01-1889  28.5  35  19.0  35    0.0  25   6.2  75  24.0   35  22.0  35   56.5  100.0    6.0  26
18890110    10 10-01-1889  29.0  35  20.0  35    0.0  25   6.2  75  23.0   35  21.0  35   52.4   89.9    6.1  26

This is the data file I want to import (it continues in the same format)...

1 answer:

Answer 0: (score: 2)

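Here you want to ignore every line of the header, including the lines that give the column names and units. A simple way to do that is to ignore any line that does not start with a digit. Using a generator (to avoid loading the whole file into memory), you can simply create the reader like this: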

...
reader = csv.reader((row for row in io.StringIO(t) if row[0].isdigit()),
                    delimiter=' ', skipinitialspace=True)
...

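skipinitialspace=True makes the reader ignore whitespace that immediately follows a delimiter, so a run of several spaces is treated as a single delimiter.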
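Putting this together with the loop from the question, a minimal sketch might look like the following. The helper name `read_columns` and the shortened inline `sample` are made up for illustration; the column indices 0, 7 and 9 (Date, Rain, Evap) are taken from the question's code.

```python
import csv
import io


def read_columns(lines):
    """Return (dates, rain, evap) lists from an iterable of text lines.

    `read_columns` is a hypothetical helper, not part of the original answer.
    """
    # Keep only lines whose first character is a digit; this drops the quoted
    # notes, the blank lines and the two column-header lines.
    data_lines = (line for line in lines if line[:1].isdigit())
    reader = csv.reader(data_lines, delimiter=' ', skipinitialspace=True)

    dates, rain, evap = [], [], []
    for row in reader:
        dates.append(row[0])         # Date (yyyymmdd), kept as a string
        rain.append(float(row[7]))   # Rain (mm)
        evap.append(float(row[9]))   # Evap (mm)
    return dates, rain, evap


# Shortened sample standing in for the real file (assumption: same layout).
sample = io.StringIO(
    '"17701231" 365 31/12/1770 -99.9 999 -99.9 999 9999.9 999 999.9 999\n'
    '" This file is SPACE DELIMITED"\n'
    'Date       Day Date2      T.Max Smx T.Min Smn Rain   Srn  Evap Sev\n'
    '18890101     1  1-01-1889  29.5  35  21.5  35    0.3  25   6.2  75\n'
    '18890102     2  2-01-1889  32.0  35  21.5  35    0.1  25   6.2  75\n'
)
dates, rain, evap = read_columns(sample)
print(dates, rain, evap)
```

For the real data, the open file object can be passed in place of the `StringIO`, for example `with open('Data_Bris.txt', newline='') as f: dates, rain, evap = read_columns(f)`. The resulting flat lists can then be handed to `np.array(...)` directly, without the extra nesting used in the question.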