How to read a table preceded by extra information lines as a DataFrame and add new columns from that information

Time: 2016-06-01 13:00:30

Tags: python pandas dataframe

I have a file-like object generated from StringIO. It holds a table preceded by several lines of information (see below, starting at #TIMESTAMP).

I would like to use that information to add extra columns: "Date" and "UTCOffset - Time" (as a substitute) from #TIMESTAMP, and "ZenAngle" from #GLOBAL_SUMMARY.

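I use pd.read_csv to read it, but that only works if I skip the first 8 lines, which contain the information I need. When I try to import the object below as a DataFrame, the error "TypeError: data argument can't be an iterator" is reported.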

#TIMESTAMP
UTCOffset,Date,Time
+00:30:32,2011-09-05,08:32:21

#GLOBAL_SUMMARY
Time,IntACGIH,IntCIE,ZenAngle,MuValue,AzimAngle,Flag,TempC,O3,Err_O3,SO2,Err_SO2,F324
08:32:21,7.3576,52.758,59.109,1.929,114.427,000000,24,291,1,,,91.9

#GLOBAL
Wavelength,S-Irradiance,Time
290.0,0.000e+00
290.5,0.000e+00
291.0,4.380e-06
291.5,2.234e-05
292.0,2.102e-05
292.5,2.204e-05
293.0,2.453e-05
293.5,2.256e-05
294.0,3.088e-05
294.5,4.676e-05
295.0,3.384e-05
295.5,3.582e-05
296.0,4.298e-05
296.5,3.774e-05
297.0,4.779e-05
297.5,7.399e-05
298.0,9.214e-05
298.5,1.080e-04
299.0,2.143e-04
299.5,3.180e-04
300.0,3.337e-04
300.5,4.990e-04
301.0,8.688e-04
301.5,1.210e-03
302.0,1.133e-03

1 Answer:

Answer 0: (score: 0)

I think you can first use read_csv to create 3 DataFrames, one per section of the file:

import pandas as pd
import io

temp=u"""#TIMESTAMP
UTCOffset,Date,Time
+00:30:32,2011-09-05,08:32:21

#GLOBAL_SUMMARY
Time,IntACGIH,IntCIE,ZenAngle,MuValue,AzimAngle,Flag,TempC,O3,Err_O3,SO2,Err_SO2,F324
08:32:21,7.3576,52.758,59.109,1.929,114.427,000000,24,291,1,,,91.9

#GLOBAL
Wavelength,S-Irradiance,Time
290.0,0.000e+00
290.5,0.000e+00
291.0,4.380e-06
291.5,2.234e-05
292.0,2.102e-05
292.5,2.204e-05
293.0,2.453e-05
293.5,2.256e-05
294.0,3.088e-05
294.5,4.676e-05
295.0,3.384e-05
295.5,3.582e-05
296.0,4.298e-05
296.5,3.774e-05
297.0,4.779e-05
297.5,7.399e-05
298.0,9.214e-05
298.5,1.080e-04
299.0,2.143e-04
299.5,3.180e-04
300.0,3.337e-04
300.5,4.990e-04
301.0,8.688e-04
301.5,1.210e-03
302.0,1.133e-03
"""
# "#GLOBAL" table: skip the 9 lines that precede its header row
df1 = pd.read_csv(io.StringIO(temp),
                  skiprows=9)

print(df1)
    Wavelength  S-Irradiance  Time
0        290.0      0.000000   NaN
1        290.5      0.000000   NaN
2        291.0      0.000004   NaN
3        291.5      0.000022   NaN
4        292.0      0.000021   NaN
5        292.5      0.000022   NaN
6        293.0      0.000025   NaN
7        293.5      0.000023   NaN
8        294.0      0.000031   NaN
9        294.5      0.000047   NaN
10       295.0      0.000034   NaN
11       295.5      0.000036   NaN
12       296.0      0.000043   NaN
13       296.5      0.000038   NaN
14       297.0      0.000048   NaN
15       297.5      0.000074   NaN
16       298.0      0.000092   NaN
17       298.5      0.000108   NaN
18       299.0      0.000214   NaN
19       299.5      0.000318   NaN
20       300.0      0.000334   NaN
21       300.5      0.000499   NaN
22       301.0      0.000869   NaN
23       301.5      0.001210   NaN
24       302.0      0.001133   NaN
# "#TIMESTAMP" section: skip the marker line, then read the header and 1 data row
df2 = pd.read_csv(io.StringIO(temp),
                  skiprows=1,
                  nrows=1)

print(df2)
   UTCOffset        Date      Time
0  +00:30:32  2011-09-05  08:32:21

# "#GLOBAL_SUMMARY" section: its header is the 6th line, so skip 5 and read 1 data row
df3 = pd.read_csv(io.StringIO(temp),
                  skiprows=5,
                  nrows=1)

print(df3)
       Time  IntACGIH  IntCIE  ZenAngle  MuValue  AzimAngle  Flag  TempC   O3  \
0  08:32:21    7.3576  52.758    59.109    1.929    114.427     0     24  291   

   Err_O3  SO2  Err_SO2  F324  
0       1  NaN      NaN  91.9
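
The question also asks for the header values to become columns of the main table. A minimal sketch of that last step follows, assuming the file layout is fixed exactly as in the sample (so the same skiprows values apply) and that the raw values should be copied unchanged. The question does not say how UTCOffset and Time should be combined, so this sketch keeps them as separate string columns. The sample data is shortened to three spectrum rows to keep the example small.

```python
import io

import pandas as pd

# Shortened copy of the sample from the question (only three spectrum rows kept).
temp = u"""#TIMESTAMP
UTCOffset,Date,Time
+00:30:32,2011-09-05,08:32:21

#GLOBAL_SUMMARY
Time,IntACGIH,IntCIE,ZenAngle,MuValue,AzimAngle,Flag,TempC,O3,Err_O3,SO2,Err_SO2,F324
08:32:21,7.3576,52.758,59.109,1.929,114.427,000000,24,291,1,,,91.9

#GLOBAL
Wavelength,S-Irradiance,Time
290.0,0.000e+00
290.5,0.000e+00
291.0,4.380e-06
"""

# Same three reads as above; the skiprows values assume this exact layout.
df1 = pd.read_csv(io.StringIO(temp), skiprows=9)           # spectrum table
df2 = pd.read_csv(io.StringIO(temp), skiprows=1, nrows=1)  # #TIMESTAMP row
df3 = pd.read_csv(io.StringIO(temp), skiprows=5, nrows=1)  # #GLOBAL_SUMMARY row

# Assigning a scalar to a column broadcasts it to every row of df1.
df1["Date"] = df2.loc[0, "Date"]
df1["UTCOffset"] = df2.loc[0, "UTCOffset"]
df1["Time"] = df2.loc[0, "Time"]          # overwrites the all-NaN Time column
df1["ZenAngle"] = df3.loc[0, "ZenAngle"]

print(df1)
```

Because each header section holds a single row, plain scalar assignment is enough here; a merge or concat would only be needed if the sections contained several rows to match against the spectrum.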