Python beginner: extract specific rows from a CSV file and write them to different CSV files

Time: 2016-02-15 18:37:20

Tags: python csv pandas row extraction

I have a .csv file containing 40 rows of weather station data, similar to:

Date        Station                  PET  Max Temp  Min Temp

2/11/2016   Conroe                   0.09   70       33
2/11/2016   Huntsville               0.11   69       33
2/11/2016   Overton                  0.14   67       34
2/11/2016   Allen                    0.11   71       32
2/11/2016   Dallas AgriLife Center   0.17   71       37
2/11/2016   Forney                   0.13   70       35

I am trying to use pandas to extract each station's data from this file and write it to a separate .csv file per station.

I tried this code:

import pandas as pd

df = pd.read_csv('C:\\Desktop\\report.csv')

for Station in df:
    df[Station].to_csv('C:\\data\\'+ Station +'.csv')

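But this code extracts the data column by column, so it creates one file per column instead of one per station.

Please help. Is there a way to iterate row by row, for example looping over each row and creating a CSV file for each station?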

2 Answers:

Answer 0 (score: 1)

import pandas as pd

# Rebuild the sample data from the question
df = pd.DataFrame({
    'Date': ['2/11/2016'] * 6,
    'Station': ['Conroe', 'Huntsville', 'Overton', 'Allen', 'Dallas AgriLife Center', 'Forney'],
    'PET': [0.09, 0.11, 0.14, 0.11, 0.17, 0.13],
    'Max Temp': [70, 69, 67, 71, 71, 70],
    'Min Temp': [33, 33, 34, 32, 37, 35],
})

# Group the rows by station and write each group to '<station name>.csv'
df.groupby('Station').apply(lambda x: x.to_csv(x['Station'].values[0] + '.csv'))
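
A variation on the same idea, as a minimal sketch: iterating over the groups directly makes the file writing an explicit step rather than a side effect of apply. The out_dir folder, the written list and the index=False argument are assumptions of this sketch, not part of the answer above.

```python
import os
import tempfile

import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    'Date': ['2/11/2016'] * 6,
    'Station': ['Conroe', 'Huntsville', 'Overton', 'Allen',
                'Dallas AgriLife Center', 'Forney'],
    'PET': [0.09, 0.11, 0.14, 0.11, 0.17, 0.13],
    'Max Temp': [70, 69, 67, 71, 71, 70],
    'Min Temp': [33, 33, 34, 32, 37, 35],
})

# Hypothetical output folder; a temporary directory keeps the sketch self-contained
out_dir = tempfile.mkdtemp()

written = []
# groupby yields (station name, sub-frame holding that station's rows)
for station, group in df.groupby('Station'):
    path = os.path.join(out_dir, station + '.csv')
    # index=False leaves the pandas row index out of the file
    group.to_csv(path, index=False)
    written.append(path)

print(len(written), 'files written to', out_dir)
```

Each file then holds the header plus every row for that one station, so the same loop should also work if a station appears on several dates.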

Answer 1 (score: 1)

df[Station] just selects a column. What you want to do is the following, in pseudocode:

for each station in stations:
    select the rows for that station and put them in a separate data frame

when done, write each data frame to a file.
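
To see why the loop in the question produced one file per column, it may help to check what iteration actually yields. A small sketch, using a cut-down two-row frame with three of the question's columns (the variable names labels and stations are only for illustration):

```python
import pandas as pd

# Cut-down frame with three of the columns from the question's file
df = pd.DataFrame({
    'Date': ['2/11/2016', '2/11/2016'],
    'Station': ['Conroe', 'Huntsville'],
    'PET': [0.09, 0.11],
})

# Iterating a DataFrame yields its column labels, so df[label] is a whole column
labels = [label for label in df]
print(labels)

# Iterating the Station column yields the station names instead
stations = [name for name in df['Station']]
print(stations)
```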

This is not hard to implement in pandas either. Here is how:

In [53]: for name in df.Station:
   ....:     print(df[df.Station == name])
   ....:     
        Date Station   PET  Max Temp  Min Temp
0  2/11/2016  Conroe  0.09        70        33
        Date     Station   PET  Max Temp  Min Temp
1  2/11/2016  Huntsville  0.11        69        33
        Date  Station   PET  Max Temp  Min Temp
2  2/11/2016  Overton  0.14        67        34
        Date Station   PET  Max Temp  Min Temp
3  2/11/2016   Allen  0.11        71        32
        Date                 Station   PET  Max Temp  Min Temp
4  2/11/2016  Dallas AgriLife Center  0.17        71        37
        Date Station   PET  Max Temp  Min Temp
5  2/11/2016  Forney  0.13        70        35

This only prints the rows, but you can replace the print with a write to a new CSV file:

In [54]: for name in df.Station:
   ....:     df[df.Station == name].to_csv(name+'.csv')
   ....:     

In [55]: ls
Allen.csv  Conroe.csv  Dallas AgriLife Center.csv  foo.csv  Forney.csv  Huntsville.csv  Overton.csv  stations.csv

Each file now contains the data you want.
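
If the real file has several rows per station (for example one per day), looping over df.Station visits the same name repeatedly and rewrites the same file each time. A hedged sketch of one way around that, looping over the unique names instead: the 2/12/2016 row and its PET value are invented purely for illustration, and out_dir is a hypothetical output folder.

```python
import os
import tempfile

import pandas as pd

# Two rows from the question plus one invented row for a second date
df = pd.DataFrame({
    'Date': ['2/11/2016', '2/11/2016', '2/12/2016'],
    'Station': ['Conroe', 'Allen', 'Conroe'],
    'PET': [0.09, 0.11, 0.10],  # 0.10 is a made-up value
})

# Hypothetical output folder; a temporary directory keeps the sketch self-contained
out_dir = tempfile.mkdtemp()

# unique() returns each station name once, so each file is written once
for name in df['Station'].unique():
    # The boolean mask keeps every row whose Station equals name
    rows = df[df['Station'] == name]
    rows.to_csv(os.path.join(out_dir, name + '.csv'), index=False)

# Read one file back to confirm it holds both Conroe rows
conroe = pd.read_csv(os.path.join(out_dir, 'Conroe.csv'))
print(conroe)
```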