Looping through a file with .ix and .isin

Time: 2016-04-11 21:50:33

Tags: python-2.7 loops pandas

My raw data looks like this:

SUBBASIN  HRU  HRU_SLP      OV_N
1         1    0.016155144  0.15
1         2    0.015563287  0.14
2         1    0.010589782  0.15
2         2    0.011574839  0.14
3         1    0.013865396  0.15
3         2    0.01744597   0.15
3         3    0.018983217  0.14
3         4    0.013890315  0.05
3         5    0.011792533  0.05

I need to modify the OV_N value according to each row's SUBBASIN number:

hru = pd.read_csv('hru.csv')
for i in hru.OV_N:
    hru.ix[hru.SUBBASIN.isin([76,65,64,72,81,84,60,46,37,1,2]), 'OV_N'] = i*(1+df21.value[12])
    hru.ix[hru.SUBBASIN.isin([80,74,75,66,55,53,57,63,61,41,38,27,26,45,40,34,35,31,33,21,20,17,18,19,23,14,13,8,7,11,6,4,3,5,12]), 'OV_N'] = i*(1+df23.value[12])
    hru.ix[hru.SUBBASIN.isin([85,58,78,54,59,51,52,30,28,16,15,77,79,71,70,86,73,68,69,56,67,62,82,87,83,91,89,90,43,36,39,47,32,49,42,48,50,49,29,22,24,25,9,10]), 'OV_N'] = i*(1+df56.value[12])
    hru.ix[hru.SUBBASIN.isin([92,88,95,94,93]), 'OV_N'] = i*(1+df58.value[12])

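Here df21.value[12] is a value read from a txt file. The code produces infinite OV_N values for all subbasins, so I assume the loop is running over the file multiple times, but I can't find the error. This code worked before, when I used a different number of subbasins.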
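The thread does not pin down the cause, but one plausible reading (an assumption, not confirmed by the asker) is that the loop body runs once per row of hru, and each pass rescales values that earlier passes already scaled. Repeated scaling compounds geometrically, and a float overflows to inf after a few hundred passes. A pure-Python sketch with a hypothetical factor of 22, i.e. 1 + 21, the placeholder multiplier used in the answer below:

```python
import math

# Hypothetical illustration: rescale one OV_N value by (1 + k) again and again,
# as would happen if each loop pass fed an already-scaled value back in.
k = 21.0        # placeholder multiplier, not a real df21.value[12]
value = 0.15    # starting OV_N taken from the sample data
passes = 0
while not math.isinf(value):
    value *= 1 + k   # Python float multiplication overflows to inf, no exception
    passes += 1

print(passes)   # number of passes before the value becomes inf
```

A real hru file with many HRU rows runs the loop body that many times, which is more than enough to reach inf under this reading.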

1 answer:

Answer 0 (score: 1)

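It is usually best not to loop over and index rows in a pandas DataFrame. Transforming the DataFrame with column operations is the more idiomatic pandas approach. A pandas DataFrame can be thought of as a zipped collection of pandas Series: each column is its own Series, and all of them share the same index. Operations can be applied to one or more Series to create a new Series with that same index. Operations can also combine a Series with a one-dimensional numpy array to create a new Series. Understanding pandas indexing helps, but this answer only uses the default sequential integer index.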
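Applied to the question's own isin masks, the column-operation idea means dropping the for loop entirely: build one boolean mask per group and scale only the matching rows with .loc (.ix was deprecated and removed in pandas 1.0). A sketch on the sample data, where k21 and k23 are hypothetical stand-ins for df21.value[12] and df23.value[12], and the second group list is trimmed to subbasins that occur in the sample:

```python
import pandas as pd

hru = pd.DataFrame({
    "SUBBASIN": [1, 1, 2, 2, 3, 3, 3, 3, 3],
    "OV_N": [0.15, 0.14, 0.15, 0.14, 0.15, 0.15, 0.14, 0.05, 0.05],
})

# Hypothetical stand-ins for df21.value[12] and df23.value[12]
k21 = 0.10
k23 = 0.20

group21 = [76, 65, 64, 72, 81, 84, 60, 46, 37, 1, 2]
group23 = [3, 4, 5, 6, 7, 8]  # trimmed; use the full list from the question in practice

# One vectorized update per group: no loop, so each row is scaled exactly once
mask21 = hru["SUBBASIN"].isin(group21)
hru.loc[mask21, "OV_N"] = hru.loc[mask21, "OV_N"] * (1 + k21)

mask23 = hru["SUBBASIN"].isin(group23)
hru.loc[mask23, "OV_N"] = hru.loc[mask23, "OV_N"] * (1 + k23)

print(hru)
```

The masks are computed from SUBBASIN, which never changes, so the order of the updates does not matter as long as the groups do not overlap.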

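To modify the OV_N value for each SUBBASIN number:
Initialize the hru DataFrame; in the original question it is read from hru.csv. Here we initialize it with the data given in the question.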

import numpy as np
import pandas as pd

hru = pd.DataFrame({
    'SUBBASIN':[1,1,2,2,3,3,3,3,3],
    'HRU':[1,2,1,2,1,2,3,4,5],
    'HRU_SLP':[0.016155144,0.015563287,0.010589782,0.011574839,0.013865396,0.01744597,0.018983217,0.013890315,0.011792533],
    'OV_N':[0.15,0.14,0.15,0.14,0.15,0.15,0.14,0.05,0.05]})
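In the question the frame comes from pd.read_csv('hru.csv'). If that file is really comma-separated, the default call is enough; if it is whitespace-separated like the listing in the question (an assumption, since the raw file is not shown), sep=r"\s+" parses it into the same four columns. Here io.StringIO stands in for the file:

```python
import io

import pandas as pd

# Stand-in for hru.csv; assumes whitespace-separated columns as in the question
raw = """SUBBASIN HRU HRU_SLP OV_N
1 1 0.016155144 0.15
1 2 0.015563287 0.14
2 1 0.010589782 0.15
2 2 0.011574839 0.14
3 1 0.013865396 0.15
3 2 0.01744597 0.15
3 3 0.018983217 0.14
3 4 0.013890315 0.05
3 5 0.011792533 0.05
"""

hru = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(hru.dtypes)  # SUBBASIN and HRU parse as integers, HRU_SLP and OV_N as floats
```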

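Create a separate pandas Series that collects the values from the various DataFrames (df21, df23, df56 and df58) in one place, indexed by SUBBASIN number, so that values can be looked up by index. We call it subbasin_multiplier_ds. Let's assume the values read from the txt files are 21, 23, 56 and 58 respectively. Please replace them with the actual values read from the txt files.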

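# Placeholder: every SUBBASIN (index 0-95) defaults to the df21 value, here 21
subbasin_multiplier_ds = pd.Series([21] * 96)
# Pass each group as a list so several labels are set at once
# (a bare comma-separated key is a tuple, which current pandas rejects here)
subbasin_multiplier_ds.loc[[80,74,75,66,55,53,57,63,61,41,38,27,26,45,40,
    34,35,31,33,21,20,17,18,19,23,14,13,8,7,11,6,4,3,5,12]] = 23
subbasin_multiplier_ds.loc[[85,58,78,54,59,51,52,30,28,16,15,77,79,71,70,
    86,73,68,69,56,67,62,82,87,83,91,89,90,43,36,39,47,32,49,42,48,50,
    49,29,22,24,25,9,10]] = 56
subbasin_multiplier_ds.loc[[92,88,95,94,93]] = 58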
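The same lookup can also be built from a list of (multiplier, subbasins) pairs, which keeps the real values (e.g. df21.value[12]) in one place. A sketch with the answer's placeholder numbers and the question's group lists; groups and lookup are hypothetical names. Starting from NaN instead of a default makes unassigned subbasins visible: with the question's lists, subbasin 44 is in no group while 49 is listed twice, so whether 44 belongs somewhere is for the asker to decide.

```python
import pandas as pd

# Hypothetical (multiplier, subbasins) pairs; replace the placeholder numbers
# with the real values, e.g. df21.value[12], df23.value[12], ...
groups = [
    (21, [76, 65, 64, 72, 81, 84, 60, 46, 37, 1, 2]),
    (23, [80, 74, 75, 66, 55, 53, 57, 63, 61, 41, 38, 27, 26, 45, 40,
          34, 35, 31, 33, 21, 20, 17, 18, 19, 23, 14, 13, 8, 7, 11, 6, 4, 3, 5, 12]),
    (56, [85, 58, 78, 54, 59, 51, 52, 30, 28, 16, 15, 77, 79, 71, 70,
          86, 73, 68, 69, 56, 67, 62, 82, 87, 83, 91, 89, 90, 43, 36, 39, 47, 32,
          49, 42, 48, 50, 49, 29, 22, 24, 25, 9, 10]),
    (58, [92, 88, 95, 94, 93]),
]

# One entry per SUBBASIN 1-95, NaN until a group assigns a multiplier
lookup = pd.Series(float("nan"), index=range(1, 96))
for multiplier, subbasins in groups:
    lookup.loc[subbasins] = multiplier

print(lookup[lookup.isna()].index.tolist())  # subbasins that got no multiplier
```

lookup.loc[hru["SUBBASIN"]].values then plays the same role as subbasin_multiplier_ds in the next step.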

Replace OV_N in the hru DataFrame, looking up each row's SUBBASIN in subbasin_multiplier_ds by index:

hru['OV_N'] = hru['OV_N'] * (1 + subbasin_multiplier_ds.loc[hru['SUBBASIN']].values)
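On the nine sample rows with the placeholder multipliers, subbasins 1 and 2 fall in the first group (21) and subbasin 3 in the second (23), so OV_N should be scaled by 22 and 24. A self-contained check of the step above, with .loc making the label lookup explicit; the 56 and 58 groups contain none of subbasins 1 to 3, so they are left out of this sketch:

```python
import pandas as pd

hru = pd.DataFrame({
    "SUBBASIN": [1, 1, 2, 2, 3, 3, 3, 3, 3],
    "OV_N": [0.15, 0.14, 0.15, 0.14, 0.15, 0.15, 0.14, 0.05, 0.05],
})

# Placeholder multipliers as in the answer: 21 by default, 23 for the second group
subbasin_multiplier_ds = pd.Series([21] * 96)
subbasin_multiplier_ds.loc[[80, 74, 75, 66, 55, 53, 57, 63, 61, 41, 38, 27, 26, 45, 40,
                            34, 35, 31, 33, 21, 20, 17, 18, 19, 23, 14, 13, 8, 7, 11,
                            6, 4, 3, 5, 12]] = 23

# Label lookup per row; .values drops the lookup's index so the product is positional
factors = subbasin_multiplier_ds.loc[hru["SUBBASIN"]].values
hru["OV_N"] = hru["OV_N"] * (1 + factors)
print(hru["OV_N"].round(4).tolist())
# → [3.3, 3.08, 3.3, 3.08, 3.6, 3.6, 3.36, 1.2, 1.2]
```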

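The .values above turns the looked-up Series into a numpy array, so the multiplication is done by position and gives the expected result. Try removing .values to see what happens.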
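The experiment suggested above can be run directly. Without .values, the looked-up Series keeps the SUBBASIN labels (1, 1, 2, 2, 3, ...) as its index, so the multiplication aligns by label against hru's row index 0 to 8 instead of by position. The result contains NaN where labels do not match and has more rows than hru, and recent pandas versions refuse to assign it back because of the duplicate labels (the exact error message varies between versions):

```python
import pandas as pd

hru = pd.DataFrame({
    "SUBBASIN": [1, 1, 2, 2, 3, 3, 3, 3, 3],
    "OV_N": [0.15, 0.14, 0.15, 0.14, 0.15, 0.15, 0.14, 0.05, 0.05],
})
subbasin_multiplier_ds = pd.Series([21] * 96)

# Without .values the lookup result is indexed by SUBBASIN label, not row position
looked_up = subbasin_multiplier_ds.loc[hru["SUBBASIN"]]
print(looked_up.index.tolist())  # [1, 1, 2, 2, 3, 3, 3, 3, 3]

# Series arithmetic aligns on index labels, producing NaN and extra rows
unaligned = hru["OV_N"] * (1 + looked_up)
print(len(unaligned), unaligned.isna().sum())

# Assigning a Series with duplicate index labels back to the frame fails
try:
    hru["OV_N"] = unaligned
    assigned = True
except ValueError as err:
    assigned = False
    print("assignment failed:", err)
```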