Set column = some operation on the values of other columns in a pandas DataFrame

Time: 2015-02-21 00:44:03

Tags: pandas dataframe slice

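I am trying to perform a simple operation on a pandas DataFrame using the logic below. The values in the columns of interest are decimals (at most one decimal place). The result of the operation cannot be negative, so if it is, I want 0 instead. I have tried two approaches, and both produce the same error.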

Method one:

def compute_size(frame):
    for x in list(reversed(range(14, len(frame.columns),2))):
        tmp_value = frame.iloc[:,x] - frame.iloc[:,x-2]
        if tmp_value < 0:
            frame.iloc[:,x] = 0
        else:
            frame.iloc[:,x] = tmp_value 

Method two:

def compute_size(frame):
    for x in list(reversed(range(14, len(frame.columns),2))):
        frame.iloc[:,x] = max(0,frame.iloc[:,x] - frame.iloc[:,x-2])

When I call either of the functions above, I get the following error:

C:\Python27\lib\site-packages\pandas\core\generic.pyc in __nonzero__(self)
    690         raise ValueError("The truth value of a {0} is ambiguous. "
    691                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 692                          .format(self.__class__.__name__))
    693 
    694     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
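For context, both attempts fail for the same reason: `frame.iloc[:,x] - frame.iloc[:,x-2]` is a whole Series, so `if tmp_value < 0` and the built-in `max(0, ...)` each ask Python for a single truth value of an elementwise comparison. Below is a minimal sketch of the failure and of an elementwise alternative. The Series `s` and its values are made up for illustration and are not part of the original data.

```python
import pandas as pd

# Hypothetical values for illustration only.
s = pd.Series([1.5, -2.0, 0.5])

try:
    if s < 0:  # the comparison yields a boolean Series, not a single bool
        pass
except ValueError as err:
    message = str(err)  # "The truth value of a Series is ambiguous. ..."

# Elementwise floor at zero: negative entries become 0, others are unchanged.
floored = s.clip(lower=0)
print(floored.tolist())
```

`Series.clip(lower=0)` applies the "no negatives" rule to every row at once, which is what the `if`/`else` and `max` versions were trying to express.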

Update 1:

Here is some sample data:

df = pd.DataFrame({
'BlahBlah0' : ['','','',''],
'BlahBlah1' : ['','','',''],
'BlahBlah2' : ['','','',''],
'BlahBlah3' : ['','','',''],
'BlahBlah4' : ['','','',''],
'BlahBlah5' : ['A','C','E','G'],
'BlahBlah6' : ['B','D','F','H'],
'BlahBlah7' : ['','','',''],
'BlahBlah8' : ['','','',''],
'BlahBlah9' : ['','','',''],
'BlahBlah10' : ['','','',''],
'BlahBlah11' : ['','','',''],
'Size1':[1,1,1,1], 
'Price1':[50,50,50,50],
'Size2':[3,3,3,3], 
'Price2':[75,75,75,75], 
'Size3':[7,7,7,7], 
'Price3':[100,100,100,100], 
'Size4':[15,15,15,15], 
'Price4':[125,125,125,125],
'Size5':[25,25,25,25], 
'Price5':[200,200,200,200],
'Size6':[30,30,30,30], 
'Price6':[250,250,250,250],
'Size7':[40,40,40,40], 
'Price7':[300,300,300,300]
},columns=['BlahBlah0', 
'BlahBlah1',
'BlahBlah2',
'BlahBlah3',
'BlahBlah4',
'BlahBlah5',
'BlahBlah6',
'BlahBlah7',
'BlahBlah8',
'BlahBlah9',
'BlahBlah10',
'BlahBlah11',
'Size1',
'Price1',
'Size2',
'Price2',
'Size3',
'Price3',
'Size4',
'Price4',
'Size5',
'Price5',
'Size6',
'Price6',
'Size7',
'Price7'] )

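Now, once you load the DataFrame above into Python, the column order gets scrambled. For some reason, pandas groups the Price columns together and the Size columns together. That is not the intent. The DataFrame should look exactly as I have shown it. I am not sure how to get it back into the order shown above.

Assuming you can produce the exact DataFrame shown above, I now want to perform the following operations: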

Size1 = Size1
Size2 = Max(0,Size2 - Size1)
Size3 = Max(0,Size3 - Size2)
Size4 = Max(0,Size4 - Size3)
Size5 = Max(0,Size5 - Size4)
Size6 = Max(0,Size6 - Size5)
Size7 = Max(0,Size7 - Size6)

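So the logic is not to subtract column x-2 from column x for every column, but to perform the operation only on every other column, from column index 14 through the last column.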
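To make the positional logic concrete, here is a small sketch of which columns `range(14, len(frame.columns), 2)` touches. It assumes the layout shown above (12 leading `BlahBlah` columns followed by alternating `Size`/`Price` pairs). The one-row frame `demo` is built only for illustration.

```python
import pandas as pd

# Rebuild the column layout from the sample data: 12 filler columns,
# then Size1, Price1, ..., Size7, Price7.
names = ['BlahBlah%d' % i for i in range(12)]
for n in range(1, 8):
    names += ['Size%d' % n, 'Price%d' % n]
demo = pd.DataFrame([[0] * len(names)], columns=names)

# Columns that get overwritten, and the column subtracted from each of them.
targets = [demo.columns[x] for x in range(14, len(demo.columns), 2)]
previous = [demo.columns[x - 2] for x in range(14, len(demo.columns), 2)]
print(list(zip(targets, previous)))
```

Each target is paired with the Size column two positions to its left, so Size1 (index 12) is never overwritten. Iterating in reverse, as the functions above do, means each in-place subtraction still sees the original value of the previous Size column.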

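Update 2:

I fixed the part about the DataFrame column ordering (see above).

The desired output based on the logic above is the following DataFrame: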

df = pd.DataFrame({
'BlahBlah0' : ['','','',''],
'BlahBlah1' : ['','','',''],
'BlahBlah2' : ['','','',''],
'BlahBlah3' : ['','','',''],
'BlahBlah4' : ['','','',''],
'BlahBlah5' : ['A','C','E','G'],
'BlahBlah6' : ['B','D','F','H'],
'BlahBlah7' : ['','','',''],
'BlahBlah8' : ['','','',''],
'BlahBlah9' : ['','','',''],
'BlahBlah10' : ['','','',''],
'BlahBlah11' : ['','','',''],
'Size1':[1,1,1,1], 
'Price1':[50,50,50,50],
'Size2':[2,2,2,2], 
'Price2':[75,75,75,75], 
'Size3':[4,4,4,4], 
'Price3':[100,100,100,100], 
'Size4':[8,8,8,8], 
'Price4':[125,125,125,125],
'Size5':[10,10,10,10], 
'Price5':[200,200,200,200],
'Size6':[5,5,5,5], 
'Price6':[250,250,250,250],
'Size7':[10,10,10,10], 
'Price7':[300,300,300,300]
},columns=['BlahBlah0', 
'BlahBlah1',
'BlahBlah2',
'BlahBlah3',
'BlahBlah4',
'BlahBlah5',
'BlahBlah6',
'BlahBlah7',
'BlahBlah8',
'BlahBlah9',
'BlahBlah10',
'BlahBlah11',
'Size1',
'Price1',
'Size2',
'Price2',
'Size3',
'Price3',
'Size4',
'Price4',
'Size5',
'Price5',
'Size6',
'Price6',
'Size7',
'Price7'] )

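I am computing the difference between the current size and the previous size, which in effect captures the net additional size at the new price.

1 answer:

Answer 0: (score: 1)

Try this. It runs apply() along axis 1 and uses a list comprehension to handle the subtraction.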

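# Columns to overwrite, and the columns that feed the pairwise differences.
cols_to_update = ['Size2','Size3','Size4','Size5','Size6','Size7']
cols_to_subtract = ['Size1','Size2','Size3','Size4','Size5','Size6','Size7']

# For each row, compute max(current - previous, 0) over consecutive Size columns.
# .iloc makes the lookup explicitly positional; index= labels the result columns.
df[cols_to_update] = df[cols_to_subtract].apply(
          lambda x : pd.Series([max(x.iloc[i] - x.iloc[i-1],0) for i in range(1,len(x))],
                               index=cols_to_update),axis=1)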
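As a possible alternative, the same result can be sketched without a row-wise apply(), by taking differences across the Size columns and flooring them at zero. This assumes the Size columns are numeric and ordered as in the question. The small frame below reproduces only the Size columns of the first two rows of the sample data.

```python
import pandas as pd

size_cols = ['Size1', 'Size2', 'Size3', 'Size4', 'Size5', 'Size6', 'Size7']
df = pd.DataFrame([[1, 3, 7, 15, 25, 30, 40]] * 2, columns=size_cols)

# Difference between each Size column and the one before it
# (the first column has no predecessor, so it becomes NaN).
diffs = df[size_cols].diff(axis=1)

# Floor negative differences at 0 and write back everything except Size1.
df[size_cols[1:]] = diffs[size_cols[1:]].clip(lower=0)
print(df.iloc[0].tolist())
```

For the sample data this yields 1, 2, 4, 8, 10, 5, 10 across Size1 to Size7, matching the desired output in the question. Because the differences are computed from the original values before anything is written back, no reverse iteration is needed.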