Slicing a DataFrame with lists when each slice has a different start and end column number

Time: 2019-06-20 18:57:39

Tags: python pandas slice

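I am trying to sum values across columns for each row of a DataFrame. When the start and end column indices are the same for every row, this is not a problem, because I can use df.iloc and sum(), as shown below.

However, when the start or end index differs from row to row, I am looking for a more efficient approach than looping over the individual rows. The example below shows an end index that changes per row while the start index stays the same.

Any pointers toward a more efficient approach would be greatly appreciated.

Thanks!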

import pandas as pd
import numpy as np

data_example = {'col1': [1,1,1,1], 'col2': [2,2,2,2], 'col3':[3,3,3,3], 'condition':[2,1,3,0]}
df = pd.DataFrame(data=data_example)

# getting column number using column name
col1_column_number = df.columns.get_loc("col1")
col2_column_number = df.columns.get_loc("col2")

# Summing across col1 and col2. The slices are same for every row in this case
df['sum col1 and col2'] = df.iloc[:,col1_column_number:col2_column_number+1].sum(axis=1)
df['sum condition'] = 0

# Attempting to use array to specify start and end indices of columns --> assuming it is different for each row
# The following, as expected, does not work
col_start_array = [col1_column_number, col1_column_number, col1_column_number, col1_column_number]
col_end_array = [x+1 for x in df['condition'].values.tolist()] #adding 1 to ensure col_end includes all columns in condition
#df.iloc[:,col_start_array:col_end_array].sum(axis=1)

# Looping over rows --> works but not efficient
sum_condition_column_number = df.columns.get_loc("sum condition")
for idx, row in df.iterrows():
    # idx is the row label; it doubles as the position because the index is the default RangeIndex
    df.iloc[idx, sum_condition_column_number] = df.iloc[idx, 0:row['condition']].sum()

# Tried to follow the example on SO using slices, but got an error on the slice object. I suspect this is not the preferred approach?
# https://stackoverflow.com/questions/52563916/get-rows-from-a-dataframe-by-using-a-list-of-slices
#slices = []
#for end_col in df['condition'].values.tolist():
#    slices.append(slice(0,end_col+1))

#for slice_obj in slices:
#    df['sum condition'] = df.iloc[:,slice_obj].sum(axis=1)

# Any other way that is more efficient than looping?
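
For reference, one loop-free possibility is to build a boolean mask with NumPy broadcasting and sum only the masked values. This is a minimal sketch, not a confirmed answer. It assumes the columns eligible for summing are col1 through col3 (`value_cols` is a name introduced here for illustration), that the start position is inclusive, and that the end position is exclusive and equal to `condition`, which matches what the loop above computes.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 1, 1], 'col2': [2, 2, 2, 2],
                   'col3': [3, 3, 3, 3], 'condition': [2, 1, 3, 0]})

# Assumption: only these columns take part in the row-wise sums.
value_cols = ['col1', 'col2', 'col3']
values = df[value_cols].to_numpy()

# Per-row start (inclusive) and end (exclusive) column positions.
start = np.zeros(len(df), dtype=int)
end = df['condition'].to_numpy()

# positions has shape (n_cols,); start/end are reshaped to (n_rows, 1),
# so the comparisons broadcast to an (n_rows, n_cols) boolean mask.
positions = np.arange(values.shape[1])
mask = (positions >= start[:, None]) & (positions < end[:, None])

# Zero out everything outside each row's window, then sum across columns.
df['sum condition'] = np.where(mask, values, 0).sum(axis=1)
```

Because `start` is an array as well, the same mask should also cover the case where the start column differs per row. The cost is a temporary array the same shape as the value columns.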

0 answers:

No answers