Filtering multiple columns between multiple ranges

Time: 2019-06-29 17:36:41

Tags: python-3.x pandas

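I have a large dataset that mostly contains numeric values. I want to filter multiple columns, each against its own range. The catch is that the columns and the ranges are chosen by the user, so both can change on every run.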

For example, 0 < df[a] < 5 & 0 < df[b] < 10. It could just as well be "a", "b" and "c"; it depends entirely on the input.
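To make the condition above concrete, here is a minimal sketch of a filter whose columns and bounds come from user input. The helper name `filter_by_ranges`, the `user_bounds` dictionary and the sample data are hypothetical and only serve the illustration; the bounds are treated as strict, matching the inequalities written above.

```python
from functools import reduce
import operator

import pandas as pd


def filter_by_ranges(df, bounds):
    """Keep rows where every selected column lies strictly inside its (low, high) range.

    `bounds` maps a column name to a (low, high) tuple; both are hypothetical
    stand-ins for whatever the user enters.
    """
    masks = [(df[col] > low) & (df[col] < high) for col, (low, high) in bounds.items()]
    # Start from an all-True mask so an empty `bounds` keeps every row.
    all_true = pd.Series(True, index=df.index)
    combined = reduce(operator.and_, masks, all_true)
    return df[combined]


# Made-up sample data, not taken from the real Excel file.
df = pd.DataFrame({"a": [1, 4, 6, 2], "b": [3, 12, 5, 9], "c": [0, 1, 2, 3]})
user_bounds = {"a": (0, 5), "b": (0, 10)}
filtered = filter_by_ranges(df, user_bounds)
print(filtered)
```

Adding "c" to the filter would only require one more entry in `user_bounds`; the function itself does not change.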

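For example, I want to see how many rows fall within each range, per column: col.a between "0" and "1", between "1" and "2", and so on up to 5, while col.b (or any other column) goes up to, say, "10".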
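For a single column, the per-range row counts described above could be sketched with `pd.cut`. The function name `count_per_bin`, the sample values, and the choice of half-open bins `[0, 1), [1, 2), ...` are assumptions made for the illustration; whether the edges should be inclusive is a separate decision.

```python
import numpy as np
import pandas as pd


def count_per_bin(series, step, upper):
    """Count the values of `series` in consecutive bins of width `step` from 0 to `upper`."""
    # Bin edges 0, step, 2*step, ..., upper (assumes `upper` is a multiple of `step`).
    edges = np.arange(0, upper + step, step)
    # right=False makes each bin closed on the left and open on the right.
    binned = pd.cut(series, bins=edges, right=False)
    # sort_index puts the bins in ascending order; empty bins are kept with a count of 0.
    return binned.value_counts().sort_index()


# Made-up values for a column "a" with step 1 and upper limit 5.
col_a = pd.Series([0.5, 1.2, 1.7, 3.0, 4.9], name="a")
counts = count_per_bin(col_a, step=1, upper=5)
print(counts)
```

Values outside the 0 to `upper` span end up in no bin and are not counted.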

Since my code is long, I have tried to explain the extra parts in the docstring:

# -*- coding: utf-8 -*-
"""
excel_file: DataFrame read from the Excel file
entered_parameters: (list) columns to filter, typed in by the user
parameters: column names read from excel_file
limits: (list) upper limit entered by the user for each entry in entered_parameters
ranges: (list) range width (increment) for each entry in entered_parameters
boolean_frame: Boolean DataFrame produced on each pass when filtering one of the entered_parameters (columns) up to its limit
total_boolean_frame: the boolean_frame results appended together (covers every range up to the limit for one parameter)
total_frame: concat of the total_boolean_frame results (all Boolean values, by range, for all parameters)

"""


import pandas as pd

total_frame=pd.DataFrame()
# keep only the columns whose name is a string
parameters=[i for i in excel_file.columns if isinstance(i,str)]

totalrownumberlist=[]
for i,v in enumerate(limits):
    if i==0:
        totalrownumberlist.append(len(excel_file)*v)
    else:
        totalrownumberlist.append(totalrownumberlist[i-1]*v)
totalrownumber=totalrownumberlist[-1]
for i,param in enumerate(entered_parameters):

    total_boolean_frame=pd.DataFrame()
    appended_row_num=totalrownumberlist[i]
    if param in parameters:
        while appended_row_num<=totalrownumber:

            boolean_frame=pd.DataFrame()
            initial=0


            while initial<limits[i]:                          

                boolean_frame[param]=(excel_file[param]>=initial) & (excel_file[param]<=initial+ranges[i])

                boolean_frame["aralik-%s"%param]="%s-%s"%(initial,initial+ranges[i])


                initial=initial+ranges[i]

                # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
                total_boolean_frame=pd.concat([total_boolean_frame,boolean_frame],sort=False,ignore_index=True)

            appended_row_num=appended_row_num+totalrownumberlist[i]
        total_frame=pd.concat([total_frame,total_boolean_frame],axis=1)

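Edit: the output should look like this: count(range [0-1] col.a and range [0-1] col.b) = 2, plus the mean of those rows (a row passes if all of its cells are True along axis=1, i.e. excel_file[total_frame.all(axis=1)]). count(range [1-2] col.a and range [0-1] col.b) = 3, plus the mean; count(range [2-3] col.a and range [0-1] col.b) = 6, plus the mean. And so on... Thanks.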
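As an illustration of the desired output, the count and mean per combination of ranges could be sketched by binning each selected column and grouping on the bins. This is not the code above: the function name `summarize_by_ranges`, the `steps` and `limits` dictionaries, the `range-<column>` labels and the sample data are all hypothetical, and the bins are assumed to be half-open (`[0, 1)`, `[1, 2)`, ...).

```python
import numpy as np
import pandas as pd


def summarize_by_ranges(df, steps, limits):
    """Count rows and average the numeric columns for each combination of ranges.

    `steps` and `limits` map a column name to its bin width and upper limit;
    both are hypothetical stand-ins for the user's input.
    """
    keys = []
    for col, step in steps.items():
        edges = np.arange(0, limits[col] + step, step)
        # One categorical key per selected column, e.g. "range-a".
        keys.append(pd.cut(df[col], bins=edges, right=False).rename(f"range-{col}"))
    # observed=True keeps only the combinations that actually occur in the data.
    grouped = df.groupby(keys, observed=True)
    result = grouped.mean(numeric_only=True)
    result["count"] = grouped.size()
    return result


# Made-up data, not taken from the real Excel file.
df = pd.DataFrame({
    "a": [0.2, 0.8, 1.5, 1.1, 2.5],
    "b": [0.5, 0.9, 0.3, 5.0, 0.1],
})
summary = summarize_by_ranges(df, steps={"a": 1, "b": 1}, limits={"a": 5, "b": 10})
print(summary)
```

Each row of `summary` corresponds to one combination such as range [0-1) of col.a with range [0-1) of col.b, giving the column means and the row count. Rows that fall outside every bin are left out of the groups.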

0 answers:

No answers