An efficient way to check whether a DataFrame contains a complete grid of data

Time: 2019-02-22 12:22:29

Tags: python pandas dataframe

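I want to check whether all the data in a DataFrame is complete, from the top-left cell down to the bottom-right-most data cell (the data should fill a rectangle). Blank columns or rows after the body of the data are fine (the data will have them).

Examples of good and bad DataFrames are as follows: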

import pandas as pd

bad_dataframe = pd.DataFrame([[1,1,1,""],["","","",""],[1,"",1,""],["","","",""]])
good_dataframe = pd.DataFrame([[1,1,1,""],[1,1,1,""],[1,1,1,""],[1,1,1,""],["","","",""]])

The way I have done it is as follows:

def not_rectangle_data(df):
    """
    Check whether the data in df forms a filled "rectangle".

    Note: despite the name, this returns True when the data IS a
    rectangle and False otherwise.
    """

    # Mask blanks as NaN, then remove all rows and columns that contain only blanks
    reduced_dataframe = df[df != ""].dropna(how="all", axis=1).dropna(how="all", axis=0)

    # Remove all rows and columns that contain any blanks
    super_reduced_dataframe = reduced_dataframe.dropna(how="any", axis=1).dropna(how="any", axis=0)

    # The data must be non-empty, with no partly empty row or column
    if reduced_dataframe.empty or not super_reduced_dataframe.equals(reduced_dataframe):
        return False

    # No row or column may have been dropped from inside the data;
    # this relies on the default 0..n-1 integer row and column labels
    return ((max(reduced_dataframe.index) + 1) == reduced_dataframe.shape[0]) and \
        ((max(reduced_dataframe.columns) + 1) == reduced_dataframe.shape[1])
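
To make the two reduction steps concrete, here is a minimal trace on bad_dataframe. It is only a sketch that reuses the same expressions as the function; nothing new is assumed beyond the example frame itself.

```python
import pandas as pd

bad_dataframe = pd.DataFrame([[1, 1, 1, ""], ["", "", "", ""], [1, "", 1, ""], ["", "", "", ""]])

# Step 1: mask blanks as NaN, then drop columns and rows that are entirely blank.
reduced_dataframe = (
    bad_dataframe[bad_dataframe != ""]
    .dropna(how="all", axis=1)
    .dropna(how="all", axis=0)
)

# Step 2: drop every column and row that still contains a blank.
super_reduced_dataframe = (
    reduced_dataframe.dropna(how="any", axis=1).dropna(how="any", axis=0)
)

print(reduced_dataframe.shape)        # (2, 3): rows 0 and 2, columns 0 to 2
print(super_reduced_dataframe.shape)  # (2, 2): column 1 had a blank in row 2
print(list(reduced_dataframe.index))  # [0, 2]: row 1 was dropped from inside the data
```

Because the two frames differ, the function returns False for bad_dataframe. The index check matters for cases where a whole row or column inside the data is blank but everything around it is filled.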

But I feel there should be a more concise way.

Many thanks.

1 answer:

Answer 0 (score: 0)

Using numpy:

import numpy as np

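def check_rectangle(df):
    # Indices of all truthy cells ('' and a numeric 0 both count as empty)
    non_zeros = np.nonzero(df.values)
    # Smallest zero-filled array, anchored at the top-left, that covers them
    arr = np.zeros(np.max(non_zeros, 1)+1)
    np.add.at(arr, non_zeros, 1)
    # np.all replaces np.alltrue, which was removed in NumPy 2.0
    return np.all(arr)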

check_rectangle(good_dataframe)
# True
check_rectangle(bad_dataframe)
# False
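  • np.nonzero gets the indices of all non-zero elements ('' is treated as zero here, as is a numeric 0).
  • np.zeros(np.max(non_zeros, 1)+1) creates the smallest rectangle that fits non_zeros.
  • np.add.at adds 1 at every non-zero position.
  • Finally, the all-elements check (np.alltrue, spelled np.all in current NumPy) returns True if the rectangle is completely filled and False otherwise.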
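
The steps above can also be sketched with an explicit boolean mask. This is a variant under stated assumptions, not the answer's own code: the helper name is_filled_rectangle and its blank parameter are hypothetical, and an entirely blank frame is assumed to count as not rectangular. Unlike np.nonzero on the raw values, the mask treats a numeric 0 as data rather than as a blank.

```python
import numpy as np
import pandas as pd


def is_filled_rectangle(df, blank=""):
    """Hypothetical helper: True if the filled cells form a solid
    rectangle anchored at the top-left corner of df."""
    # True where a cell holds data (neither the blank marker nor NaN).
    filled = (df.ne(blank) & df.notna()).to_numpy()
    if not filled.any():
        # Assumption: a frame with no data at all is not a valid rectangle.
        return False
    rows, cols = np.nonzero(filled)
    # Smallest top-left-anchored block that covers every filled cell.
    block = filled[: rows.max() + 1, : cols.max() + 1]
    return bool(block.all())


good_dataframe = pd.DataFrame([[1, 1, 1, ""], [1, 1, 1, ""], [1, 1, 1, ""], [1, 1, 1, ""], ["", "", "", ""]])
bad_dataframe = pd.DataFrame([[1, 1, 1, ""], ["", "", "", ""], [1, "", 1, ""], ["", "", "", ""]])
with_zero = pd.DataFrame([[0, 1, ""], [1, 1, ""], ["", "", ""]])

print(is_filled_rectangle(good_dataframe))  # True
print(is_filled_rectangle(bad_dataframe))   # False
print(is_filled_rectangle(with_zero))       # True: the 0 counts as data
```

Slicing the mask avoids allocating a second array and the np.add.at call; whether that matters depends on the size of the frames being checked.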