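I want to check that all the data in a dataframe, from the top-left corner to the bottom-right-most filled element, is complete, i.e. the data is filled in as a rectangle. Blank columns or rows after the main body of data are fine (the data will have them).
Examples of a bad and a good dataframe: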
```python
import pandas as pd

bad_dataframe = pd.DataFrame([[1,1,1,""],["","","",""],[1,"",1,""],["","","",""]])
good_dataframe = pd.DataFrame([[1,1,1,""],[1,1,1,""],[1,1,1,""],[1,1,1,""],["","","",""]])
```
Here is how I did it:
```python
def not_rectangle_data(DataFrame):
    """
    This function will check if the data given to it is a "rectangle"
    (despite its name, it returns True when the data IS a rectangle)
    """
    # Replace "" with NaN, then remove all rows and columns that contain only blanks
    reduced_dataframe = DataFrame[DataFrame != ""].dropna(how="all", axis=1).dropna(how="all", axis=0)
    # Remove all rows and columns that contain any blanks
    super_reduced_dataframe = reduced_dataframe.dropna(how="any", axis=1).dropna(how="any", axis=0)
    # Check that the dataframe is not empty and that no row or column is half empty
    if not reduced_dataframe.empty and \
            super_reduced_dataframe.equals(reduced_dataframe):
        # Check that the remaining rows and columns start at 0 with no gaps
        # (relies on the default integer row and column labels)
        if ((max(reduced_dataframe.index) + 1) == reduced_dataframe.shape[0]) and \
                ((max(reduced_dataframe.columns) + 1) == reduced_dataframe.shape[1]):
            return True
        else:
            return False
    else:
        return False
```
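To show what each check in this function catches, the sketch below runs the same reduction steps by hand. `reduce_blanks` and `gap_dataframe` are hypothetical names introduced only for illustration, and, like the function itself, the steps assume the default integer row and column labels.

```python
import pandas as pd

bad_dataframe = pd.DataFrame([[1, 1, 1, ""], ["", "", "", ""], [1, "", 1, ""], ["", "", "", ""]])


def reduce_blanks(df):
    # Hypothetical helper: turn "" into NaN, then drop all-blank columns and rows
    return df[df != ""].dropna(how="all", axis=1).dropna(how="all", axis=0)


# bad_dataframe: rows 1 and 3 and column 3 are dropped, but a NaN hole remains at (2, 1)
reduced = reduce_blanks(bad_dataframe)
print(reduced)

# Dropping every row/column that still holds a NaN removes column 1,
# so the equals() check fails and the function returns False
super_reduced = reduced.dropna(how="any", axis=1).dropna(how="any", axis=0)
print(super_reduced.equals(reduced))  # False

# Hypothetical frame whose only defect is a fully blank row in the middle
gap_dataframe = pd.DataFrame([[1, 1], ["", ""], [1, 1]])
gap_reduced = reduce_blanks(gap_dataframe)
gap_super = gap_reduced.dropna(how="any", axis=1).dropna(how="any", axis=0)
print(gap_super.equals(gap_reduced))  # True: no holes are left after dropping row 1
# ...but the row labels are now [0, 2], so the index check catches the gap
print(list(gap_reduced.index))  # [0, 2]
print(max(gap_reduced.index) + 1 == gap_reduced.shape[0])  # False
```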
But I feel there should be a more concise way of doing this.
Many thanks.
Answer 0 (score: 0)
Using numpy:
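```python
import numpy as np

def check_rectangle(df):
    # Row and column indices of every non-blank cell ("" counts as zero)
    non_zeros = np.nonzero(df.values)
    # All-zero array covering the box from (0, 0) to the bottom-right-most filled cell
    arr = np.zeros(np.max(non_zeros, 1) + 1)
    # Put a 1 at every filled position
    np.add.at(arr, non_zeros, 1)
    # Rectangular only if no cell inside the box is still zero
    # (np.alltrue is a deprecated alias of np.all, removed in NumPy 2.0)
    return bool(np.all(arr))

check_rectangle(good_dataframe)
# True
check_rectangle(bad_dataframe)
# False
```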
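- `np.nonzero` gets the indices of all non-zero elements (`''`, and also `0`, are treated as zero here).
- `np.zeros(np.max(non_zeros, 1)+1)` creates an all-zero array the size of the smallest rectangle, anchored at the top-left corner, that contains every position in `non_zeros`.
- `np.add.at` adds 1 at every non-zero position.
- `np.all` (older NumPy versions also provide `np.alltrue`, a deprecated alias) returns `True` if every cell of that rectangle is non-zero, and `False` otherwise.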
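A sketch that traces the intermediate arrays of `check_rectangle` on `bad_dataframe`, followed by a hypothetical variant, `check_rectangle_guarded`, that slices the bounding box directly instead of using `np.add.at`. The variant also guards the all-blank case, where `np.max` on empty index arrays would raise a `ValueError`; returning `False` there is an assumption chosen to match the question's function. As in the answer, `''` and `0` count as blank, while `NaN` counts as filled because it is truthy.

```python
import numpy as np
import pandas as pd

bad_dataframe = pd.DataFrame([[1, 1, 1, ""], ["", "", "", ""], [1, "", 1, ""], ["", "", "", ""]])

# Trace the steps of check_rectangle on bad_dataframe
values = bad_dataframe.to_numpy()              # object array: "" is falsy, 1 is truthy
non_zeros = np.nonzero(values)                 # (row indices, column indices) of truthy cells
shape = tuple(np.max(non_zeros, axis=1) + 1)   # bounding box anchored at (0, 0)
arr = np.zeros(shape)
np.add.at(arr, non_zeros, 1)                   # 1 wherever a cell is filled
print(non_zeros)
print(arr)                                     # row 1 is all zeros, so np.all(arr) is False


def check_rectangle_guarded(df):
    """Hypothetical variant: slice the bounding box directly and guard the all-blank case."""
    values = df.to_numpy()
    rows, cols = np.nonzero(values)
    if rows.size == 0:
        # Assumption: an entirely blank frame is NOT a rectangle (matches the question's function)
        return False
    box = values[: rows.max() + 1, : cols.max() + 1]
    # Every cell inside the box must be truthy ("" and 0 count as blank)
    return bool(np.all(box.astype(bool)))
```

Slicing avoids allocating and filling a second array; for the two example frames both versions give the same result.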