Count the number of null values below each entry and put them in a new df

Time: 2018-07-10 23:48:25

Tags: python python-3.x loops for-loop dataframe


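I am trying to count the number of null values below each non-null cell in a dataframe and put that count into a new variable (size) in a new dataframe.

I have included a picture of the dataframe I am trying to count. For now I am only interested in the "Arrival Date" column. The new dataframe should have a column whose first observations are 1, 1, 3, 7, etc.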

##Loops through all of rows in DOAs
for i in range(0, DOAs.shape[0]):
    j=0
    if DOAs.iloc[int(i),3] != None: ### the rest only runs if the current, i, observation isn't null
        newDOAs.iloc[int(j),0] = DOAs.iloc[int(i),3] ## sets the jth i in the new dataframe to the ith (currently assessed) row of the old
        foundNull = True #Sets foundNull equal to true
        k=1 ## sets the counter of people 
        while foundNull == True and (k+i) < 677: 
                if DOAs.iloc[int(i+k),3] == None: ### if the next one it looks at is null, increment the counter to add another person to the family
                    k = k+1
                else:
                    newDOAs.iloc[int(j),1] = k ## sets second column in new dataframe equal to the size
                    j = j+1
                    foundNull = False
    j=0

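1 Answer:

Answer 0 (score: 0)

What you can do is get the index of the non-null entries in whichever column of the dataframe you care about, then take the distance between consecutive entries. Note: this assumes the index is a clean run of consecutive integers, or that you don't mind calling .reset_index() on the dataframe first.

Here is an example: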

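import pandas as pd

df = pd.DataFrame({'a': [1, None, None, None, 2, None, None, 3, None, None]})
# Index labels of the rows where 'a' is not null
not_null_index = df.dropna(subset=['a']).index
null_counts = {}

for i in range(len(not_null_index)):
    if i < len(not_null_index) - 1:
        # Nulls between this non-null entry and the next one
        null_counts[not_null_index[i]] = not_null_index[i + 1] - 1 - not_null_index[i]
    else:
        # Last non-null entry: count the nulls up to the end of the frame
        null_counts[not_null_index[i]] = len(df.a) - 1 - not_null_index[i]

null_counts_df = pd.DataFrame({'nulls': list(null_counts.values())}, index=list(null_counts.keys()))
# Inner merge on the index keeps only the rows that have a count (the non-null ones)
df_with_null_counts = pd.merge(df, null_counts_df, left_index=True, right_index=True)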

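Basically, all this code does is get the index of each non-null value in the dataframe, then take the difference between each index and the next non-null index, minus one, which is the number of nulls in between. For the last non-null value it counts up to the end of the dataframe instead. It then puts those null_counts into a dataframe and merges it with the original data.

After running this snippet, df_with_null_counts equals: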

     a  nulls
0  1.0      3
4  2.0      2
7  3.0      2
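
The note about .reset_index() matters because the subtraction above works on index labels, not on row positions. Here is a minimal sketch of that caveat, assuming a hypothetical frame df_odd whose index is not 0..n-1 (for example, one left over from filtering):

```python
import pandas as pd

# Hypothetical frame whose index labels are not 0..n-1 (e.g. left over from a filter).
df_odd = pd.DataFrame(
    {'a': [1, None, None, 2, None]},
    index=[10, 20, 30, 40, 50],
)

# Subtracting the raw labels would give 40 - 1 - 10 = 29 for the first entry,
# not 2, so renumber the rows first. drop=True discards the old labels.
df_reset = df_odd.reset_index(drop=True)

not_null_index = df_reset.dropna(subset=['a']).index
first_gap = not_null_index[1] - 1 - not_null_index[0]
print(first_gap)  # 2
```

If the original labels are needed later, keep them in a column by calling reset_index() without drop=True.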

Alternatively, you can use numpy instead of a loop, which will be faster for large dataframes. Here is an example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, None, None, None, 2, None, None, 3, None, None]})
not_null_index = df.dropna(subset=['a']).index

# Position of the next non-null entry for each one; the last is paired with the frame length
offset_index = np.array([*not_null_index[1:], len(df.a)])
# Nulls in between = distance to the next non-null entry, minus one
null_counts = offset_index - np.array(not_null_index) - 1

null_counts_df = pd.DataFrame({'nulls': null_counts}, index=not_null_index)
df_with_null_counts = pd.merge(df, null_counts_df, left_index=True, right_index=True)

The output will be the same.
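
For comparison, the same counts can probably also be obtained with pandas alone, without relying on the index values. This is a sketch of a different technique (grouping on a running count of non-null values), not the method used above, and the names group_id and result are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, None, None, None, 2, None, None, 3, None, None]})

# Each non-null value starts a new group; the nulls below it share its group id.
group_id = df['a'].notna().cumsum()

# Count the nulls in each group. Rows before the first non-null value
# (group id 0) have no owner, so they are excluded.
null_counts = df['a'].isna().groupby(group_id).sum()
null_counts = null_counts[null_counts.index > 0]

result = pd.DataFrame({
    'a': df['a'].dropna().to_numpy(),
    'nulls': null_counts.to_numpy(),
})
print(result)
```

Unlike the merged frame above, result gets a fresh 0..n-1 index rather than keeping the original row labels 0, 4 and 7.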