我的流程图如上所示,我想从原始数据集中取两行,然后将它们导入另一行(因为我不想修改原始数据)。在新数据集中,检查2行是否具有相同数量的非NaN值(df.iloc[i,:].count()
),否则,将数字差异填入零,然后继续执行操作。
实施例: 原始数据:
3 5 5 NaN NaN NaN
1 4 7 5 NaN NaN
NaN NaN 3 6 7 NaN
NaN 3 8 4 11 NaN
3 0 3 7 2 1
取2行i和i + 1并将它们导入另一个数据集:
3 5 5 NaN NaN NaN
1 4 7 5 NaN NaN
因为df.iloc[i+1,:].count() != df.iloc[i,:].count()
,所以必须像这样填充具有更多NaN值的行:
3 5 5 0 NaN NaN
1 4 7 5 NaN NaN
如果是第3行和第4行
NaN 0 3 6 7 NaN
NaN 3 8 4 11 NaN
然后执行操作。
这是我的代码:
for i in range():
process[1,:] = df.iloc[i,:]
process[2,:] = df.iloc[i+1,:]
while True:
if process[1,:].count() == process[2,:].count():
break
else:
if process[1,:].count() > process[2,:].count():
process[2,:] = process[2,:].fillna(value = 0, limit = process[1,:].count() - process[2,:].count())
else:
process[2,:] = process[2,:].fillna(value = 0, limit = process[2,:].count() - process[1,:].count())
A[i,:] = stats.ttest_rel(process[1,:].values, process[2,:].values) #this line is just for the statistical test, you can ignore it
i += 1
我的算法不起作用,我觉得通过一遍又一遍地检查行和行,它们在某种程度上太笨拙。
欢迎任何建议和更正,非常感谢。
P / s:我想连续对每一行进行统计测试,所以在这之前,我需要让它们具有相同数量的非NaN值。
答案 0 :(得分:0)
最后,我可以提出答案,我想在此分享:
for i in range(5):
process = pd.DataFrame(columns=df.columns)
process = process.append(df.iloc[i,:], ignore_index = True)
process = process.append(df.iloc[i+1,:], ignore_index = True)
while True:
if process.iloc[0,:].count() == process.iloc[1,:].count():
break
else:
if process.iloc[0,:].count() > process.iloc[1,:].count():
process.iloc[1,:] = process.iloc[1,:].fillna(value = 0, limit = process.iloc[0,:].count() - process.iloc[1,:].count())
else:
process.iloc[0,:] = process.iloc[0,:].fillna(value = 0, limit = process.iloc[1,:].count() - process.iloc[0,:].count())
i += 1