I have two dataframes (Thresholds and InfoTable); the first row of each is a header row:
(Thresholds)
AA BB CC DD EE
0 15 7 0 23
and
(InfoTable)
ID Xposition Yposition AA BB CC DD EE
1 1 1 10 20 5 10 50
2 2 2 20 12 10 20 2
3 3 3 30 19 17 30 26
4 4 4 40 35 3 40 38
5 5 5 50 16 5 50 16
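I am trying to filter the data so that every column holding a 0 in the Thresholds dataframe is dropped from the InfoTable dataframe. Then I want to compare each remaining value in the Thresholds row against the values in the matching InfoTable column, and replace the InfoTable values with 1 or 0. My desired output is: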
ID Xposition Yposition BB CC EE
1 1 1 1 0 1
2 2 2 0 1 0
3 3 3 1 1 1
4 4 4 1 0 1
5 5 5 1 0 0
This is the code I am currently using to filter each table:
import pandas as pd

with open('thresholds_test.txt') as a:
    Thresholds = pd.read_table(a, sep=',')
print(Thresholds)

with open('includedThresholds.txt') as b:
    IncludedThresholds = pd.read_table(b, sep=',')
print(IncludedThresholds)

# Drop every column whose value in the first row is 0
InterestingThresholds = IncludedThresholds.drop(IncludedThresholds.columns[~IncludedThresholds.iloc[0].astype(bool)], axis=1)
print(InterestingThresholds)

with open('PivotTable.tab') as c:
    PivotTable = pd.read_table(c, sep='\t')
print(PivotTable)

# Column names must match the file header exactly ('Xposition', 'Yposition')
headers = InterestingThresholds.columns.append(pd.Index(['ID', 'Xposition', 'Yposition']))
InfoTable = PivotTable.loc[:, headers]
print(InfoTable)
Any help would be greatly appreciated!
Answer 0 (score: 1)
Find the columns to keep and the columns to drop:
cols = Thresholds.columns[Thresholds.iloc[0].astype(bool)]
dcols = Thresholds.columns[~Thresholds.iloc[0].astype(bool)]
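To see what the boolean mask does, here is a minimal sketch that rebuilds the Thresholds row from the question inline (the `mask` variable is introduced only for illustration). Casting the first row to `bool` maps 0 to False and any non-zero value to True:

```python
import pandas as pd

# Threshold row copied from the question.
Thresholds = pd.DataFrame({"AA": [0], "BB": [15], "CC": [7], "DD": [0], "EE": [23]})

mask = Thresholds.iloc[0].astype(bool)   # 0 -> False, non-zero -> True
cols = Thresholds.columns[mask]          # columns to keep
dcols = Thresholds.columns[~mask]        # columns to drop

print(mask.tolist())
print(list(cols))
print(list(dcols))
```

Note that this treats any non-zero threshold, including a negative one, as a column to keep.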
Do the comparison:
comp_df = pd.DataFrame(InfoTable[cols].values >= Thresholds[cols].values, columns=cols).astype(int)
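The `.values` calls turn both sides into NumPy arrays, so the single threshold row is broadcast against every row of InfoTable. A small sketch of that broadcasting, using the first two sample rows for BB, CC and EE (the names `values`, `limits` and `flags` are illustrative, not from the answer):

```python
import numpy as np

values = np.array([[20, 5, 50],
                   [12, 10, 2]])      # first two InfoTable rows for BB, CC, EE
limits = np.array([[15, 7, 23]])      # the single Thresholds row for BB, CC, EE

# Shapes (2, 3) and (1, 3) broadcast to (2, 3): every row is compared
# against the same threshold row.
flags = (values >= limits).astype(int)
print(flags)
```

Because the comparison is positional rather than label-based, it relies on both frames being indexed with the same `cols` in the same order.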
Assign the comparison result back to the original dataframe and drop the unwanted columns:
df_out = InfoTable.assign(**comp_df).drop(dcols, axis=1)
print(df_out)
Output:
   ID  Xposition  Yposition  BB  CC  EE
0   1          1          1   1   0   1
1   2          2          2   0   1   0
2   3          3          3   1   1   1
3   4          4          4   1   0   1
4   5          5          5   1   0   0
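For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question inline instead of reading the files, and it uses `DataFrame.ge` with `axis=1` as a label-aligned alternative to comparing `.values`. The lowercase variable names are illustrative and do not appear in the original post.

```python
import pandas as pd

# Sample data copied from the question.
thresholds = pd.DataFrame({"AA": [0], "BB": [15], "CC": [7], "DD": [0], "EE": [23]})
info_table = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Xposition": [1, 2, 3, 4, 5],
    "Yposition": [1, 2, 3, 4, 5],
    "AA": [10, 20, 30, 40, 50],
    "BB": [20, 12, 19, 35, 16],
    "CC": [5, 10, 17, 3, 5],
    "DD": [10, 20, 30, 40, 50],
    "EE": [50, 2, 26, 38, 16],
})

limits = thresholds.iloc[0]               # Series indexed by column name
keep_cols = limits.index[limits != 0]     # columns with a non-zero threshold
drop_cols = limits.index[limits == 0]     # columns to remove

result = info_table.drop(columns=drop_cols)
# With axis=1 the Series index is matched against the column labels,
# so each column is compared with its own threshold.
result[keep_cols] = result[keep_cols].ge(limits[keep_cols], axis=1).astype(int)
print(result)
```

Matching on labels means the result does not depend on the two frames listing their columns in the same order.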