Question

I'm using pandas and need to select only the columns that contain weekday data, skipping the weekends. I want to go from this:

Employee  | Thu 02-08 | Fri 02-08 | Sat 02-09 | Sun 02-10 | Mon 02-11 | Tue 02-12
Daniel,s  | 7.65      | 0.00      | 0.00      | 0.00      | 8.45      | 8.20
Doucore,d | 5.21      | 8.20      | 5.00      | 0.00      | 8.10      | 9.22
Jimene,c  | 6.55      | 9.30      | 0.00      | 0.00      | 9.20      | 2.00

to this:

Employee  | Thu 02-08 | Fri 02-08 | Mon 02-11 | Tue 02-12
Daniel,s  | 7.65      | 0.00      | 8.45      | 8.20
Doucore,d | 5.21      | 8.20      | 8.10      | 9.22
Jimene,c  | 6.55      | 9.30      | 9.20      | 2.00

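I need to drop the weekend columns (Saturday and Sunday) dynamically, whatever order they appear in. Any help is highly appreciated. My basic code looks like this: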

def analize_data(self):

    def check_for_absent_patter(data):
        ''' Return True if the hours for each of the last 3 days are zero. '''
        # .iloc makes the positional lookup explicit; plain data[-1] on a labelled row is deprecated
        return data.iloc[-1] == 0 and data.iloc[-2] == 0 and data.iloc[-3] == 0

    filtered_data = self.raw_data.drop(['Unnamed: 0', 'Employee ID', 'Title', 'Total Hours', 'Hourly Rate', 'Total Pay'], axis=1)

    ### drop columns around here maybe....

    ready_to_analisis = filtered_data.groupby('Employee').sum()
    ready_to_analisis['long_Absent'] = ready_to_analisis.apply(check_for_absent_patter, axis=1)
    print(ready_to_analisis[ready_to_analisis['long_Absent']].to_string())

I know I have to drop the columns right after the data is first filtered. Thanks.

Answer 1

Use str.startswith with boolean indexing to keep only the columns whose names do not start with any of the strings in the tuple:

df = df.loc[:, ~df.columns.str.startswith(('Sat','Sun'))]
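As a rough illustration, here is that one-liner applied to the sample data from the question. The frame is rebuilt by hand from the posted table, and the names is_weekend and weekdays_only are only illustrative. It assumes a pandas release whose str.startswith accepts a tuple of prefixes, which recent versions do. The Employee column survives because its name starts with neither prefix.

```python
import pandas as pd

# Sample frame rebuilt by hand from the table in the question.
df = pd.DataFrame(
    {
        "Employee": ["Daniel,s", "Doucore,d", "Jimene,c"],
        "Thu 02-08": [7.65, 5.21, 6.55],
        "Fri 02-08": [0.00, 8.20, 9.30],
        "Sat 02-09": [0.00, 5.00, 0.00],
        "Sun 02-10": [0.00, 0.00, 0.00],
        "Mon 02-11": [8.45, 8.10, 9.20],
        "Tue 02-12": [8.20, 9.22, 2.00],
    }
)

# Boolean mask over the column labels: True where a label starts with Sat or Sun.
is_weekend = df.columns.str.startswith(("Sat", "Sun"))

# Keep every row, and only the columns where the mask is False.
weekdays_only = df.loc[:, ~is_weekend]
print(weekdays_only.columns.tolist())
```

Because the mask is built from the labels themselves, the position of the weekend columns in the frame does not matter.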

Answer 2

import numpy as np
import pandas as pd

df = pd.DataFrame(columns=["Thu 02-08", "Fri 02-08", "Sat 02-09", "Sun 02-10", "Mon 02-11", "Tue 02-12"],
                  data=np.random.rand(3, 6))

# Select the columns whose names contain neither 'Sat' nor 'Sun'
df = df[[x for x in df.columns if 'Sat' not in x and 'Sun' not in x]]
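A sketch of how this filter might slot into the spot the question marks with "drop columns around here maybe". The function name flag_long_absence and the hand-built hours frame are hypothetical, and the sketch assumes the other non-day columns (Employee ID, Title, and so on) have already been dropped as in the question's code.

```python
import pandas as pd


def flag_long_absence(hours: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop weekend columns, sum per employee, then flag
    employees with zero hours on each of the last three remaining days."""
    # Keep only the columns whose label mentions neither Sat nor Sun.
    weekday_cols = [c for c in hours.columns if "Sat" not in c and "Sun" not in c]
    totals = hours[weekday_cols].groupby("Employee").sum()
    # Compare the last three day columns to zero, row by row.
    totals["long_Absent"] = (totals.iloc[:, -3:] == 0).all(axis=1)
    return totals


# Sample data rebuilt by hand from the table in the question.
hours = pd.DataFrame(
    {
        "Employee": ["Daniel,s", "Doucore,d", "Jimene,c"],
        "Thu 02-08": [7.65, 5.21, 6.55],
        "Fri 02-08": [0.00, 8.20, 9.30],
        "Sat 02-09": [0.00, 5.00, 0.00],
        "Sun 02-10": [0.00, 0.00, 0.00],
        "Mon 02-11": [8.45, 8.10, 9.20],
        "Tue 02-12": [8.20, 9.22, 2.00],
    }
)
result = flag_long_absence(hours)
print(result)
```

Dropping the weekend columns before the check means that "the last 3 days" refers to working days only, so a normal weekend is not counted as an absence.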

Drop the columns that are weekends; select only the weekday columns
