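I am using pandas, and I need to select only the columns that contain weekday data, skipping the weekend columns. I want to go from this: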
Employee  | Thu 02-08 | Fri 02-08 | Sat 02-09 | Sun 02-10 | Mon 02-11 | Tue 02-12
Daniel,s  | 7.65      | 0.00      | 0.00      | 0.00      | 8.45      | 8.20
Doucore,d | 5.21      | 8.20      | 5.00      | 0.00      | 8.10      | 9.22
Jimene,c  | 6.55      | 9.30      | 0.00      | 0.00      | 9.20      | 2.00
to this:
Employee  | Thu 02-08 | Fri 02-08 | Mon 02-11 | Tue 02-12
Daniel,s  | 7.65      | 0.00      | 8.45      | 8.20
Doucore,d | 5.21      | 8.20      | 8.10      | 9.22
Jimene,c  | 6.55      | 9.30      | 9.20      | 2.00
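I need to drop the weekend (Saturday and Sunday) columns dynamically, wherever they appear in the column order. Any help is highly appreciated. This is my basic code: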
def analize_data(self):
    def check_for_absent_patter(data):
        ''' Check only the last 3 days: True if the employee was absent on all of them. '''
        # .iloc keeps the lookups positional, since each row is labelled by column name
        return bool(data.iloc[-1] == 0 and data.iloc[-2] == 0 and data.iloc[-3] == 0)

    filtered_data = self.raw_data.drop(['Unnamed: 0', 'Employee ID', 'Title', 'Total Hours', 'Hourly Rate', 'Total Pay'], axis=1)
    ### drop columns around here maybe....
    ready_to_analisis = filtered_data.groupby('Employee').sum()
    ready_to_analisis['long_Absent'] = ready_to_analisis.apply(check_for_absent_patter, axis=1)
    print(ready_to_analisis[ready_to_analisis['long_Absent']].to_string())
I know I have to drop the columns right after the filtered data is first produced. Thanks.
Answer 0: (score: 2)
Use startswith with boolean indexing to keep only the columns whose names do not start with any of the strings in the tuple:
df = df.loc[:, ~df.columns.str.startswith(('Sat','Sun'))]
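A minimal sketch of that one-liner applied to the question's sample data. The frame is rebuilt by hand from the table in the question, and the names weekend and weekdays_only are only illustrative. It assumes every weekend column starts with the three-letter day abbreviation, as in the question's headers.

```python
import pandas as pd

# Sample data copied by hand from the question's table.
df = pd.DataFrame(
    {
        "Employee": ["Daniel,s", "Doucore,d", "Jimene,c"],
        "Thu 02-08": [7.65, 5.21, 6.55],
        "Fri 02-08": [0.00, 8.20, 9.30],
        "Sat 02-09": [0.00, 5.00, 0.00],
        "Sun 02-10": [0.00, 0.00, 0.00],
        "Mon 02-11": [8.45, 8.10, 9.20],
        "Tue 02-12": [8.20, 9.22, 2.00],
    }
)

# Boolean mask over the column labels: True where the name starts with Sat or Sun.
weekend = df.columns.str.startswith(("Sat", "Sun"))

# Invert the mask to keep everything that is not a weekend column.
weekdays_only = df.loc[:, ~weekend]
print(weekdays_only.columns.tolist())
```

Because the mask is computed from the column labels, it does not matter where the Sat and Sun columns sit in the frame or how many of them there are.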
Answer 1: (score: 1)
import numpy as np
import pandas as pd

df = pd.DataFrame(columns=["Thu 02-08", "Fri 02-08", "Sat 02-09", "Sun 02-10", "Mon 02-11", "Tue 02-12"],
                  data=np.random.rand(3, 6))
# this is how you would select columns that don't contain Sat or Sun
df = df[[x for x in df.columns if 'Sat' not in x and 'Sun' not in x]]
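As a rough sketch of where the filter could sit in the question's pipeline, the snippet below drops the weekend columns before the groupby, so the "last 3 days" check only sees working days. The helper names drop_weekend_columns and absent_last_three_days and the small raw frame are hypothetical and made up for illustration. They are not the asker's real data or class.

```python
import pandas as pd


def drop_weekend_columns(frame):
    """Hypothetical helper: keep columns whose names do not start with Sat or Sun."""
    keep = [col for col in frame.columns if not str(col).startswith(("Sat", "Sun"))]
    return frame[keep]


def absent_last_three_days(row):
    """True if the last three values in the row are all zero."""
    return bool((row.iloc[-3:] == 0).all())


# Made-up stand-in for the already filtered raw data.
raw = pd.DataFrame(
    {
        "Employee": ["Daniel,s", "Doucore,d", "Daniel,s"],
        "Fri 02-08": [0.0, 8.2, 0.0],
        "Sat 02-09": [0.0, 5.0, 0.0],
        "Sun 02-10": [0.0, 0.0, 0.0],
        "Mon 02-11": [0.0, 8.1, 0.0],
        "Tue 02-12": [0.0, 9.22, 0.0],
    }
)

weekdays = drop_weekend_columns(raw)          # weekend columns removed first
totals = weekdays.groupby("Employee").sum()   # one row of summed hours per employee
totals["long_Absent"] = totals.apply(absent_last_three_days, axis=1)
print(totals[totals["long_Absent"]].to_string())
```

Dropping the weekend columns before the check matters: otherwise a normal weekend with zero hours would count toward the three consecutive absent days.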