I have a Spark DataFrame that looks like this:
col_a | col_b | col_c
1     | te    | 8/15/2018
1     | qe    | 8/17/2018
1     | qh    | 8/16/2018
2     | wa    | 8/17/2018
3     | gs    | 8/17/2018
I want to filter it to:
col_a | col_b | col_c
1     | qe    | 8/17/2018
2     | wa    | 8/17/2018
3     | gs    | 8/17/2018
What I want is to remove all duplicate entries for a particular col_a value and keep the entry with the latest col_c date.
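To make the requirement concrete, the intended logic can be sketched in plain Python. This is not Spark code, only an illustration of the expected result. The function name `keep_latest_per_key` is hypothetical, and the sketch assumes the col_c values are strings in M/D/YYYY format:

```python
from datetime import datetime


def keep_latest_per_key(rows):
    """For each col_a value, keep only the row with the latest col_c date."""
    latest = {}
    for col_a, col_b, col_c in rows:
        # Assumption: dates are M/D/YYYY strings, so parse them before comparing.
        date = datetime.strptime(col_c, "%m/%d/%Y")
        # Strict '>' means that on a tie the first row seen is kept.
        if col_a not in latest or date > latest[col_a][0]:
            latest[col_a] = (date, (col_a, col_b, col_c))
    # Dicts preserve insertion order, so keys come out in first-seen order.
    return [row for _, row in latest.values()]


rows = [
    (1, "te", "8/15/2018"),
    (1, "qe", "8/17/2018"),
    (1, "qh", "8/16/2018"),
    (2, "wa", "8/17/2018"),
    (3, "gs", "8/17/2018"),
]
result = keep_latest_per_key(rows)
print(result)
```

The dates have to be parsed rather than compared as strings, because a text comparison of M/D/YYYY values does not order them chronologically in general.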
How would I accomplish this?