In the data frame below, I want to filter out the counters (for example MEM_TRANS_RETIRED) and keep the columns load, rps, th95 alongside them.
load rps th95 energy ... MEM_TRANS_RETIRED-34 PERF_COUNT_HW_CACHE_L1D-34 PERF_COUNT_HW_CACHE_L1I-34 map_freq
0 500.0k 346222.62 12.62 7270.22 ... 154287.14 591053.74 2.918521e+07 6C-1.70GHz
1 400.0k 402628.34 2.25 12026.40 ... 189915.07 627043.91 2.867945e+07 10C-2.10GHz
2 500.0k 283508.27 15.52 5662.74 ... 140790.31 1431892.98 4.253950e+07 6C-1.30GHz
This is my approach:
self.unique_counters = [x[:-2] for x in self.dfile_keys[6:] if x.endswith('-0')]
for counter in self.unique_counters:
    new = self.dfile.loc[:, self.dfile.columns.str.startswith(counter)]
However, this returns only the selected counter columns, without the other columns mentioned above.
PERF_COUNT_HW_CPU_CYCLES-0 PERF_COUNT_HW_CPU_CYCLES-2 ... PERF_COUNT_HW_CPU_CYCLES-32 PERF_COUNT_HW_CPU_CYCLES-34
0 6.020913e+08 6.021277e+08 ... 5.109342e+06 2.556039e+06
1 4.781879e+08 4.783621e+08 ... 3.095814e+06 2.795868e+06
2 4.841784e+08 4.844846e+08 ... 2.389396e+06 5.550159e+06
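How can I get the counter columns whose names start with the selected string together with some specified columns? This is the expected output: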
load rps th95 energy PERF_COUNT_HW_CPU_CYCLES-0 PERF_COUNT_HW_CPU_CYCLES-2 ... PERF_COUNT_HW_CPU_CYCLES-32 PERF_COUNT_HW_CPU_CYCLES-34
0 500.0k 346222.62 12.62 7270.22 6.020913e+08 6.021277e+08 ... 5.109342e+06 2.556039e+06
1 400.0k 402628.34 2.25 12026.40 4.781879e+08 4.783621e+08 ... 3.095814e+06 2.795868e+06
2 500.0k 283508.27 15.52 5662.74 4.841784e+08 4.844846e+08 ... 2.389396e+06 5.550159e+06
Answer 0 (score: 1)
I believe you need to build a new DataFrame:
L = [x[:-2] for x in self.dfile_keys[6:] if x.endswith('-0')]
new = self.dfile.loc[:, self.dfile.columns.str.startswith(tuple(L))]
df = pd.concat([self.dfile[['load','rps','th95','energy']], new], axis=1)
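For anyone who wants to try this outside the class, here is a minimal self-contained sketch of the same idea. The toy values, the two `-0`/`-2` suffixes, and the names `dfile`, `base_cols` and `counters` are assumptions made for illustration; only the counter and column names come from the question.

```python
import pandas as pd

# Toy data: the numeric values and the reduced column set are invented for illustration.
dfile = pd.DataFrame({
    "load": ["500.0k", "400.0k"],
    "rps": [346222.62, 402628.34],
    "th95": [12.62, 2.25],
    "energy": [7270.22, 12026.40],
    "PERF_COUNT_HW_CPU_CYCLES-0": [6.0e8, 4.8e8],
    "PERF_COUNT_HW_CPU_CYCLES-2": [6.1e8, 4.9e8],
    "MEM_TRANS_RETIRED-0": [1.5e5, 1.9e5],
    "MEM_TRANS_RETIRED-2": [1.4e5, 1.8e5],
    "map_freq": ["6C-1.70GHz", "10C-2.10GHz"],
})

base_cols = ["load", "rps", "th95", "energy"]

# Recover the counter names from the "-0" columns, as in the question.
counters = [c[:-2] for c in dfile.columns if c.endswith("-0")]

# str.startswith accepts a tuple of prefixes, so one boolean mask covers all counters.
mask = dfile.columns.str.startswith(tuple(counters))

# Put the base columns first, then every counter column.
df = pd.concat([dfile[base_cols], dfile.loc[:, mask]], axis=1)

print(df.columns.tolist())
```

Note that `map_freq` is dropped because it is neither a base column nor a counter column.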
Or create a list of DataFrames in a list comprehension:
self.unique_counters = [x[:-2] for x in self.dfile_keys[6:] if x.endswith('-0')]
dfs = [self.dfile.loc[:, self.dfile.columns.str.startswith(counter)]
       for counter in self.unique_counters]
# dfs is already a list, so extend the list passed to concat instead of nesting it
df = pd.concat([self.dfile[['load','rps','th95','energy']]] + dfs, axis=1)
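The expected output in the question shows a single counter's columns next to the base columns, so it may be that one frame per counter is what is wanted. A hedged sketch of that variant follows; the toy data and the names `dfile`, `base_cols` and `per_counter` are hypothetical, and matching on `counter + "-"` is my own assumption to stop one counter name from matching a longer name that shares its prefix.

```python
import pandas as pd

# Toy data: the numeric values and the reduced column set are invented for illustration.
dfile = pd.DataFrame({
    "load": ["500.0k", "400.0k"],
    "rps": [346222.62, 402628.34],
    "th95": [12.62, 2.25],
    "energy": [7270.22, 12026.40],
    "PERF_COUNT_HW_CPU_CYCLES-0": [6.0e8, 4.8e8],
    "PERF_COUNT_HW_CPU_CYCLES-2": [6.1e8, 4.9e8],
    "MEM_TRANS_RETIRED-0": [1.5e5, 1.9e5],
    "MEM_TRANS_RETIRED-2": [1.4e5, 1.8e5],
})

base_cols = ["load", "rps", "th95", "energy"]
counters = [c[:-2] for c in dfile.columns if c.endswith("-0")]

# Map each counter name to a frame holding the base columns plus that counter's columns.
per_counter = {}
for counter in counters:
    # The trailing "-" restricts the match to this counter's own suffixed columns.
    mask = dfile.columns.str.startswith(counter + "-")
    cols = base_cols + dfile.columns[mask].tolist()
    per_counter[counter] = dfile[cols]

print(per_counter["MEM_TRANS_RETIRED"].columns.tolist())
```

Selecting by a list of column names avoids the extra `pd.concat` call and keeps the original column order within each counter.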