pandas DataFrame: columns with a selected starting string plus specified columns

Time: 2018-10-02 06:43:40

Tags: pandas

In the DataFrame below, I want to clean up the counters (e.g. MEM_TRANS_RETIRED) and add the columns load, rps, th95 to the resulting DataFrame.

       load        rps     th95    energy     ...       MEM_TRANS_RETIRED-34  PERF_COUNT_HW_CACHE_L1D-34  PERF_COUNT_HW_CACHE_L1I-34     map_freq
0    500.0k  346222.62    12.62   7270.22     ...                  154287.14                   591053.74                2.918521e+07   6C-1.70GHz
1    400.0k  402628.34     2.25  12026.40     ...                  189915.07                   627043.91                2.867945e+07  10C-2.10GHz
2    500.0k  283508.27    15.52   5662.74     ...                  140790.31                  1431892.98                4.253950e+07   6C-1.30GHz

This is how I am doing it:

self.unique_counters = [x[:-2] for x in self.dfile_keys[6:] if x.endswith('-0')]
for counter in self.unique_counters:
    new = self.dfile.loc[:, self.dfile.columns.str.startswith(counter)]

However, this gives only the selected columns, without the other columns mentioned above.

     PERF_COUNT_HW_CPU_CYCLES-0  PERF_COUNT_HW_CPU_CYCLES-2             ...               PERF_COUNT_HW_CPU_CYCLES-32  PERF_COUNT_HW_CPU_CYCLES-34
0                  6.020913e+08                6.021277e+08             ...                              5.109342e+06                 2.556039e+06
1                  4.781879e+08                4.783621e+08             ...                              3.095814e+06                 2.795868e+06
2                  4.841784e+08                4.844846e+08             ...                              2.389396e+06                 5.550159e+06

How can I get both the counters with the selected starting string and some specified columns? This is the expected output:

       load        rps     th95    energy   PERF_COUNT_HW_CPU_CYCLES-0  PERF_COUNT_HW_CPU_CYCLES-2             ...               PERF_COUNT_HW_CPU_CYCLES-32  PERF_COUNT_HW_CPU_CYCLES-34
0    500.0k  346222.62    12.62   7270.22   6.020913e+08                6.021277e+08             ...                              5.109342e+06                 2.556039e+06
1    400.0k  402628.34     2.25  12026.40   4.781879e+08                4.783621e+08             ...                              3.095814e+06                 2.795868e+06
2    500.0k  283508.27    15.52   5662.74   4.841784e+08                4.844846e+08             ...                              2.389396e+06                 5.550159e+06

1 Answer:

Answer 0: (score: 1)

I believe you need to create a new DataFrame:

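# Counter name prefixes: take the keys ending in '-0' and strip that suffix
L = [x[:-2] for x in self.dfile_keys[6:] if x.endswith('-0')]

# Passing a tuple of prefixes selects the columns of all counters at once
new = self.dfile.loc[:, self.dfile.columns.str.startswith(tuple(L))]

# Put the fixed columns in front of the counter columns
df = pd.concat([self.dfile[['load','rps','th95','energy']], new], axis=1)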
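
As a self-contained illustration of the same idea, here is a minimal sketch on a small stand-in frame. The names `df_all`, `fixed_cols` and `prefixes` are hypothetical, and the `MEM_TRANS_RETIRED-0` values are made up. It assumes only one counter is wanted, and it uses a plain list comprehension with Python's built-in `str.startswith` (which accepts a tuple of prefixes) rather than the `.str` accessor.

```python
import pandas as pd

# Hypothetical miniature of the frame in the question
# (the MEM_TRANS_RETIRED-0 values are invented for illustration).
df_all = pd.DataFrame({
    "load": ["500.0k", "400.0k"],
    "rps": [346222.62, 402628.34],
    "th95": [12.62, 2.25],
    "energy": [7270.22, 12026.40],
    "PERF_COUNT_HW_CPU_CYCLES-0": [6.020913e+08, 4.781879e+08],
    "PERF_COUNT_HW_CPU_CYCLES-2": [6.021277e+08, 4.783621e+08],
    "MEM_TRANS_RETIRED-0": [1.0, 2.0],
    "map_freq": ["6C-1.70GHz", "10C-2.10GHz"],
})

fixed_cols = ["load", "rps", "th95", "energy"]   # columns to always keep
prefixes = ("PERF_COUNT_HW_CPU_CYCLES",)         # counters to select

# Column names that begin with any of the wanted prefixes
counter_cols = [c for c in df_all.columns if c.startswith(prefixes)]

# Fixed columns first, then the matching counter columns
result = pd.concat([df_all[fixed_cols], df_all[counter_cols]], axis=1)
print(result.columns.tolist())
```

Since both pieces come from the same frame here, `df_all[fixed_cols + counter_cols]` should give the same columns without calling `concat`.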

Or create a list of DataFrames in a list comprehension:

self.unique_counters = [x[:-2] for x in self.dfile_keys[6:] if x.endswith('-0')]
dfs = [self.dfile.loc[:, self.dfile.columns.str.startswith(counter)] 
       for counter in self.unique_counters]

# concat needs a flat list of DataFrames, so prepend the fixed columns to dfs
df = pd.concat([self.dfile[['load','rps','th95','energy']]] + dfs, axis=1)
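
The list-of-DataFrames variant can be sketched end to end as follows. The frame and the counter names `CYCLES` and `CACHE_L1D` are hypothetical stand-ins, not the real data. `pd.concat` expects a flat sequence of DataFrames, so the fixed-column frame is placed in the same list as the per-counter frames rather than next to a nested list.

```python
import pandas as pd

# Hypothetical stand-in data; counter names are shortened for readability.
df_all = pd.DataFrame({
    "load": ["500.0k", "400.0k"],
    "rps": [346222.62, 402628.34],
    "CYCLES-0": [1, 2],
    "CYCLES-2": [3, 4],
    "CACHE_L1D-0": [5, 6],
    "CACHE_L1D-2": [7, 8],
})

fixed_cols = ["load", "rps"]

# Unique counter names: columns ending in '-0' with that suffix stripped
unique_counters = [c[:-2] for c in df_all.columns if c.endswith("-0")]

# One DataFrame per counter; matching on "name-" keeps a counter whose
# name is a prefix of another counter's name from picking up extra columns
dfs = [
    df_all.loc[:, df_all.columns.str.startswith(counter + "-")]
    for counter in unique_counters
]

# Flat list: the fixed columns followed by each per-counter frame
result = pd.concat([df_all[fixed_cols], *dfs], axis=1)
print(result.columns.tolist())
```

Keeping `dfs` as a list is mainly useful if each counter's frame is cleaned or aggregated separately before the final `concat`.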