I am creating an initial df from a csv file, as follows:
knobs_df = pd.read_csv(knobs_container)
name type values
0 algorithm string one;two;three
1 threads int32_t 1;2;3;4;5;6;7;8;9;10;11;12;13;14;15
For each row, I extract the type and values columns into the dictionaries k_values and k_types.
k_values = {}
k_types = {}
for row in knobs_df.itertuples(index=False):
    k_values[row[0]] = row[2].split(';')
    k_types[row[0]] = row[1]
{'algorithm': ['one', 'two', 'three'], 'threads': ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15']}
{'algorithm': 'string', 'threads': 'int32_t'}
From the k_values dictionary I generate a full grid containing every possible combination.
algorithm threads
0 one 1
1 two 1
2 three 1
3 one 2
4 two 2
.. ... ...
88 two 14
89 three 14
90 one 15
91 two 15
92 three 15
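The question does not show how the grid is built. One common way is a Cartesian product with itertools.product; the following is a minimal sketch under that assumption. The name full_grid is hypothetical, and the row order may differ from the listing above because product varies the last column fastest.

```python
import itertools
import pandas as pd

# Same shape as the k_values dictionary shown in the question.
k_values = {
    'algorithm': ['one', 'two', 'three'],
    'threads': [str(n) for n in range(1, 16)],
}

# Cartesian product of all value lists: one row per combination.
# full_grid is a hypothetical name, not taken from the question.
full_grid = pd.DataFrame(
    list(itertools.product(*k_values.values())),
    columns=list(k_values.keys()),
)
print(full_grid.shape)
```

Every cell is still a Python string at this point, which is why a numeric constraint on 'threads' cannot be evaluated yet.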
Given a list of constraints (Python expressions) such as:
['threads < 20', 'algorithm != "two"']
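I want to filter the full-grid data frame using the query method of pandas.DataFrame. Is there a way to assign the corresponding dtype to each column based on the k_types dictionary? I need this because every column can have its own type; for example, the query method cannot filter the 'threads' column, because all columns are inferred as 'str' by default when the grid is created. The difficulty is that the types are originally C++ data types, so I do not know whether this can be done.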
The possible k_types are:
[string, short int, int8_t, int16_t, int32_t, int64_t, uint8_t, uint16_t, uint32_t, uint64_t, char, int, long int, long long int, int_fast8_t, int_fast16_t, int_fast32_t, int_fast64_t, int_least8_t, int_least16_t, int_least32_t, int_least64_t, unsigned short int, unsigned char, unsigned int, unsigned long int, unsigned long long int, uint_fast8_t, uint_fast16_t, uint_fast32_t, uint_fast64_t, uint_least8_t, uint_least16_t, uint_least32_t, uint_least64_t, intmax_t, intptr_t, uintmax_t, uintptr_t, float, double, long double]
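One possible approach is a lookup from C++ type names to NumPy dtype names, passed to DataFrame.astype as a {column: dtype} dictionary. The sketch below is an assumption rather than an established mapping: CPP_TO_DTYPE and to_dtype are hypothetical names, types whose width depends on the platform (int, long int, int_fast16_t and so on) are widened to 64 bits, char is treated as text, and long double is narrowed to float64.

```python
import pandas as pd

# Assumed, non-exhaustive mapping. Fixed-width types map one-to-one;
# anything not listed falls back to a 64-bit integer in to_dtype().
CPP_TO_DTYPE = {
    'string': str,
    'char': str,               # assumption: treat char as text
    'float': 'float32',
    'double': 'float64',
    'long double': 'float64',  # assumption: extended precision is dropped
}
for bits in (8, 16, 32, 64):
    CPP_TO_DTYPE[f'int{bits}_t'] = f'int{bits}'
    CPP_TO_DTYPE[f'uint{bits}_t'] = f'uint{bits}'


def to_dtype(cpp_type):
    """Return a dtype for a C++ type name, widening unlisted types to 64 bits."""
    if cpp_type in CPP_TO_DTYPE:
        return CPP_TO_DTYPE[cpp_type]
    # Every unsigned name in the list above starts with 'u'.
    return 'uint64' if cpp_type.startswith('u') else 'int64'


k_types = {'algorithm': 'string', 'threads': 'int32_t'}
grid = pd.DataFrame({'algorithm': ['one', 'two'], 'threads': ['1', '2']})

# astype accepts a {column: dtype} dictionary, so one call converts every column.
grid = grid.astype({name: to_dtype(t) for name, t in k_types.items()})
print(grid.dtypes)
```

Widening to 64 bits avoids truncating values, at the cost of not reproducing C++ overflow behaviour for narrower platform types.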
Answer 0 (score: 0)
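Owing to some misunderstanding on my part, I only managed to come up with an incomplete solution. Please let me know how this solution could be adapted to meet your needs: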
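# df is the knobs DataFrame read from the CSV (knobs_df in the question).
t_df = df.T
names = t_df.loc['name']
dtypes = t_df.loc['type']
t_df.columns = names
# Keep only the 'values' row; copy so the assignments below act on a real frame.
t_df = t_df.iloc[2:].copy()
# Only the two types that appear in the example are mapped here.
dtype_conv = {'string': str, 'int32_t': int}
for dtype, name in zip(dtypes, names):
    # Split the ';'-separated values, then expand them into one row each.
    t_df[name] = t_df[name].str.split(';')
    t_df = t_df.explode(name)
    t_df[name] = t_df[name].astype(dtype_conv[dtype])
# sort_values returns a new frame; assign it if the result is needed later.
t_df.sort_values('threads').reset_index(drop=True)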
Output:
algorithm threads
0 one 1
1 two 1
2 three 1
3 one 2
4 two 2
5 three 2
6 one 3
7 two 3
...
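Once the columns carry real dtypes, the constraint list from the question can be applied with DataFrame.query. The following is a minimal sketch on a small hand-made frame; the data and the names grid, expr and filtered are illustrative assumptions, with one thread count chosen above 20 so that both constraints have an effect.

```python
import pandas as pd

# Small illustrative frame with a numeric 'threads' column.
grid = pd.DataFrame({
    'algorithm': ['one', 'two', 'three'] * 2,
    'threads': [1, 1, 1, 25, 25, 25],
}).astype({'threads': 'int32'})

constraints = ['threads < 20', 'algorithm != "two"']

# Parenthesise each constraint and join with `and`,
# so a single query call applies all of them.
expr = ' and '.join(f'({c})' for c in constraints)
filtered = grid.query(expr)
print(filtered)
```

Calling query once per constraint in a loop would give the same rows; joining them keeps the filter in a single expression.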