Question

I am trying to look at a specific column of a Pandas DataFrame and do the following:

col0 col1   col2
int0 float0 str0
int1 float1 str0
int2 float2 str1

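I tried iterating over the DataFrame row by row on just that column, but that seems inefficient. Lambdas are an option, but I'm not sure whether there is something like a "list comprehension" for a DataFrame column (which, as far as I know, is a Pandas Series).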

Suppose the DataFrame is defined as df. Then I want to do something like:

for _, row in df.iterrows():
    if row['col2'] == 'str0':
        pass  # do some calculation for all str0 types
    elif row['col2'] == 'str1':
        pass  # do another calculation for all str1 types
    else:
        pass  # do another calculation for all other types (which are actually str2 types, as implied by the if-elif-else statement)

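I am trying to repeat this to build a separate NumPy matrix in which each row corresponds to the matching row of the DataFrame. Depending on the value of 'col2', a different calculation is performed, and its result becomes the corresponding row of the NumPy array.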

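Please let me know if there is an efficient way to do this (in terms of idiomatic, Pythonic code as well as time and memory efficiency)! All help would be greatly appreciated.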

Answer 1

This looks more like a job for np.select:

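import numpy as np

# One boolean mask per case
condition = [df['col2'] == 'str0', df['col2'] == 'str1', ...]
# One result (array or scalar) per mask, in the same order
Target = [function1, function2, ...]

df['NewCol'] = np.select(condition, Target, default=np.nan)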
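
As a concrete sketch of the np.select approach, assuming a small frame shaped like the one in the question and two made-up calculations (doubling col1 for str0, adding col0 and col1 for str1). Note that the choices here are precomputed arrays rather than functions, since np.select picks values elementwise from its choice list:

```python
import numpy as np
import pandas as pd

# Sample data shaped like the question's table (values are made up)
df = pd.DataFrame({
    "col0": [0, 1, 2, 3],
    "col1": [3.0, 4.0, 5.0, 6.0],
    "col2": ["str0", "str0", "str1", "str2"],
})

# One boolean mask per case
conditions = [df["col2"] == "str0", df["col2"] == "str1"]

# One vectorized result per mask; these calculations are hypothetical
choices = [df["col1"] * 2, df["col0"] + df["col1"]]

# Rows matching no condition fall through to the default
df["NewCol"] = np.select(conditions, choices, default=np.nan)
```

Every choice is computed for every row before the selection happens, so this works best when each calculation is cheap and vectorized.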

Answer 2

From a maintainer's point of view, the simplest thing is to create a function, for example:

def fn(x):
    if x < 5:
        return x * 2
    else:
        return x * 3

Then apply it to your column:

df["col2"].apply(fn)
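
Since col2 in the question holds strings and the goal is a NumPy matrix, the same idea might be adapted roughly as follows. This is only a sketch: row_values and the calculations inside it are hypothetical stand-ins for whatever each case actually needs.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col0": [0, 1, 2],
    "col1": [3.0, 4.0, 5.0],
    "col2": ["str0", "str0", "str1"],
})

def row_values(row):
    """Hypothetical per-row calculation that branches on col2."""
    if row["col2"] == "str0":
        return [row["col0"], row["col1"] * 2]
    elif row["col2"] == "str1":
        return [row["col0"], row["col1"] * 3]
    else:
        return [row["col0"], row["col1"]]

# axis=1 passes each row to the function; result_type="expand" turns
# each returned list into columns, and to_numpy() yields the matrix
matrix = df.apply(row_values, axis=1, result_type="expand").to_numpy()
```

This keeps the branching logic in one readable place, at the cost of running Python code once per row, so it is likely to be slower than a vectorized approach on large frames.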

Answer 3

You can first separate out the rows corresponding to each value of the col2 column.

>>> import pandas as pd
>>> df = pd.DataFrame(dict(col0=[0,1,2], col1=[3.0, 4.0, 5.0], col2=['str0', 'str0', 'str1']))
>>> col2_elm_to_df = {col2_elm: group_df for col2_elm, group_df in df.groupby('col2')}
>>> col2_elm_to_df
{
    'str0':          col0  col1  col2
               0     0     3.0   str0
               1     1     4.0   str0, 

    'str1':          col0  col1  col2
               2     2     5.0   str1
}
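
Once the rows are grouped, one possible way to continue is to run a vectorized calculation per group and write the results back by index label so they stay aligned with the original rows. The calculations dictionary below is a hypothetical illustration, not part of the original answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(col0=[0, 1, 2], col1=[3.0, 4.0, 5.0],
                       col2=['str0', 'str0', 'str1']))

# Hypothetical per-group calculations, keyed by the col2 value
calculations = {
    'str0': lambda g: g['col1'] * 2,
    'str1': lambda g: g['col0'] + g['col1'],
}

# Start with NaN so rows from groups without a calculation stay marked
result = pd.Series(np.nan, index=df.index)
for col2_elm, group_df in df.groupby('col2'):
    calc = calculations.get(col2_elm)
    if calc is not None:
        # Assigning by index label keeps results aligned with df's rows
        result.loc[group_df.index] = calc(group_df)

values = result.to_numpy()
```

Each calculation runs only on the rows of its own group, which can help when the per-case computations are expensive.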

Looking for an efficient way to perform calculations on certain columns of a DataFrame
