Question

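I am defining a function that is applied to every row of my DataFrame. For each row, it counts the unique codes in the "Code" column for that row's ID. The code I have works, but it is very slow, and I am working with a large dataset. I am looking for another approach that could speed up the operation.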

from datetime import timedelta as td
import numpy as np  # needed for the np.array calls below
import pandas as pd

# Start of the trailing window: the row's date minus 365 days
df['Trailing_12M'] = df['Date'] - td(365)

def Unique_Count(row):
    """Creating a new df for each id and returning unique count to every row in original df"""
    temp1 = np.array(df['ID'] == row['ID'])
    temp2 = np.array(df['Date'] <= row['Date'])
    temp3 = np.array(df['Date'] >= row['Trailing_12M'])
    temp4 = np.array(temp1 & temp2 & temp3)
    df_Unique_Code_Count = np.array(df[temp4].Code.nunique())
    return df_Unique_Code_Count


df['Unique_Code_Count'] = df.apply(Unique_Count, axis=1)
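
For reference, one possible way to reduce the cost is sketched below. The row-wise version scans the entire DataFrame once per row, so the work grows roughly with the square of the row count. The sketch instead processes each ID's rows separately, sorts them by date, and uses `np.searchsorted` to find the window boundaries. The function name `trailing_unique_code_count`, the `days` parameter, and the `sample` data are hypothetical and only for illustration. It assumes `Date` is a timezone-naive datetime column and `Code` has no missing values (`nunique` skips NaN, while `set` does not).

```python
import numpy as np
import pandas as pd


def trailing_unique_code_count(df, days=365):
    """Per row: distinct codes for the same ID within the trailing `days` window."""
    dates_all = df["Date"].to_numpy()
    codes_all = df["Code"].to_numpy()
    window = np.timedelta64(days, "D")
    out = np.zeros(len(df), dtype=np.int64)

    # groupby(...).indices maps each ID to the integer positions of its rows
    for pos in df.groupby("ID").indices.values():
        # Positions of this ID's rows, ordered by date
        order = pos[np.argsort(dates_all[pos], kind="stable")]
        dates = dates_all[order]
        codes = codes_all[order]

        # First row with Date >= current Date - window
        starts = np.searchsorted(dates, dates - window, side="left")
        # One past the last row with Date <= current Date (ties included)
        ends = np.searchsorted(dates, dates, side="right")

        out[order] = [len(set(codes[s:e])) for s, e in zip(starts, ends)]

    return pd.Series(out, index=df.index)


# Hypothetical sample data for illustration only
sample = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2],
    "Date": pd.to_datetime(
        ["2020-01-01", "2020-06-01", "2021-03-01", "2020-01-01", "2020-01-01"]
    ),
    "Code": ["A", "B", "B", "X", "Y"],
})
sample["Unique_Code_Count"] = trailing_unique_code_count(sample)
```

The inner list comprehension still builds a set per row, so an ID with very many rows inside one window remains expensive. The saving comes from never comparing rows that belong to different IDs or fall outside the window.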

Moving unique count calculation on a pandas DataFrame

0 answers: