pandas: cannot compute isin with a duplicate axis

Time: 2019-02-05 06:24:32

Tags: pandas dataframe

My DataFrame looks like this:

             userid           codeassigned         timestamp
15           553938              M1           1499371200000
15390        527638              M2           1599731200000
15389        521638              M2           1399901200000
15388        521638              M3           1439841200000
15387        553938              M4           1499521200000

I took a subset of this DataFrame (each user's row with the latest timestamp) by doing the following:

df = df.sort_values('timestamp', ascending=False)
mask = df.duplicated('userid')
subset_df = df[~mask]

Now I want all rows from the main DataFrame whose (userid, timestamp) pair appears in subset_df (there can be multiple rows with the same [userid, timestamp] but different assigned codes). This is what I am trying:

subset_df[['userid', 'timestamp']].isin(df)

However, I get the error from the title:

ValueError: cannot compute isin with a duplicate axis.

Any idea what I am doing wrong?

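1 answer:

Answer 0: (score: 2)

You need merge to do an inner join between the filtered subset and the original DataFrame. With no `on` argument, merge joins on the columns the two frames share, here userid and timestamp: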

subset_df = df.loc[~mask, ['userid', 'timestamp']]

df = subset_df.merge(df)

Or, if subset_df still contains all columns:

df = subset_df[['userid', 'timestamp']].merge(df)
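
As a rough end-to-end sketch of the approach above, the following rebuilds the sample data from the question and runs the merge. The extra row with code "M5" is a hypothetical addition (not in the original data), included only so that one (userid, timestamp) pair matches more than one row; the variable name `result` is also just for illustration.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical row ("M5") that
# shares its (userid, timestamp) with the "M4" row.
df = pd.DataFrame(
    {
        "userid": [553938, 527638, 521638, 521638, 553938, 553938],
        "codeassigned": ["M1", "M2", "M2", "M3", "M4", "M5"],
        "timestamp": [
            1499371200000,
            1599731200000,
            1399901200000,
            1439841200000,
            1499521200000,
            1499521200000,
        ],
    }
)

# Newest rows first, then keep only the first (newest) row per user.
df = df.sort_values("timestamp", ascending=False)
mask = df.duplicated("userid")
subset_df = df.loc[~mask, ["userid", "timestamp"]]

# Inner join on the shared columns (userid, timestamp): every row of df
# whose pair appears in subset_df is kept, including rows with other codes.
result = subset_df.merge(df)
print(result)
```

Note that merge returns a fresh default integer index, so the original row labels are not preserved; if they matter, calling `df.reset_index()` before the merge is one way to carry them along as a column.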