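Need help flagging the IDs in which at least one Female appears

Time: 2019-07-02 09:35:23

Tags: python pandas

I have a list of IDs along with gender information. I need to flag the IDs in which at least one Female appears. The data below is for reference.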

ID  Gender
1   Female
1   Female
2   Male
2   Male
3   Female
3   Male
4   Male
4   Male
4   Male
4   Male
4   Female
5   Female
5   Male
5   Female
6   Male
6   Male
6   Male
6   Male
7   Female
8   Male
8   Male
9   Male
10  Male
10  Male
11  Male
11  Female
13  Male
14  Male

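I tried creating two columns: one that checks whether the ID is the same as in the previous row, and another that checks whether the row contains Female. The output would then be derived from those two columns. However, I think there must be a better way.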

    import pandas as pd

    data = pd.read_excel(r"C:\Analytics\TA Dashboard\test\test.xlsx")
    # True when a row has the same ID as the row directly above it
    data['match1'] = data['Reference ID'].eq(data['Reference ID'].shift())
    # True when any cell in the row equals 'Female'
    data['match2'] = data.eq('Female').any(axis=1)

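Based on the combination of ID and Gender, the output must be Yes or No: if Female appears in any row of an ID, every row of that ID should be Yes; otherwise every row should be No.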

ID  Gender  OUTPUT
1   Female  Yes
1   Female  Yes
2   Male    NO
2   Male    NO
3   Female  Yes
3   Male    Yes
4   Male    Yes
4   Male    Yes
4   Male    Yes
4   Male    Yes
4   Female  Yes
5   Female  Yes
5   Male    Yes
5   Female  Yes
6   Male    NO
6   Male    NO
6   Male    NO
6   Male    NO
7   Female  Yes
8   Male    NO
8   Male    NO
9   Male    NO
10  Male    NO
10  Male    NO
11  Male    Yes
11  Female  Yes
13  Male    NO
14  Male    NO

2 Answers:

Answer 0 (score: 1)

Use groupby on ID and check with any whether Gender is Female anywhere within each group.
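The answer's original code did not survive on this page, so what follows is only a minimal sketch of the groupby/any idea, not necessarily the answerer's exact solution. It assumes the columns are named ID and Gender as in the sample tables (the question's own code uses 'Reference ID'), and the names is_female and has_female are purely illustrative.

```python
import pandas as pd

# A small subset of the question's data; column names are assumed
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 3, 3, 7],
    "Gender": ["Female", "Female", "Male", "Male", "Female", "Male", "Female"],
})

# Row-level flag: is this row Female?
is_female = df["Gender"].eq("Female")

# Group the flag by ID and broadcast "any Female in this ID" to every row of the group
has_female = is_female.groupby(df["ID"]).transform("any")

# Turn the boolean result into the Yes/No labels requested in the question
df["OUTPUT"] = has_female.map({True: "Yes", False: "No"})
print(df)
```

Because transform returns a result aligned with the original index, no merge back onto the data is needed, and the row-to-row comparison with shift becomes unnecessary.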

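Answer 1 (score: 0)

I've run into another problem here... What if I have to apply a filter on an additional column, "Status", and then apply the logic above, without removing the filtered rows from the dataset?

Below is the data. I need to consider only the rows where Status is not equal to xyz or xy, and only then apply the logic above. Note that I also don't want to remove the filtered-out rows from the main data source.

ID  Gender  Status
1   Female  xyz
1   Female  xyz
2   Male    xyz
2   Male    xy
3   Female  x
3   Male    y
4   Male    xyz
4   Male    xy
4   Male    xy
4   Male    xy
4   Female  xab
5   Female  xac
5   Male    xy
5   Female  xyz
6   Male    xyz
6   Male    xy
6   Male    xy
6   Male    xy
7   Female  xyc
8   Male    xy
8   Male    xyz
9   Male    xy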
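One possible way to handle this follow-up is to fold the Status condition into the boolean flag instead of filtering the frame, so that no rows are dropped. The sketch below rests on an assumed reading of the requirement: rows whose Status is xyz or xy are ignored when looking for a Female, yet they stay in the data and still receive their ID's result. The column names and the variable names eligible and female_and_eligible are illustrative.

```python
import pandas as pd

# A subset of the rows from the post; column names are assumed
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 3, 3, 4, 4],
    "Gender": ["Female", "Female", "Male", "Male", "Female", "Male", "Male", "Female"],
    "Status": ["xyz", "xyz", "xyz", "xy", "x", "y", "xy", "xab"],
})

# Rows that take part in the check: Status is neither xyz nor xy
eligible = ~df["Status"].isin(["xyz", "xy"])

# A row counts only if it is Female AND passes the Status condition
female_and_eligible = df["Gender"].eq("Female") & eligible

# Broadcast the per-ID result to all rows, including the excluded ones
has_female = female_and_eligible.groupby(df["ID"]).transform("any")
df["OUTPUT"] = has_female.map({True: "Yes", False: "No"})
print(df)
```

If the excluded rows should be left blank instead of inheriting the group result, `df["OUTPUT"].where(eligible)` would set them to NaN while still keeping every row.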