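I have 2 columns. I want to group by column A and then check whether column B contains three different values within each group; if it does not, every row of that group should be deleted.
The input and the required output are shown below.
In the output below, ABC must be removed because it only has 1 and 2, whereas I need each of 1, 2 and 3 to appear at least once.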
ColA ColB
ABC 1
ABC 2
XYZ 1
PQR 1
PQR 2
XYZ 2
XYZ 3
PQR 3
PQR 2
XYZ 1
ABC 2
Output
ColA ColB
XYZ 1
2
3
PQR 1
2
3
I tried using a for loop, but it did not work.
Answer 0 (score: 0)
import pandas as pd

data = [
    ('ABC', 1),
    ('ABC', 2),
    ('XYZ', 1),
    ('PQR', 1),
    ('PQR', 2),
    ('XYZ', 2),
    ('XYZ', 3),
    ('PQR', 3),
    ('PQR', 2),
    ('XYZ', 1),
    ('ABC', 2),
]
# Create the DataFrame
data_df = pd.DataFrame(data, columns=['col_a', 'col_b'])
# Collect every unique col_b value in the whole data set. This is used to work out which col_a values do not contain all of them.
dfSetB = set(data_df['col_b'])
# Sort by col_a (not used below; the final result is sorted again)
dfSorted = data_df.sort_values('col_a')
# Collect the unique col_a values; the loop below filters the data by each one
dfColAValues = set(data_df['col_a'])
# Check each col_a value to see whether it contains all unique values from col_b
inclusion_list = []
# Work through each unique col_a entry
for col_item in dfColAValues:
    # Filter the data set down to the rows for this col_a value
    dfTemp = data_df.loc[data_df['col_a'] == col_item]
    # Collect the unique col_b values for this col_a entry
    dfSetTemp = set(dfTemp['col_b'])
    # If this entry's unique col_b values match those of the entire data set, add the entry to the inclusion list
    if dfSetTemp == dfSetB:
        inclusion_list.append(col_item)
# Keep only the col_a values that contain all unique col_b values, drop duplicate rows, and sort
dfFinal = data_df.loc[data_df['col_a'].isin(inclusion_list)].drop_duplicates().sort_values(['col_a', 'col_b'])
print(dfFinal)
Output:
  col_a  col_b
3   PQR      1
4   PQR      2
7   PQR      3
2   XYZ      1
5   XYZ      2
6   XYZ      3
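
For comparison, the same filtering can probably be done without an explicit loop. The following is only a sketch, not the answer author's method: it assumes, as the loop version does, that a group is complete when it contains every distinct col_b value found anywhere in the data (here 1, 2 and 3). The names required, distinct_per_group and result are made up for illustration. Since each group's values are a subset of the overall values, comparing the distinct counts should be equivalent to comparing the sets.

```python
import pandas as pd

data = [
    ("ABC", 1), ("ABC", 2), ("XYZ", 1), ("PQR", 1), ("PQR", 2), ("XYZ", 2),
    ("XYZ", 3), ("PQR", 3), ("PQR", 2), ("XYZ", 1), ("ABC", 2),
]
data_df = pd.DataFrame(data, columns=["col_a", "col_b"])

# Assumption: the required values are all distinct col_b values in the data.
required = data_df["col_b"].nunique()

# Count distinct col_b values per col_a group, aligned back to every row.
distinct_per_group = data_df.groupby("col_a")["col_b"].transform("nunique")

# Keep rows from complete groups, drop repeated rows, and sort for display.
result = (
    data_df[distinct_per_group == required]
    .drop_duplicates()
    .sort_values(["col_a", "col_b"])
    .reset_index(drop=True)
)
print(result)
```

If the required values are fixed (say exactly 1, 2 and 3) rather than derived from the data, required could be set to 3 directly, or the check could compare each group's set of values against {1, 2, 3}.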