Counting groups by pattern match in pandas

Time: 2020-11-10 09:18:14

Tags: python pandas dataframe

Hello, I have a df such as:

Groups COL1
G1   Seq:1
G1   Seq:2
G1   Seq_1
G1   Seq:4
G2   Seq_2
G2   Seq_3
G2   Seq_4
G3   Seq:5
G3   Seq:6
G4   Seq:7
G4   Seq_5

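I would like to count:

  1. The number of groups where every value contains ":" = 1 (G3)
  2. The number of groups where some, but not all, values contain ":" = 2 (G1 and G4)
  3. The number of groups where no value contains ":" = 1 (G2)

Does anyone have an idea? I think I should use re.sub and then sum per Groups in pandas?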

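2 answers:

Answer 0: (score: 2)

Use Series.str.contains to build a boolean mask, then use numpy.setdiff1d to compare the Groups values selected with DataFrame.loc by the mask or by its inverse (~):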

import numpy as np

m = df['COL1'].str.contains(':')

a = np.setdiff1d(df['Groups'], df.loc[~m, 'Groups']).tolist()
print (a)
['G3']

c = np.setdiff1d(df['Groups'], df.loc[m, 'Groups']).tolist()
print (c)
['G2']

b = np.setdiff1d(df.loc[~m, 'Groups'], c).tolist()
print (b)
['G1', 'G4']

To get the counts, take the length of each list:

print (len(a))
print (len(b))
print (len(c))
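
For reference, here is a self-contained sketch of the approach above. The construction of df is an assumption (the question only shows the printed frame), and the names only_colon, mixed and no_colon are illustrative replacements for a, b and c:

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the sample frame from the question.
df = pd.DataFrame({
    'Groups': ['G1', 'G1', 'G1', 'G1', 'G2', 'G2', 'G2',
               'G3', 'G3', 'G4', 'G4'],
    'COL1': ['Seq:1', 'Seq:2', 'Seq_1', 'Seq:4', 'Seq_2', 'Seq_3',
             'Seq_4', 'Seq:5', 'Seq:6', 'Seq:7', 'Seq_5'],
})

# True where the value contains ":".
m = df['COL1'].str.contains(':')

# Groups that never appear among the rows without ":".
only_colon = np.setdiff1d(df['Groups'], df.loc[~m, 'Groups']).tolist()
# Groups that never appear among the rows with ":".
no_colon = np.setdiff1d(df['Groups'], df.loc[m, 'Groups']).tolist()
# Groups with at least one row without ":", minus those with no ":" at all.
mixed = np.setdiff1d(df.loc[~m, 'Groups'], no_colon).tolist()

print(len(only_colon), len(mixed), len(no_colon))
```

np.setdiff1d returns the sorted unique values of its first argument that are absent from the second, so duplicates within a group do not affect the counts.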

Answer 1: (score: 2)

You can build the mask with pd.Series.str.contains, then use GroupBy.all and GroupBy.any:

om = df['COL1'].str.contains(':')

one = om.groupby(df['Groups']).all().sum() # 1
two = om.groupby(df['Groups']).any().sum() - one # 2
# subtract `one` because `any` also counts groups where every value is True,
# so the all-True groups have to be removed.
three = (~om).groupby(df['Groups']).all().sum() # 1
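
A runnable sketch of the same idea, assuming the sample frame is rebuilt by hand (the construction of df and the names has_colon, all_colon and any_colon are not from the original answer). It computes the per-group all and any results once and derives the three counts from them:

```python
import pandas as pd

# Assumed reconstruction of the sample frame from the question.
df = pd.DataFrame({
    'Groups': ['G1', 'G1', 'G1', 'G1', 'G2', 'G2', 'G2',
               'G3', 'G3', 'G4', 'G4'],
    'COL1': ['Seq:1', 'Seq:2', 'Seq_1', 'Seq:4', 'Seq_2', 'Seq_3',
             'Seq_4', 'Seq:5', 'Seq:6', 'Seq:7', 'Seq_5'],
})

# Literal substring match, so ":" is not interpreted as a regex.
has_colon = df['COL1'].str.contains(':', regex=False)

# One boolean per group: does every row / any row contain ":"?
all_colon = has_colon.groupby(df['Groups']).all()
any_colon = has_colon.groupby(df['Groups']).any()

one = int(all_colon.sum())                   # every value contains ":"
two = int((any_colon & ~all_colon).sum())    # some but not all contain ":"
three = int((~any_colon).sum())              # no value contains ":"

print(one, two, three)
```

Writing the second count as any_colon & ~all_colon states the "some but not all" condition directly instead of subtracting one count from another.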