Question

Creating the DataFrame:

import pandas as pd

people = ['shayna','shayna','shayna','shayna','john']
dates = ['01-01-18','01-01-18','01-01-18','01-02-18','01-02-18']
places = ['hospital', 'hospital', 'inpatient', 'hospital', 'hospital']
d = {'Person':people,'Service_Date':dates, 'Site_Where_Served':places}
df = pd.DataFrame(d)
df

Person   Service_Date   Site_Where_Served
shayna   01-01-18       hospital 
shayna   01-01-18       hospital 
shayna   01-01-18       inpatient 
shayna   01-02-18       hospital 
john     01-02-18       hospital

What I want to do is count the unique pairs of Person and Service_Date, grouped by Site_Where_Served.

Expected output:

Site_Where_Served    Site_Visit_Count
hospital             3
inpatient            1

My attempt:

df[['Person', 'Service_Date']].groupby(df['Site_Where_Served']).nunique().reset_index(name='Site_Visit_Count')

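But then it doesn't know how to reset the index. So I tried leaving that step out, and realized it isn't counting unique pairs of 'Person' and 'Service_Date', because the output looks like this: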

                   Person    Service_Date
Site_Where_Served
hospital              2           2 
inpatient             1           1
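The output above follows from how `nunique` behaves on a DataFrame: it counts distinct values in each column on its own, not distinct rows. A minimal sketch, restricted to the hospital rows of the sample data (the variable names `hospital`, `per_column` and `pair_count` are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Person': ['shayna', 'shayna', 'shayna', 'shayna', 'john'],
    'Service_Date': ['01-01-18', '01-01-18', '01-01-18', '01-02-18', '01-02-18'],
    'Site_Where_Served': ['hospital', 'hospital', 'inpatient', 'hospital', 'hospital'],
})

hospital = df[df['Site_Where_Served'] == 'hospital']

# nunique() on a DataFrame counts distinct values per column, independently
per_column = hospital[['Person', 'Service_Date']].nunique()

# Distinct (Person, Service_Date) rows, i.e. the pairs the question asks about
pair_count = len(hospital[['Person', 'Service_Date']].drop_duplicates())

print(per_column)
print(pair_count)
```

There are two distinct people and two distinct dates at the hospital, but three distinct person/date combinations, which is why the per-column result does not match the expected count.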

Answer 1

`drop_duplicates`, `groupby` + `count`

(df.drop_duplicates()
   .groupby('Site_Where_Served')
   .Site_Where_Served.count()
   .reset_index(name='Site_Visit_Count')
)

  Site_Where_Served  Site_Visit_Count
0          hospital                 3
1         inpatient                 1

Note that one subtle difference between count and size is that the former does not count NaN entries.
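To make the `count` / `size` difference concrete, here is a small sketch on a hypothetical frame (`demo` is not part of the question's data) in which one `Person` value is missing:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second hospital row has no Person recorded
demo = pd.DataFrame({
    'Site_Where_Served': ['hospital', 'hospital', 'inpatient'],
    'Person': ['shayna', np.nan, 'john'],
})

grouped = demo.groupby('Site_Where_Served')['Person']

counts = grouped.count()  # non-NaN entries per group
sizes = grouped.size()    # all rows per group, NaN included

print(counts)
print(sizes)
```

For the sample data in the question there are no missing values, so both methods should give the same result there.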

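Tuplization, `groupby` and `nunique`

This really just fixes your current solution, but I don't recommend it, since it takes more steps than necessary. First, tuplize your columns, group by Site_Where_Served, and then count: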

(df[['Person', 'Service_Date']]
   .apply(tuple, 1)
   .groupby(df.Site_Where_Served)
   .nunique()
   .reset_index(name='Site_Visit_Count')
)

  Site_Where_Served  Site_Visit_Count
0          hospital                 3
1         inpatient                 1
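To see what the tuplization step contributes, it may help to look at the intermediate Series. This is only a sketch of the same idea split into two steps; the names `pairs` and `per_site` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Person': ['shayna', 'shayna', 'shayna', 'shayna', 'john'],
    'Service_Date': ['01-01-18', '01-01-18', '01-01-18', '01-02-18', '01-02-18'],
    'Site_Where_Served': ['hospital', 'hospital', 'inpatient', 'hospital', 'hospital'],
})

# Each row becomes one hashable (Person, Service_Date) tuple
pairs = df[['Person', 'Service_Date']].apply(tuple, axis=1)
print(pairs)

# nunique now counts distinct tuples, i.e. distinct pairs, within each site
per_site = pairs.groupby(df['Site_Where_Served']).nunique()
print(per_site)
```

Because each pair is a single value after this step, `nunique` compares whole pairs rather than individual columns.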

Answer 2

In my opinion, the better way is to drop duplicates before using groupby.size:

res = df.drop_duplicates()\
        .groupby('Site_Where_Served').size()\
        .reset_index(name='Site_Visit_Count')

print(res)

  Site_Where_Served  Site_Visit_Count
0          hospital                 3
1         inpatient                 1
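One caveat: `drop_duplicates()` with no arguments compares every column. If the real DataFrame has columns beyond the three shown, passing `subset` keeps the deduplication limited to the relevant ones. A sketch, assuming a hypothetical extra `Notes` column that is not in the question:

```python
import pandas as pd

df = pd.DataFrame({
    'Person': ['shayna', 'shayna', 'shayna', 'shayna', 'john'],
    'Service_Date': ['01-01-18', '01-01-18', '01-01-18', '01-02-18', '01-02-18'],
    'Site_Where_Served': ['hospital', 'hospital', 'inpatient', 'hospital', 'hospital'],
    'Notes': ['a', 'b', 'c', 'd', 'e'],  # hypothetical column, differs on every row
})

# Deduplicate on the three relevant columns only, then count rows per site
res = (df.drop_duplicates(subset=['Person', 'Service_Date', 'Site_Where_Served'])
         .groupby('Site_Where_Served').size()
         .reset_index(name='Site_Visit_Count'))

print(res)
```

Without `subset`, the differing `Notes` values would make every row unique and the hospital count would come out as 4.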

Answer 3

Maybe `value_counts`
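The code for this answer is not shown here. The following is a minimal sketch of one way `value_counts` could be applied, assuming duplicates are dropped first as in the other answers; it is not necessarily what the answerer wrote:

```python
import pandas as pd

df = pd.DataFrame({
    'Person': ['shayna', 'shayna', 'shayna', 'shayna', 'john'],
    'Service_Date': ['01-01-18', '01-01-18', '01-01-18', '01-02-18', '01-02-18'],
    'Site_Where_Served': ['hospital', 'hospital', 'inpatient', 'hospital', 'hospital'],
})

# Drop fully duplicated rows, then count how often each site remains
res = (df.drop_duplicates()['Site_Where_Served']
         .value_counts()
         .rename_axis('Site_Where_Served')
         .reset_index(name='Site_Visit_Count'))

print(res)
```

Note that `value_counts` orders the result by descending count rather than by site name.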

Answer 4

`Counter` 1

from collections import Counter

pd.Series(Counter(df.drop_duplicates().Site_Where_Served)) \
    .rename_axis('Site_Where_Served').reset_index(name='Site_Visit_Count')

  Site_Where_Served  Site_Visit_Count
0          hospital                 3
1         inpatient                 1

`Counter` 2

pd.DataFrame(
    list(Counter(t[2] for t in set(map(tuple, df.values))).items()),
    columns=['Site_Where_Served', 'Site_Visit_Count']
)

  Site_Where_Served  Site_Visit_Count
0          hospital                 3
1         inpatient                 1
