Question

CaseNumber  Value  Open  crs
03820567    1      Yes   2375636
03820573    1      Yes   2367131
03820587    1      Yes   2374597
03820598    1      Yes   2367429
03820599    2      Yes   2367131; 2342755
03820619    1      Yes   2377137
03820627    1      Yes   2367429
03820632    1      Yes   2342755

This is my data. In the crs column I need to get the count of unique values.

My output should be:

        crs
        2375636
        2367131
        2374597
        2367429
        2342755
        2377137

        crs.count() = 6
I first tried to split the values on the ; delimiter into separate rows (keeping the CaseNumber), since I could then count the unique numbers easily, but I got stuck.

With the code I used, I am getting the output below:

    CaseNumber  CRs
0   3820567     [2375636]
1   3820573     [2367131]
2   3820587     [2374597]
3   3820598     [2367429]
4   3820599     [2308266; 2342755]
5   3820619     [2377137]
6   3820627     [2321772


Answer 1

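If what you want is the number of unique elements in crs, here is one way. First use str.split and get a list of lists from the result. Then flatten it with itertools.chain, turn it into a set, and take its len: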

from itertools import chain
len(set(chain(*df.crs.str.split('; ').values.tolist())))
# 6
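
To make the split / flatten / deduplicate steps easier to follow, here is a minimal sketch of the same logic in plain Python, without pandas. The list `crs_values` is an assumption: it just copies the crs column from the question as strings, with '; ' as the delimiter.

```python
from itertools import chain

# Values copied from the crs column in the question (assumed to be strings).
crs_values = [
    "2375636", "2367131", "2374597", "2367429",
    "2367131; 2342755", "2377137", "2367429", "2342755",
]

# Step 1: split each cell on the '; ' delimiter, giving one list per row.
split_rows = [cell.split("; ") for cell in crs_values]

# Step 2: flatten the list of lists into a single stream of values.
flat_values = chain.from_iterable(split_rows)

# Step 3: a set keeps each value once; its length is the unique count.
unique_crs = set(flat_values)
unique_count = len(unique_crs)
print(unique_count)  # 6
```

`chain.from_iterable(rows)` does the same job as `chain(*rows)` but avoids unpacking every row into a function argument.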

Answer 2

If you only need to count the unique values, use a set comprehension with split:

out = len(set(y for x in df.crs.str.split('; ') for y in x))
# alternative
# out = len(set(y for x in df.crs for y in x.split('; ')))
print(out)
# 6
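
As a possible alternative (not part of the answer above), pandas 0.25 and later provide Series.explode, which turns each list element into its own row so that nunique can count directly. This sketch rebuilds the question's data as an assumption, splits on ';' alone, and strips whitespace afterwards, so it should also tolerate inconsistent spacing around the delimiter.

```python
import pandas as pd

# Sample data rebuilt from the question; crs is assumed to hold strings.
df = pd.DataFrame({
    "CaseNumber": ["03820567", "03820573", "03820587", "03820598",
                   "03820599", "03820619", "03820627", "03820632"],
    "crs": ["2375636", "2367131", "2374597", "2367429",
            "2367131; 2342755", "2377137", "2367429", "2342755"],
})

n_unique = (
    df["crs"]
    .str.split(";")   # each cell becomes a list of pieces
    .explode()        # one row per piece
    .str.strip()      # drop stray spaces left around the delimiter
    .nunique()        # count distinct values
)
print(n_unique)  # 6
```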

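If you need the filtered DataFrame as output, first extract the column with DataFrame.pop, then use Series.str.split and DataFrame.stack to get a Series, DataFrame.join it back to the original, and finally remove duplicates with DataFrame.drop_duplicates: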

s = (df.pop('crs')
       .str.split('; ', expand=True)
       .stack()
       .reset_index(1, drop=True)
       .rename('crs'))

df1 = (df.join(s)
         .drop_duplicates('crs')
         .reset_index(drop=True))

print (df1)
   CaseNumber  Value Open      crs
0     3820567      1  Yes  2375636
1     3820573      1  Yes  2367131
2     3820587      1  Yes  2374597
3     3820598      1  Yes  2367429
4     3820599      2  Yes  2342755
5     3820619      1  Yes  2377137
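
On pandas 0.25 or later, the same result can probably be reached more briefly with DataFrame.explode. This is a sketch under that version assumption, not the answer's original method, and the sample frame below is rebuilt from the question with CaseNumber kept as a string.

```python
import pandas as pd

# Sample data rebuilt from the question (an assumption about dtypes).
df = pd.DataFrame({
    "CaseNumber": ["03820567", "03820573", "03820587", "03820598",
                   "03820599", "03820619", "03820627", "03820632"],
    "Value": [1, 1, 1, 1, 2, 1, 1, 1],
    "Open": ["Yes"] * 8,
    "crs": ["2375636", "2367131", "2374597", "2367429",
            "2367131; 2342755", "2377137", "2367429", "2342755"],
})

df1 = (
    df.assign(crs=df["crs"].str.split("; "))  # cells become lists
      .explode("crs")                         # one row per crs value
      .drop_duplicates("crs")                 # keep first occurrence only
      .reset_index(drop=True)
)
print(df1)
print(len(df1))  # 6
```

Unlike the pop-based version, this leaves the original `df` unchanged, because assign returns a new DataFrame.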

How to remove the delimiter ; and remove duplicate values from a column
