Question

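I'm trying to identify how many distinct patterns each ID has, and what those patterns are, in the (simplified) DataFrame below.

From the data we can see:

ID 20 has a streak of A followed by a streak of B -> 2 patterns

ID 21 has a streak of (A, B) followed by a streak of C -> 2 patterns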

My expected result looks like this:

20: 2

21: 2

Is there any way to do this in pandas?

Answer 1

import pandas as pd
from io import StringIO  # pandas.compat.StringIO is not available in newer pandas releases
print(pd.__version__)

csvdata = StringIO("""ID,Items
0,20,A
1,20,A
2,20,B
3,20,B
4,20,B
5,20,B
6,20,A
7,21,A
8,21,B
9,21,A
10,21,B
11,21,C
12,21,C
13,21,C
14,21,C
15,21,A""")

df = pd.read_csv(csvdata)

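# Label consecutive runs: a new streak starts wherever Items differs from the previous row.
df['streak_group'] = (df['Items'] != df['Items'].shift()).cumsum()
# Count the rows in each streak; grouping by ID as well splits a streak that spans two IDs.
df = df.groupby(['ID', 'Items', 'streak_group']).size().to_frame()
df.reset_index(inplace=True)
df.columns = ['ID', 'Items', 'streak_group', 'streak_size']
# Name each streak by item and length, e.g. 'A2' for two consecutive A's.
df['streak_kind'] = df['Items'] + df['streak_size'].apply(str)
# Keep one row per distinct streak kind within each ID.
df.drop(['streak_group', 'streak_size'], axis=1, inplace=True)
df.drop_duplicates(inplace=True)
print(df)
# Counts per ID (all 1 after de-duplication), then counts across IDs.
print(df.groupby('ID')['streak_kind'].value_counts())
print(df['streak_kind'].value_counts())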

produces

0.24.2
   ID Items streak_kind
0  20     A          A2
1  20     A          A1
2  20     B          B4
3  21     A          A1
6  21     B          B1
8  21     C          C4
ID  streak_kind
20  A1             1
    A2             1
    B4             1
21  A1             1
    B1             1
    C4             1
Name: streak_kind, dtype: int64
A1    2
C4    1
B4    1
A2    1
B1    1
Name: streak_kind, dtype: int64
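
For readers unfamiliar with the shift/cumsum idiom behind streak_group, here is a minimal sketch on made-up toy data (not the question's frame). It also compares ID with its shifted value, which is my own variation so that a streak never spans two IDs; the answer's code gets the same effect by including ID in the groupby. The column name streak_size mirrors the answer, everything else is illustrative.

```python
import pandas as pd

# Hypothetical toy data: note the B that sits on both sides of the ID boundary.
df = pd.DataFrame({
    "ID":    [20, 20, 20, 20, 21, 21],
    "Items": ["A", "A", "B", "B", "B", "C"],
})

# True wherever a new streak begins: the item changes OR the ID changes.
# shift() moves the column down one row, so each row is compared with the previous one.
new_streak = (df["Items"] != df["Items"].shift()) | (df["ID"] != df["ID"].shift())

# A running total of the streak starts gives every row its streak number.
df["streak_group"] = new_streak.cumsum()

# One row per streak, with its length.
streaks = (
    df.groupby(["ID", "streak_group", "Items"], sort=False)
      .size()
      .reset_index(name="streak_size")
)
print(streaks)
```

On this toy frame the B rows end up in two separate streaks, one for each ID, because the ID comparison forces a new streak at the boundary.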

Python Pandas: identifying data patterns in a column
