I have a set of records, each labeled with a cluster value.
Original DataFrame, df:
+-------------+---------+
| measurement | cluster |
+-------------+---------+
| M1 | 6 |
| M2 | 6 |
| M3 | 6 |
| M4 | 12 |
| M5 | 12 |
| M6 | 12 |
| M7 | 2 |
| M8 | 9 |
| M9 | 9 |
| M10 | 9 |
| M11 | 9 |
+-------------+---------+
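How can I renumber the clusters based on whether the current row's cluster value equals the previous or next row's value, while assigning "x" to rows whose cluster value equals neither the previous nor the next one?
Desired df: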
+-------------+---------+-------------+
| measurement | cluster | new_cluster |
+-------------+---------+-------------+
| M1 | 6 | 1 |
| M2 | 6 | 1 |
| M3 | 6 | 1 |
| M4 | 12 | 2 |
| M5 | 12 | 2 |
| M6 | 12 | 2 |
| M7 | 2 | x |
| M8 | 9 | 3 |
| M9 | 9 | 3 |
| M10 | 9 | 3 |
| M11 | 9 | 3 |
+-------------+---------+-------------+
Answer 0: (score: 4)
Use pandas.factorize on the values selected by a boolean mask:
m = df['cluster'].ne(df['cluster'].shift()).cumsum().duplicated(keep=False)
df.loc[m, 'new_cluster'] = pd.factorize(df.loc[m, 'cluster'])[0] + 1
print (df)
measurement cluster new_cluster
0 M1 6 1.0
1 M2 6 1.0
2 M3 6 1.0
3 M4 12 2.0
4 M5 12 2.0
5 M6 12 2.0
6 M7 2 NaN
7 M8 9 3.0
8 M9 9 3.0
9 M10 9 3.0
10 M11 9 3.0
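Note that the snippet above factorizes the cluster values themselves, so if the same value showed up in two separate runs, both runs would share one label. If each run should get its own number instead (an assumption about the desired behavior, not something stated in the question), a minimal sketch is to factorize the run ids. The name run_id is introduced here only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "measurement": [f"M{i}" for i in range(1, 12)],
    "cluster": [6, 6, 6, 12, 12, 12, 2, 9, 9, 9, 9],
})

# Number each run of consecutive equal values: 1, 1, 1, 2, 2, 2, 3, 4, ...
run_id = df["cluster"].ne(df["cluster"].shift()).cumsum()
# True for rows whose run contains more than one row.
m = run_id.duplicated(keep=False)

# Factorize the run ids rather than the cluster values, so a cluster
# value that reappears later would still start a new label.
df.loc[m, "new_cluster"] = pd.factorize(run_id[m])[0] + 1
```

For the sample data both approaches give the same labels, because no cluster value repeats in a later run.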
If you want to replace NaN with x:
df['new_cluster'] = df['new_cluster'].fillna('x')
print (df)
measurement cluster new_cluster
0 M1 6 1
1 M2 6 1
2 M3 6 1
3 M4 12 2
4 M5 12 2
5 M6 12 2
6 M7 2 x
7 M8 9 3
8 M9 9 3
9 M10 9 3
10 M11 9 3
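After fillna('x') the numeric labels are still floats stored in an object column, so depending on the pandas version they may be displayed as 1.0 rather than 1. If integer labels are required, one possible sketch (the Series s below is a made-up stand-in for df['new_cluster']) converts each value explicitly:

```python
import pandas as pd

# Stand-in for df['new_cluster'] before filling: floats with one NaN.
s = pd.Series([1.0, 1.0, 2.0, float("nan"), 3.0], name="new_cluster")

# Turn every real label into a Python int and every missing value into 'x'.
filled = s.map(lambda v: "x" if pd.isna(v) else int(v))
```

The result is an object column holding ints and the string 'x', which matches the desired output in the question.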
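Details of the boolean mask: first create a helper Series that numbers the runs of consecutive equal values, then apply duplicated with keep=False to flag every row that belongs to a run of more than one row: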
print (df['cluster'].ne(df['cluster'].shift()).cumsum())
0 1
1 1
2 1
3 2
4 2
5 2
6 3
7 4
8 4
9 4
10 4
Name: cluster, dtype: int32
print (m)
0 True
1 True
2 True
3 True
4 True
5 True
6 False
7 True
8 True
9 True
10 True
Name: cluster, dtype: bool
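To make the individual steps easier to follow, here is a small sketch on a shorter, made-up Series. It also cross-checks the mask against a different technique, counting the rows per run with groupby and transform, which is not part of the answer above:

```python
import pandas as pd

s = pd.Series([6, 6, 2, 9, 9])

# True wherever the value differs from the previous row (a new run starts).
starts = s.ne(s.shift())
# Cumulative sum of the starts gives one id per run: 1, 1, 2, 3, 3.
run_id = starts.cumsum()
# keep=False marks all members of a repeated run id, not only the later ones.
m = run_id.duplicated(keep=False)

# Alternative way to build the same mask: runs with more than one row.
m_alt = s.groupby(run_id).transform("size") > 1
```

Both m and m_alt are False only for the isolated value 2 in this example.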
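Answer 1: (score: 2)
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(
measurement=[f"M{i}" for i in range(1, 12)],
cluster=[*map(int, '6 6 6 12 12 12 2 9 9 9 9'.split())]
))
c = df.cluster.values
# m0[i] is True when row i equals row i + 1
m0 = c[:-1] == c[1:]
# Test if equal to prior or next
mask = np.append(False, m0) | np.append(m0, False)
# Integer codes for the kept values, in order of first appearance
f, u = pd.factorize(c[mask])
# Start with 'x' everywhere, then overwrite the kept rows with 1-based labels
new = np.array(list('x' * len(c)), dtype=object)
new[mask] = f + 1
df.assign(new_cluster=new)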
measurement cluster new_cluster
0 M1 6 1
1 M2 6 1
2 M3 6 1
3 M4 12 2
4 M5 12 2
5 M6 12 2
6 M7 2 x
7 M8 9 3
8 M9 9 3
9 M10 9 3
10 M11 9 3
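The neighbor test in this answer can be pulled out into a small helper. This is only a sketch: the function name neighbor_mask is hypothetical, and the guard for empty input is an addition that the answer itself does not need for the sample data:

```python
import numpy as np

def neighbor_mask(values):
    """Return True for entries equal to the previous or the next entry."""
    c = np.asarray(values)
    if c.size == 0:
        # Without this guard the appended False would yield length 1.
        return np.zeros(0, dtype=bool)
    # eq_next[i] is True when c[i] == c[i + 1]; it has len(c) - 1 entries.
    eq_next = c[:-1] == c[1:]
    # Shift once to align with "equals previous", once with "equals next".
    return np.append(False, eq_next) | np.append(eq_next, False)

mask = neighbor_mask([6, 6, 6, 12, 12, 12, 2, 9, 9, 9, 9])
```

Here mask is False only at position 6, the isolated value 2, which is the row that receives 'x'.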