I want an if statement to show all the REF_INT values that are duplicated. I tried this:
(df_picru['REF_INT'].value_counts()==1)
It shows me True or False for every value, but I want to do something like this:
if (df_picru['REF_INT'].value_counts()==1):
    print "df_picru['REF_INT']"
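A short sketch of why an if like the one above cannot work as written: value_counts()==1 returns a boolean Series with one entry per distinct value, and Python's if cannot reduce a multi-element Series to a single truth value, so pandas raises ValueError. The sample data is an assumption, mirroring the frame used in the answers below.

```python
import pandas as pd

# Assumed sample data, matching the answers below
df_picru = pd.DataFrame({'REF_INT': [1, 2, 3, 8, 8, 2]})

# One boolean per distinct REF_INT value, indexed by the value itself
is_unique = df_picru['REF_INT'].value_counts() == 1

raised = False
try:
    if is_unique:  # ambiguous: which element should decide?
        print("never reached")
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Values occurring more than once are the index entries where the mask is False
duplicated_values = sorted(is_unique[~is_unique].index.tolist())
print(duplicated_values)  # [2, 8]
```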
Answer 0 (score: 3)
In [28]: df_picru['new'] = \
df_picru['REF_INT'].duplicated(keep=False) \
.map({True:'duplicates',False:'unique'})
In [29]: df_picru
Out[29]:
REF_INT new
0 1 unique
1 2 duplicates
2 3 unique
3 8 duplicates
4 8 duplicates
5 2 duplicates
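To see what keep=False changes, here is a small comparison on the same sample data (assumed, reconstructed from the output above). With the default keep='first', only the second and later occurrences are flagged; keep='last' flags the earlier ones instead; keep=False flags every occurrence of a repeated value, which is what the labelling above relies on.

```python
import pandas as pd

df_picru = pd.DataFrame({'REF_INT': [1, 2, 3, 8, 8, 2]})

first = df_picru['REF_INT'].duplicated()            # keep='first' (default): later repeats only
last = df_picru['REF_INT'].duplicated(keep='last')  # earlier repeats only
every = df_picru['REF_INT'].duplicated(keep=False)  # all occurrences of repeated values

print(pd.DataFrame({'REF_INT': df_picru['REF_INT'],
                    'first': first, 'last': last, 'all': every}))
```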
Answer 1 (score: 2)
I think you need duplicated for a boolean mask, and numpy.where for the new column:
mask = df_picru['REF_INT'].duplicated(keep=False)
Sample:
import numpy as np
import pandas as pd
df_picru = pd.DataFrame({'REF_INT':[1,2,3,8,8,2]})
mask = df_picru['REF_INT'].duplicated(keep=False)
print (mask)
0 False
1 True
2 False
3 True
4 True
5 True
Name: REF_INT, dtype: bool
df_picru['new'] = np.where(mask, 'duplicates', 'unique')
print (df_picru)
REF_INT new
0 1 unique
1 2 duplicates
2 3 unique
3 8 duplicates
4 8 duplicates
5 2 duplicates
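Since the question asks to display the repeated REF_INT values, the same mask can also select those rows directly, and unique() lists each repeated value once. A minimal sketch on the sample data above:

```python
import pandas as pd

df_picru = pd.DataFrame({'REF_INT': [1, 2, 3, 8, 8, 2]})
mask = df_picru['REF_INT'].duplicated(keep=False)

# All rows whose REF_INT occurs more than once
dup_rows = df_picru[mask]
print(dup_rows)

# Each repeated value once, in order of first appearance
dup_values = df_picru.loc[mask, 'REF_INT'].unique()
print(dup_values)  # [2 8]
```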
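If you need to check whether at least one value is unique, use any to reduce the boolean mask (an array) to a scalar True or False. Because mask is True for duplicates, invert it with ~ first:
if (~mask).any():
    print ('at least one unique')
at least one unique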
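The reduction step can be sketched as follows, on the same assumed sample frame: mask.any() asks "is at least one value duplicated?", (~mask).any() asks "is at least one value unique?", and mask.all() asks whether every row is a duplicate.

```python
import pandas as pd

df_picru = pd.DataFrame({'REF_INT': [1, 2, 3, 8, 8, 2]})
mask = df_picru['REF_INT'].duplicated(keep=False)

has_duplicate = bool(mask.any())   # at least one repeated value
has_unique = bool((~mask).any())   # at least one value that appears once
all_duplicated = bool(mask.all())  # every value is repeated

print(has_duplicate, has_unique, all_duplicated)  # True True False
```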
Answer 2 (score: 1)
Another solution, using groupby.
#group by REF_INT, count the occurrences, and label a value Duplicated if it occurs more than once
df_picru.groupby('REF_INT').apply(lambda x: 'Duplicated' if len(x)> 1 else 'Unique')
Out[21]:
REF_INT
1 Unique
2 Duplicated
3 Unique
8 Duplicated
dtype: object
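The groupby result above is indexed by the distinct REF_INT values rather than by the original rows. If a per-row label is wanted, as in the other answers, one option (a swapped-in variant, not what this answer used) is transform('size'), which broadcasts each group's size back to its rows:

```python
import numpy as np
import pandas as pd

df_picru = pd.DataFrame({'REF_INT': [1, 2, 3, 8, 8, 2]})

# Size of each row's REF_INT group, aligned with the original index
group_size = df_picru.groupby('REF_INT')['REF_INT'].transform('size')

df_picru['new'] = np.where(group_size > 1, 'Duplicated', 'Unique')
print(df_picru)
```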
With a small change, value_counts actually works:
df_picru.REF_INT.value_counts()[lambda x: x>1]
Out[31]:
2 2
8 2
Name: REF_INT, dtype: int64
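The filtered value_counts above lists each repeated value with its count. To get back to the original rows, its index can be passed to isin; a sketch on the same assumed sample data:

```python
import pandas as pd

df_picru = pd.DataFrame({'REF_INT': [1, 2, 3, 8, 8, 2]})

counts = df_picru['REF_INT'].value_counts()
repeated = counts[counts > 1].index  # distinct values seen more than once

# Rows whose REF_INT is one of the repeated values
dup_rows = df_picru[df_picru['REF_INT'].isin(repeated)]
print(dup_rows)
```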