There is a method, .duplicated, that can find duplicates in a column, but what I need is to identify the last duplicated element, with my data sorted by date.
The expected result is the Last_dup column, computed for the Policy_id column.
Thanks in advance for your help and support!
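Answer 0 (score: 2)
Use Series.duplicated or DataFrame.duplicated with the specified column and the parameter keep='last'. Then invert the mask and either cast it to integers, which maps True/False to 1/0, or use numpy.where: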
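# Invert the mask so that the last occurrence of each Policy_id becomes 1
df['Last_dup1'] = (~df['Policy_id'].duplicated(keep='last')).astype(int)
# Equivalent with numpy.where (requires: import numpy as np)
df['Last_dup1'] = np.where(df['Policy_id'].duplicated(keep='last'), 0, 1)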
Or:
df['Last_dup1'] = (~df.duplicated(subset=['Policy_id'], keep='last')).astype(int)
df['Last_dup1'] = np.where(df.duplicated(subset=['Policy_id'], keep='last'), 0, 1)
print(df)
   Id Policy_id  Start_Date  Last_dup  Last_dup1
0   0      b123  2019/02/24         0          0
1   1      b123  2019/03/24         0          0
2   2      b123  2019/04/24         1          1
3   3      c123  2018/09/01         0          0
4   4      c123  2018/10/01         1          1
5   5      d123  2017/02/24         0          0
6   6      d123  2017/03/24         1          1
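For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the printed output above. The explicit sort and the column names Last_dup2 and is_earlier are my own assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the printed output above
df = pd.DataFrame({
    "Id": range(7),
    "Policy_id": ["b123", "b123", "b123", "c123", "c123", "d123", "d123"],
    "Start_Date": ["2019/02/24", "2019/03/24", "2019/04/24",
                   "2018/09/01", "2018/10/01", "2017/02/24", "2017/03/24"],
})

# Assumption: sort by policy and date so that "last" means "most recent"
df["Start_Date"] = pd.to_datetime(df["Start_Date"], format="%Y/%m/%d")
df = df.sort_values(["Policy_id", "Start_Date"]).reset_index(drop=True)

# duplicated(keep='last') is True for every row except the last occurrence
is_earlier = df["Policy_id"].duplicated(keep="last")

# Two equivalent ways to turn the inverted mask into 1/0 flags
df["Last_dup1"] = (~is_earlier).astype(int)
df["Last_dup2"] = np.where(is_earlier, 0, 1)

print(df)
```

Note that a Policy_id occurring only once is also flagged 1, because duplicated returns False for it. If such rows should be 0, an extra condition would be needed.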
Answer 1 (score: 0)
It can also be done in the way shown below (without using Series.duplicated):
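dictionary = df[['Id','Policy_id']].set_index('Policy_id').to_dict()['Id']
# The dictionary values hold the most recent Id per Policy_id (later rows overwrite earlier ones)
last_ids = set(dictionary.values())  # build the lookup once instead of a new list per row
df['Last_dup'] = df['Id'].apply(lambda x: 1 if x in last_ids else 0)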
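A self-contained sketch of the same dictionary idea, for comparison. It assumes the rows are already sorted by date within each policy and that Id is unique. The names last_id_by_policy and last_ids are hypothetical, and dict(zip(...)) and isin are swapped in for set_index/to_dict and apply, so this is a variation rather than the answer's exact code.

```python
import pandas as pd

# Sample data copied from the question's example, already sorted by date
df = pd.DataFrame({
    "Id": range(7),
    "Policy_id": ["b123", "b123", "b123", "c123", "c123", "d123", "d123"],
    "Start_Date": ["2019/02/24", "2019/03/24", "2019/04/24",
                   "2018/09/01", "2018/10/01", "2017/02/24", "2017/03/24"],
})

# Later rows overwrite earlier ones, so each Policy_id ends up
# mapped to the Id of its last row
last_id_by_policy = dict(zip(df["Policy_id"], df["Id"]))

# Flag the rows whose Id is one of those last Ids
last_ids = set(last_id_by_policy.values())
df["Last_dup"] = df["Id"].isin(last_ids).astype(int)

print(df)
```

Because the result depends on row order, sorting by date first matters here just as it does with duplicated(keep='last').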