I am trying to build join_key using the following logic:
date + book + bdr + COALESCE(cusip, isin, deal, id)
+------------+------+------+-----------+--------------+------+------------+----------------------------+
| endOfDay | book | bdr | cusip | isin | Deal | Id | join_key |
+------------+------+------+-----------+--------------+------+------------+----------------------------+
| 31/10/2019 | 15 | ITOR | 371494AM7 | US371494AM77 | 161 | 8013210731 | 20191031|15|ITOR|371494AM7 |
| 31/10/2019 | 15 | ITOR | | | | 8011898573 | 20191031|15|ITOR| |
| 31/10/2019 | 15 | ITOR | | | | 8011898742 | 20191031|15|ITOR| |
| 31/10/2019 | 15 | ITOR | | | | 8011899418 | 20191031|15|ITOR| |
+------------+------+------+-----------+--------------+------+------------+----------------------------+
This is what I am trying:
df['join_key'] = ("20191031|" + df['book'].astype('str') + "|" + df['bdr'] + "|" + df[['cusip', 'isin', 'Deal', 'Id']].bfill(1)['cusip'].astype(str))
I also tried:
df['position_join_key'] = "20191031|" + df['book'].astype('str') + "|" + df['bdr'] + "|" + df['cusip'].fillna(df['isin']).fillna(df['Deal']).fillna(df['Id']).astype('str')
For some reason, this code does not pick up Id as part of the key.
For example, in the second row I should get 20191031|15|ITOR|8011898573.
In case it helps, I am reading the data with na_filter = False.
Sample input:
+------------+------+------+-----------+-------------+------+------------+
| endOfDay | book | bdr | cusip | isin | Deal | Id |
+------------+------+------+-----------+-------------+------+------------+
| 31/10/2019 | 15 | ITOR | 371494AM7 | | 161 | 8013210731 |
| 31/10/2019 | 15 | ITOR | | 3.16248E+11 | | 8011898573 |
| 31/10/2019 | 15 | ITOR | | | 352 | 8011898742 |
| 31/10/2019 | 15 | ITOR | | | | 8011899418 |
+------------+------+------+-----------+-------------+------+------------+
Sample output:
+----------------------------+
| join_key |
+----------------------------+
| 43769|15|ITOR|371494AM7 |
| 43769|15|ITOR|316247735264 |
| 43769|15|ITOR|352 |
| 43769|15|ITOR|8011899418 |
+----------------------------+
Answer 0 (score: 3)
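We can solve your problem in the following general way:
1. Create a temporary column called temp that holds the backfilled values.
2. Insert that column right after your bdr column.
3. Convert your date column to datetime.
4. Apply '|'.join to the first 4 columns to create join_key.
Note: I added step 3 to keep your code general, so that we do not hardcode 20191031 as you did.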
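A likely reason the attempts in the question skip Id: with na_filter = False the blank cells are read as empty strings rather than NaN, so fillna and bfill find nothing to fill. The snippet below is only a minimal sketch of that effect; the two-row frame `demo` is made up for illustration and is not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame mimicking what na_filter=False produces:
# missing cells arrive as empty strings, not NaN.
demo = pd.DataFrame({
    "cusip": ["371494AM7", ""],
    "Id": ["8013210731", "8011898573"],
})

# fillna sees no missing values here, so the empty string survives.
unchanged = demo["cusip"].fillna(demo["Id"])

# Converting '' to NaN first lets fillna (or bfill) do its job.
filled = demo["cusip"].replace("", np.nan).fillna(demo["Id"])

print(unchanged.tolist())
print(filled.tolist())
```

This is why the backfill step has to turn empty strings into NaN before filling.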
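import numpy as np
import pandas as pd

# Treat empty strings as missing, then take the first non-missing identifier in each row
s = df[['cusip', 'isin', 'Deal', 'Id']].replace('', np.nan).bfill(axis=1).iloc[:, 0]
df.insert(3, 'temp', s)  # position 3 is right after 'bdr'
# The dates are day-first (31/10/2019), so parse them with an explicit format
df['endOfDay'] = pd.to_datetime(df['endOfDay'], format='%d/%m/%Y').dt.strftime('%Y%m%d')
df['join_key'] = df.iloc[:, :4].apply(lambda x: '|'.join(x.astype(str).to_numpy()), axis=1)
df = df.drop(columns='temp')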
endOfDay book bdr cusip isin Deal Id join_key
0 20191031 15 ITOR 371494AM7 US371494AM77 161 8013210731 20191031|15|ITOR|371494AM7
1 20191031 15 ITOR 8011898573 20191031|15|ITOR|8011898573
2 20191031 15 ITOR 8011898742 20191031|15|ITOR|8011898742
3 20191031 15 ITOR 8011899418 20191031|15|ITOR|8011899418
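
For reference, here is a self-contained sketch of the same idea run against the sample input from the question. It rests on a few assumptions: every field is held as a string, blank cells are empty strings (as na_filter = False would leave them), and dates are day-first. The helper name `build_join_key` is hypothetical, and it concatenates the columns directly instead of inserting a temporary column.

```python
import numpy as np
import pandas as pd

# Sample input from the question, all values kept as strings,
# with blank cells represented as '' (assumed effect of na_filter=False).
df = pd.DataFrame({
    "endOfDay": ["31/10/2019"] * 4,
    "book": ["15"] * 4,
    "bdr": ["ITOR"] * 4,
    "cusip": ["371494AM7", "", "", ""],
    "isin": ["", "3.16248E+11", "", ""],
    "Deal": ["161", "", "352", ""],
    "Id": ["8013210731", "8011898573", "8011898742", "8011899418"],
})


def build_join_key(frame: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: date|book|bdr|first non-empty identifier."""
    # Empty strings become NaN so the row-wise backfill can skip them.
    ids = frame[["cusip", "isin", "Deal", "Id"]].replace("", np.nan)
    first_id = ids.bfill(axis=1).iloc[:, 0].astype(str)
    # Day-first dates, reformatted as YYYYMMDD.
    date = pd.to_datetime(frame["endOfDay"], format="%d/%m/%Y").dt.strftime("%Y%m%d")
    return date + "|" + frame["book"].astype(str) + "|" + frame["bdr"] + "|" + first_id


df["join_key"] = build_join_key(df)
print(df["join_key"].to_string())
```

Note that the second row keeps the isin exactly as it appears in the sample (3.16248E+11). If the full number is needed, the column has to be read as text at load time, since the digits lost to scientific notation cannot be recovered afterwards.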