I need to clean up some data in a Pandas DataFrame and I'm struggling with it.
Sample data:
Date | ID | Name | Address
-----------------------------------------------------------------------------------------------
1-4-1987 | 124578 | T.Hilpert | 518 Hessel Plaza Lake Lonzo, AZ 11863
23-6-1990 | 947383 | Birdie Reynolds | 964 Weissnat Green Suite 568 Rennerbury
12-5-1960 | 746732 | Earline Schulist | 57367 Alfredo Vista East Bertaburgh
9-9-2010 | 947383 | Birdie Reynolds | 964 Weissnat Green Suite 568 Rennerbury, WV 16241-5205
27-12-2017 | 124578 | Theresia Hilpert | 518 Hessel Plaza Lake Lonzo
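Here is what I want to do: group by ID, take the name from the row with the most recent date, and take the longest address string. Those two values should then be used for every occurrence of that ID (in two new columns: Name_New and Address_New). The desired output is shown below: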
Date | ID | Name | Address | Name_New | Address_New
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
27-12-2017 | 124578 | Theresia Hilpert | 518 Hessel Plaza Lake Lonzo | Theresia Hilpert | 518 Hessel Plaza Lake Lonzo, AZ 11863
1-4-1987 | 124578 | T. Hilpert | 518 Hessel Plaza Lake Lonzo, AZ 11863 | Theresia Hilpert | 518 Hessel Plaza Lake Lonzo, AZ 11863
23-6-1990 | 947383 | Birdie Reynolds | 964 Weissnat Green Suite 568 Rennerbury | Birdie Reynolds | 964 Weissnat Green Suite 568 Rennerbury, WV 16241-5205
9-9-2010 | 947383 | Birdie Reynolds | 964 Weissnat Green Suite 568 Rennerbury, WV 16241-5205 | Birdie Reynolds | 964 Weissnat Green Suite 568 Rennerbury, WV 16241-5205
12-5-1960 | 746732 | Earline Schulist | 57367 Alfredo Vista East Bertaburgh | Earline Schulist | 57367 Alfredo Vista East Bertaburgh
I have tried the following, but I can't combine the pieces to get the desired result.
def f1(s):
    return max(s, key=len)
df_new = df['New_Address'] = df.groupby('ID').agg({'Address': f1})
df_new = df[df.groupby('ID').Date.transform('max') == df['Date']]
Any help would be greatly appreciated.
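One possible way to combine the two ideas from the attempt above is sketched here. It is only a sketch, and it rests on assumptions that are not stated in the question: that Date holds day-month-year strings, that no Address is missing, and that ties on the latest date within an ID can be broken arbitrarily. The sample frame is rebuilt from the rows shown earlier; the variable names `dates` and `latest_name` are purely illustrative.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "Date": ["1-4-1987", "23-6-1990", "12-5-1960", "9-9-2010", "27-12-2017"],
    "ID": [124578, 947383, 746732, 947383, 124578],
    "Name": ["T.Hilpert", "Birdie Reynolds", "Earline Schulist",
             "Birdie Reynolds", "Theresia Hilpert"],
    "Address": [
        "518 Hessel Plaza Lake Lonzo, AZ 11863",
        "964 Weissnat Green Suite 568 Rennerbury",
        "57367 Alfredo Vista East Bertaburgh",
        "964 Weissnat Green Suite 568 Rennerbury, WV 16241-5205",
        "518 Hessel Plaza Lake Lonzo",
    ],
})

# Assumption: dates are day-month-year. Parsing them makes "max" mean
# "most recent" instead of a string comparison.
dates = pd.to_datetime(df["Date"], format="%d-%m-%Y")

# Index label of the most recent row per ID, then the Name on that row.
latest_name = df.loc[dates.groupby(df["ID"]).idxmax()].set_index("ID")["Name"]
df["Name_New"] = df["ID"].map(latest_name)

# Longest address per ID; transform broadcasts it back to every row of the group.
# Assumption: no missing addresses (len() would fail on NaN).
df["Address_New"] = df.groupby("ID")["Address"].transform(lambda s: max(s, key=len))

print(df)
```

Using `transform` and `map` rather than `agg` keeps the original number of rows, which is what the desired output asks for. The rows stay in their original order; a `sort_values("ID")` could be added if the grouped layout of the desired sample matters.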