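I have a phone directory that stores department, title, email and extension on separate rows. What those rows have in common is the first and last name. I have combined first name and last name into a key, and I would like to merge the rows so that I end up with a single row containing Name, Title, Department, Email and Extension.
I tried creating a dictionary for each key, but had no luck with the actual merge. Below is the code I have so far; I had to clean the data first to get the proper columns.
The table looks like this: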
LastName  FirstName  Department  Title    Extension  Email           Key
Doe       Jane       HR          Officer  0000                       Jane Doe
Doe       Jane       HR          Officer             jdoe@email.com  Jane Doe
import pandas as pd

df = pd.read_excel("Directory.xlsx")
# Drop the export columns that are not needed
df = df.drop(columns=["group_name", "editable", "id", "contact_type", "id2", "account_id", "server_uuid", "picture",
                      "dial_prefix", "name", "label", "id3", "transfer_name", "value", "key", "primary", "label4", "id5",
                      "type", "display", "group_name6"])
# Give the remaining columns readable names
df = df.rename(index=str, columns={"last_name": "Last Name", "first_name": "First Name", "location": "Department",
                                   "title": "Title", "dial": "Extension", "address": "Email"})
# Combine first and last name into the merge key
df["Key"] = df["First Name"].map(str) + " " + df["Last Name"].map(str)
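The dictionary-per-key idea can be made to work. A minimal sketch in plain Python, assuming each row has already been turned into a dict with the cleaned column names (for example via `df.to_dict("records")`, where empty cells come through as `NaN`). The helper `merge_by_key` and the sample rows are hypothetical; the helper keeps the first non-empty value it sees for each field.

```python
import math

# Hypothetical sample rows mirroring the table above (None = empty cell)
rows = [
    {"Last Name": "Doe", "First Name": "Jane", "Department": "HR", "Title": "Officer",
     "Extension": "0000", "Email": None},
    {"Last Name": "Doe", "First Name": "Jane", "Department": "HR", "Title": "Officer",
     "Extension": None, "Email": "jdoe@email.com"},
]


def _is_empty(value):
    """Treat None, empty strings and float NaN (what pandas uses for blanks) as missing."""
    return value is None or value == "" or (isinstance(value, float) and math.isnan(value))


def merge_by_key(records):
    """Hypothetical helper: merge records sharing a "First Last" key into one dict per key."""
    merged = {}
    for rec in records:
        key = f"{rec['First Name']} {rec['Last Name']}"
        target = merged.setdefault(key, {})
        for field, value in rec.items():
            # Only fill a field that is still missing; the first non-empty value wins
            if not _is_empty(value) and _is_empty(target.get(field)):
                target[field] = value
    return merged


merged = merge_by_key(rows)
print(merged["Jane Doe"])
```

Keeping the first non-empty value is a design choice: if two rows disagree on a field, the earlier row wins silently.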
Answer (score: 1)
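First, we use DataFrame.replace to turn empty strings into NaN. Then we group with DataFrame.groupby and, within each group, fill forward and then backward to fill in your blanks (in pandas 2.1 and later, write these as ffill() and bfill(), since fillna(method=...) is deprecated). Finally, drop_duplicates leaves the single row you want.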
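To see why both directions are needed: in the sample, row 0 has the Extension but no Email, and row 1 has the Email but no Extension. A forward fill copies the Extension down into row 1, a backward fill copies the Email up into row 0, and the two rows become identical. A minimal self-contained sketch on the sample rows, assuming the column names shown in the table; it calls `GroupBy.ffill()` and `GroupBy.bfill()` directly instead of going through `apply`, which is a swapped-in variant rather than the exact code below.

```python
import numpy as np
import pandas as pd

# The two sample rows from the question; np.nan marks the empty cells
df = pd.DataFrame({
    "LastName": ["Doe", "Doe"],
    "FirstName": ["Jane", "Jane"],
    "Department": ["HR", "HR"],
    "Title": ["Officer", "Officer"],
    "Extension": ["0000", np.nan],
    "Email": [np.nan, "jdoe@email.com"],
})
df["Key"] = df["FirstName"] + " " + df["LastName"]

value_cols = [c for c in df.columns if c != "Key"]
# Forward fill within each Key group: row 1 receives the Extension from row 0
df[value_cols] = df.groupby("Key")[value_cols].ffill()
# Backward fill within each Key group: row 0 receives the Email from row 1
df[value_cols] = df.groupby("Key")[value_cols].bfill()

# Both rows are now identical, so dropping duplicates leaves one row per person
result = df.drop_duplicates()
print(result)
```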
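import numpy as np
import pandas as pd
df['Key'] = df['FirstName'] + ' ' + df['LastName']
df = df.replace('', np.nan)  # empty strings -> NaN so they can be filled
# Within each Key group: forward fill, then backward fill, then keep one copy of the identical rows
df = df.groupby('Key', group_keys=False).apply(lambda x: x.ffill().bfill()).drop_duplicates()
print(df)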
LastName FirstName Department Title Extension Email Key
0 Doe Jane HR Officer 0000 jdoe@email.com Jane Doe
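An alternative to fill-then-deduplicate, not part of the answer above, is `GroupBy.first()`, which takes the first non-null value of each column within each group and so collapses the rows in one step. A sketch assuming the same column names as the sample table; empty strings are converted to NaN first because `first()` skips missing values but not empty strings.

```python
import numpy as np
import pandas as pd

# Sample rows with empty strings for the blank cells, as they might arrive from a file
df = pd.DataFrame({
    "LastName": ["Doe", "Doe"],
    "FirstName": ["Jane", "Jane"],
    "Department": ["HR", "HR"],
    "Title": ["Officer", "Officer"],
    "Extension": ["0000", ""],
    "Email": ["", "jdoe@email.com"],
})
df["Key"] = df["FirstName"] + " " + df["LastName"]

# first() skips NaN but not empty strings, so convert blanks to NaN first
df = df.replace("", np.nan)
# One row per Key: each column takes its first non-null value within the group
merged = df.groupby("Key", as_index=False).first()
print(merged)
```

Unlike `drop_duplicates`, this still yields exactly one row per key when two rows for the same person carry conflicting values; the earlier row's value wins.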