Merging rows in a DataFrame based on a string key?

Time: 2019-04-05 16:44:26

Tags: python pandas dataframe data-science

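I have a phone directory that stores department, title, email, and extension on separate rows. The only thing the rows have in common is the first and last name. I have combined the first and last name into a key, and I would like to merge the rows so that I end up with a single row containing Name, Title, Department, Email, and Extension.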

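I tried creating a dictionary for each key, but have had no luck with the actual merge. This is where I am with the code. I had to clean the data first to get the proper columns.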

The table looks like this:

LastName  FirstName  Department Title   Extension Email           Key
Doe       Jane       HR         Officer 0000                      Jane Doe
Doe       Jane       HR         Officer           jdoe@email.com  Jane Doe

My code so far:

import pandas as pd

df = pd.read_excel("Directory.xlsx")
df = df.drop(columns = ["group_name","editable","id","contact_type","id2","account_id","server_uuid","picture",
             "dial_prefix","name","label","id3","transfer_name","value","key","primary","label4","id5",
             "type","display","group_name6"])

df = df.rename(index = str, columns = {"last_name":"Last Name","first_name":"First Name","location":"Department",
               "title":"Title","dial":"Extension","address":"Email"})

df["Key"] = df["First Name"].map(str) + " " + df["Last Name"].map(str)

1 Answer:

Answer 0: (score: 1)

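First, we use DataFrame.replace to replace the empty strings with NaN. Then we use DataFrame.groupby and apply fillna with the forward-fill and backfill methods to fill in your blanks within each group. Finally, we can use drop_duplicates to get the single row you want.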

import numpy as np

df['Key'] = df['FirstName'] + ' ' + df['LastName']
df.replace('', np.nan, inplace=True)  # treat empty strings as missing values
df = df.groupby('Key', group_keys=False).apply(lambda x: x.ffill().bfill()).drop_duplicates()

print(df)
  LastName FirstName Department    Title Extension           Email       Key
0      Doe      Jane         HR  Officer      0000  jdoe@email.com  Jane Doe
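
For anyone who wants to try the steps above without the Excel file, here is a minimal self-contained sketch. It rebuilds the two sample rows from the question by hand, on the assumption that the blank cells are empty strings. It also swaps groupby(...).apply for groupby(...).transform on the non-key columns. That is a substitution, not the answer's exact call, chosen because it keeps the Key column and the original row index in place. The names value_cols and merged are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question's table (assumption: blanks are empty strings).
df = pd.DataFrame({
    "LastName": ["Doe", "Doe"],
    "FirstName": ["Jane", "Jane"],
    "Department": ["HR", "HR"],
    "Title": ["Officer", "Officer"],
    "Extension": ["0000", ""],
    "Email": ["", "jdoe@email.com"],
})
df["Key"] = df["FirstName"] + " " + df["LastName"]

# Step 1: treat empty strings as missing values.
df = df.replace("", np.nan)

# Step 2: within each person's rows, fill gaps forward and then backward.
value_cols = [c for c in df.columns if c != "Key"]
df[value_cols] = df.groupby("Key")[value_cols].transform(lambda s: s.ffill().bfill())

# Step 3: the rows for one person are now identical, so duplicates collapse to one.
merged = df.drop_duplicates().reset_index(drop=True)
print(merged)
```

Note that this only collapses to a single row per key when the rows do not conflict. If two rows for the same person hold different non-empty values in the same column, both rows survive drop_duplicates.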