Merging multiple data in pandas

Time: 2019-02-26 12:32:35

Tags: python pandas anaconda

I have a DataFrame like this:

Id   First_name1   first_name2   first_name3   last_name1   last_name2

1    Michel        michelle      Michele       Jeremi       Jeremy
2    Jack          jack          Jak           Jean         Jean
3    Dave          Dav           Dave          Daniel       Danielle

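As you can see, the names are written differently for the same ID. For each row, I want to check whether first_name1 is equal to first_name2 or first_name3. If they are equal, create a new column named first_name; otherwise put each of the distinct names into first_name1, and so on. The expected result: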

Id   First_name   First_name1   First_name2   Last_name1   Last_name2

1    Michel       Michelle      Michele       Jeremy       Jeremi
2    Jack         Jak           nan           Jean         nan
3    Dave         Dav           nan           Daniel       Danielle

2 Answers:

Answer 0: (score: 0)

First, iterate over the rows of the DataFrame:

for index, row in yourdf.iterrows():

Then, inside the loop, compare the two values you are interested in for each row:

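    if row['First_name1'] == row['first_name2']:
        # Names match: create the new column and store the shared name.
        # Assign through yourdf.loc, because changes to `row` are not written back.
        yourdf.loc[index, 'new_column'] = row['First_name1']
    else:
        # Names differ: set each column to the value you want
        yourdf.loc[index, 'first_name'] = row['First_name1']
        yourdf.loc[index, 'first_name2'] = row['First_name1']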
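
Putting the loop and the comparison together, a minimal runnable sketch might look like the following. The sample data is an assumption (two of the question's columns, with consistent capitalization), and first_name is only an illustrative name for the new column. The sketch writes through yourdf.loc because values assigned to row inside iterrows() generally do not reach the original DataFrame.

```python
import pandas as pd

# Assumed sample data: only two of the question's columns are needed here
yourdf = pd.DataFrame({
    "First_name1": ["Michel", "Jack", "Dave"],
    "first_name2": ["Michelle", "Jack", "Dav"],
})

for index, row in yourdf.iterrows():
    if row["First_name1"] == row["first_name2"]:
        # Write to the DataFrame itself; `row` is only a copy of the data
        yourdf.loc[index, "first_name"] = row["First_name1"]

print(yourdf)
```

Rows where the two names differ are left as NaN in the new column, so only the Jack row gets a value.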

Answer 1: (score: 0)

Your question is not very clear to me, but from what I understand, you are trying to do something like this:

import pandas as pd
import numpy as np

header = ["First_name1", "First_name2", "First_name3", "Last_name1", "Last_name2"]
df = pd.DataFrame([["Michel", "Michelle", "Michele", "Jeremi", "Jeremy"],
                   ["Jack", "Jack", "Jak", "Jean", "Jean"],
                   ["Dave", "Dav", "Dave", "Daniel", "Danielle"]], columns=header)

print(df)

# Create an empty DataFrame with the same columns
finalDataFrame = pd.DataFrame(columns=header)

for index, row in df.iterrows():
    # Copy the row into a list so it can be edited by position
    lrow = list(row)
    firstName = lrow[0]
    # Blank out the later first names that duplicate the first one
    if firstName == lrow[1]:
        lrow[1] = np.nan
    if firstName == lrow[2]:
        lrow[2] = np.nan
    # Append the row to the final DataFrame
    finalDataFrame.loc[len(finalDataFrame)] = lrow

print(finalDataFrame)
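
The same result as the loop above can probably be reached without iterating row by row. The following is a sketch under the same assumptions as the sample data above (redefined here so the snippet runs on its own); result is a hypothetical variable name. It relies on Series.mask, which replaces a value with NaN wherever the condition is True.

```python
import pandas as pd

header = ["First_name1", "First_name2", "First_name3", "Last_name1", "Last_name2"]
df = pd.DataFrame([["Michel", "Michelle", "Michele", "Jeremi", "Jeremy"],
                   ["Jack", "Jack", "Jak", "Jean", "Jean"],
                   ["Dave", "Dav", "Dave", "Daniel", "Danielle"]], columns=header)

# Work on a copy so the original frame stays unchanged
result = df.copy()
for col in ["First_name2", "First_name3"]:
    # Blank out entries that repeat the value in First_name1
    result[col] = df[col].mask(df[col] == df["First_name1"])

print(result)
```

The comparison is case-sensitive, like the loop version, so a lowercase "jack" would not count as a duplicate of "Jack" unless the columns are normalized first (for example with .str.lower()).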

Hope this helps!