I have a DataFrame like this:
Id  First_name1  first_name2  first_name3  last_name1  last_name2
1   Michel       michelle     Michele      Jeremi      Jeremy
2   Jack         jack         Jak          Jean        Jean
3   Dave         Dav          Dave         Daniel      Danielle
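As you can see, the names are spelled differently for the same Id. For each row, I want to check whether first_name1 == first_name2 or first_name3. If they are equal, create a new column named first_name; otherwise, put all the distinct names into first_name1, and so on. The expected result: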
Id  First_name  First_name1  First_name2  Last_name1  Last_name2
1   Michel      Michelle     Michele      Jeremy      Jeremi
2   Jack        Jak          nan          Jean        nan
3   Dave        Dav          nan          Daniel      Danielle
Answer 0 (score: 0)
First, iterate over the rows of the DataFrame:
for index, row in yourdf.iterrows():
Then, for each row, compare the two values you are interested in:
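    if row['First_name1'] == row['first_name2']:
        # Create the new column and set its value to the shared first name.
        # Write through yourdf.loc: assigning to row does not update the DataFrame.
        yourdf.loc[index, 'first_name'] = row['First_name1']
    else:
        # Set each column to the value you want
        yourdf.loc[index, 'first_name'] = row['First_name1']
        yourdf.loc[index, 'first_name2'] = row['First_name1']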
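Put together, the two steps above might look like the following minimal sketch. The sample frame, the lowercase column names, and the choice to leave `first_name` empty when no spelling matches are all assumptions made for illustration, not something the question specifies. The results are collected in a list and attached afterwards, which avoids writing to the DataFrame while iterating over it.

```python
import pandas as pd

# Sample data modelled on the question (column names are assumptions).
yourdf = pd.DataFrame({
    "first_name1": ["Michel", "Jack", "Dave"],
    "first_name2": ["Michelle", "Jack", "Dav"],
    "first_name3": ["Michele", "Jak", "Dave"],
})

# One entry per row: the agreed name, or None when no spelling matches.
first_names = []
for index, row in yourdf.iterrows():
    same_as_second = row["first_name1"] == row["first_name2"]
    same_as_third = row["first_name1"] == row["first_name3"]
    if same_as_second or same_as_third:
        first_names.append(row["first_name1"])
    else:
        first_names.append(None)

# Attach the collected values as a new column.
yourdf["first_name"] = first_names
print(yourdf)
```

Note that the comparison here is case-sensitive, so "Jack" and "jack" would count as different names unless the columns are normalized first, for example with `str.lower()`.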
Answer 1 (score: 0)
Your question is not entirely clear to me, but from what I understand, you are trying to do something like this:
import pandas as pd
import numpy as np

header = ["First_name1", "First_name2", "First_name3", "Last_name1", "Last_name2"]
df = pd.DataFrame([["Michel", "Michelle", "Michele", "Jeremi", "Jeremy"],
                   ["Jack", "Jack", "Jak", "Jean", "Jean"],
                   ["Dave", "Dav", "Dave", "Daniel", "Danielle"]], columns=header)
print(df)

# Create an empty DataFrame with the same columns
finalDataFrame = pd.DataFrame(columns=header)
for index, row in df.iterrows():
    # Use positional access; plain row[0] is deprecated for labelled rows
    firstName = row.iloc[0]
    # Convert the row to a list so it can be modified independently
    lrow = list(row)
    if firstName == row.iloc[1]:
        lrow[1] = np.nan
    if firstName == row.iloc[2]:
        lrow[2] = np.nan
    # Append the row to the final DataFrame
    finalDataFrame.loc[len(finalDataFrame)] = lrow
print(finalDataFrame)
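The same blanking-out can also be done without an explicit row loop. The following is a minimal sketch, assuming the same sample data as above; `result` is a name introduced here for illustration. It uses `Series.mask` to replace a value with NaN wherever it repeats `First_name1` in the same row.

```python
import numpy as np
import pandas as pd

header = ["First_name1", "First_name2", "First_name3", "Last_name1", "Last_name2"]
df = pd.DataFrame(
    [["Michel", "Michelle", "Michele", "Jeremi", "Jeremy"],
     ["Jack", "Jack", "Jak", "Jean", "Jean"],
     ["Dave", "Dav", "Dave", "Daniel", "Danielle"]],
    columns=header,
)

# Work on a copy so the original frame stays untouched.
result = df.copy()

# Blank out later first-name columns that repeat First_name1 in the same row.
for col in ["First_name2", "First_name3"]:
    result[col] = df[col].mask(df[col] == df["First_name1"], np.nan)

print(result)
```

This compares whole columns at once, which tends to scale better than `iterrows()` on large frames.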
Hope this helps!