I have the following DataFrames:
df1
+-----+------+------+------+------+------+
| No. | col1 | col2 | col3 | Type | ... |
+-----+------+------+------+------+------+
| 123 | 2 | 5 | 2 | MN | ... |
| 453 | 4 | 3 | 1 | MN | ... |
| 146 | 7 | 9 | 4 | AA | ... |
| 175 | 2 | 4 | 3 | MN | ... |
| 643 | 0 | 0 | 0 | NAN | ... |
+-----+------+------+------+------+------+
and df2
+-----+------+------+------+------+
| No. | col1 | col2 | col3 | Type |
+-----+------+------+------+------+
| 123 | 24 | 57 | 22 | MN |
| 453 | 41 | 39 | 15 | MN |
| 175 | 21 | 43 | 37 | MN |
+-----+------+------+------+------+
What I want to do is replace the values of col1, col2, and col3 in df1 with the corresponding df2 values wherever Type equals MN.
So the expected output is:
df1
+-----+------+------+------+------+-----+
| No. | col1 | col2 | col3 | Type | ... |
+-----+------+------+------+------+-----+
| 123 | 24 | 57 | 22 | MN | ... |
| 453 | 41 | 39 | 15 | MN | ... |
| 146 | 7 | 9 | 4 | AA | ... |
| 175 | 21 | 43 | 37 | MN | ... |
| 643 | 0 | 0 | 0 | NAN | ... |
+-----+------+------+------+------+-----+
Edit
I tried:
df1[df1.Type == 'MN'] = df2.values
But I get this error:
ValueError: Must have equal len keys and value when setting with an ndarray
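I guess the reason is that df2 does not have the same number of columns. So how can I make sure that only specific columns (col1-col3) are replaced in df1?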
Answer 0 (score: 1)
I think you need combine_first, matching rows on the No. column:
#filter only `MN` rows if necessary
df22 = df2[df2['Type'] == 'MN'].set_index('No.')
df1 = df22.combine_first(df1.set_index('No.')).reset_index().reindex(columns=df1.columns)
print (df1)
No. col1 col2 col3 Type col
0 123 24.0 57.0 22.0 MN ...
1 146 7.0 9.0 4.0 AA ...
2 175 21.0 43.0 37.0 MN ...
3 453 41.0 39.0 15.0 MN ...
4 643 0.0 0.0 0.0 NAN ...
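As the output above shows, combine_first returns the rows sorted by No. and the numeric columns as floats. If the original row order of df1 matters, one possible alternative (not part of this answer) is DataFrame.update, which overwrites values in place after aligning on the index. The sketch below rebuilds the sample data from the question without the elided "..." column; the names `cols` and `out` are made up for illustration.

```python
import pandas as pd

# Sample data from the question (the elided "..." column is left out)
df1 = pd.DataFrame({
    "No.": [123, 453, 146, 175, 643],
    "col1": [2, 4, 7, 2, 0],
    "col2": [5, 3, 9, 4, 0],
    "col3": [2, 1, 4, 3, 0],
    "Type": ["MN", "MN", "AA", "MN", "NAN"],
})
df2 = pd.DataFrame({
    "No.": [123, 453, 175],
    "col1": [24, 41, 21],
    "col2": [57, 39, 43],
    "col3": [22, 15, 37],
    "Type": ["MN", "MN", "MN"],
})

cols = ["col1", "col2", "col3"]  # hypothetical name: the columns to overwrite

# Index both frames by "No." so that update() aligns rows by key, not by position
out = df1.set_index("No.")
out.update(df2.loc[df2["Type"] == "MN"].set_index("No.")[cols])
out = out.reset_index()  # rows stay in df1's original order

print(out)
```

Rows of df1 whose No. does not appear in df2 are left untouched, so the two frames do not need the same number of MN rows.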
Answer 1 (score: 0)
Your code does not work because df1 and df2 have a different number of columns.
from io import StringIO
import pandas as pd
x1 = """No.,col1,col2,col3,Type,Oth
123,2,5,2,MN,...
453,4,3,1,MN,...
146,7,9,4,AA,...
175,2,4,3,MN,...
643,0,0,0,NAN,...
"""
x2 = """No.,col1,col2,col3,Type
123,24,57,22,MN
453,41,39,15,MN
175,21,43,37,MN
"""
df1 = pd.read_csv(StringIO(x1), sep=",")
df2 = pd.read_csv(StringIO(x2), sep=",")
df1.loc[df1.Type == 'MN', ["No.","col1","col2","col3","Type"]] = df2.values
# Output:
# >>> print(df1)
# No. col1 col2 col3 Type Oth
# 0 123 24 57 22 MN ...
# 1 453 41 39 15 MN ...
# 2 146 7 9 4 AA ...
# 3 175 21 43 37 MN ...
# 4 643 0 0 0 NAN ...
But a problem arises if the columns of df1 and df2 are in a different order.
df1 = pd.read_csv(StringIO(x1), sep=",")
df3 = df2.copy()[["No.","Type","col1","col2","col3"]]
df1.loc[df1.Type == 'MN', ["No.","col1","col2","col3","Type"]] = df3.values
# Output:
# >>> print(df1)
# No. col1 col2 col3 Type Oth
# 0 123 MN 24 57 22 ...
# 1 453 MN 41 39 15 ...
# 2 146 7 9 4 AA ...
# 3 175 MN 21 43 37 ...
# 4 643 0 0 0 NAN ...
To avoid this, you can try
df1.loc[df1.Type == 'MN', ["No.","col1","col2","col3","Type"]] = (
df3[["No.","col1","col2","col3","Type"]].values)
# Output:
# >>> print(df1)
# No. col1 col2 col3 Type Oth
# 0 123 24 57 22 MN ...
# 1 453 41 39 15 MN ...
# 2 146 7 9 4 AA ...
# 3 175 21 43 37 MN ...
# 4 643 0 0 0 NAN ...
However, there is still a problem if the number of 'MN' rows differs between df1 and df2.
df1 = pd.read_csv(StringIO(x1), sep=",")
df4 = df2.copy().iloc[:2]
df1.loc[df1.Type == 'MN', ["No.","col1","col2","col3","Type"]] = (
df4[["No.","col1","col2","col3","Type"]].values)
# Error:
# ValueError: shape mismatch: value array of shape (2,) could not be broadcast to
# indexing result of shape (3,)
So what you need is probably something like this
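# A left merge keeps every df1 row; overlapping columns get the suffixes _x (df1) and _y (df2)
df = pd.merge(df1, df2, how='left', on=['No.', 'Type'])
# Take the df2 value only for 'MN' rows that actually found a match in df2;
# otherwise keep the df1 value (an unmatched 'MN' row would otherwise become NaN)
for c in ['col1', 'col2', 'col3']:
    df[c] = df.apply(
        lambda x: x[c + '_y'] if x.Type == 'MN' and pd.notna(x[c + '_y']) else x[c + '_x'],
        axis=1)
# Keep only the original key and value columns (this drops the extra Oth column)
df = df[["No.","col1","col2","col3","Type"]]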
# Output:
#>>> print(df)
# No. col1 col2 col3 Type
#0 123 24.0 57.0 22.0 MN
#1 453 41.0 39.0 15.0 MN
#2 146 7.0 9.0 4.0 AA
#3 175 21.0 43.0 37.0 MN
#4 643 0.0 0.0 0.0 NAN
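The merge approach can also be written without row-wise apply, which keeps the extra column and the integer dtypes. The following is only a sketch under stated assumptions: df2 is cut to two rows (like df4 above) to show the mismatched case, and the `_new` suffix and the names `cols`, `merged`, `use_new`, and `out` are made up for illustration.

```python
import pandas as pd

# Sample data from the answer; df2 deliberately lacks row 175
df1 = pd.DataFrame({
    "No.": [123, 453, 146, 175, 643],
    "col1": [2, 4, 7, 2, 0],
    "col2": [5, 3, 9, 4, 0],
    "col3": [2, 1, 4, 3, 0],
    "Type": ["MN", "MN", "AA", "MN", "NAN"],
    "Oth": ["..."] * 5,
})
df2 = pd.DataFrame({
    "No.": [123, 453],
    "col1": [24, 41],
    "col2": [57, 39],
    "col3": [22, 15],
    "Type": ["MN", "MN"],
})

cols = ["col1", "col2", "col3"]

# Left merge on the key; df1 columns keep their names, df2 columns get "_new"
merged = df1.merge(df2[["No."] + cols], on="No.", how="left", suffixes=("", "_new"))

# Replace only 'MN' rows that found a match in df2
# (assumption: a matched df2 row has no missing values in any of `cols`)
use_new = merged["Type"].eq("MN") & merged["col1_new"].notna()

for c in cols:
    # where() keeps the df1 value when the condition is True, else takes the df2 value;
    # astype() restores the original integer dtype after the float upcast
    merged[c] = merged[c].where(~use_new, merged[c + "_new"]).astype(df1[c].dtype)

out = merged.drop(columns=[c + "_new" for c in cols])
print(out)
```

Here row 175 keeps its original values because df2 has no matching No., and no shape mismatch error is raised.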