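I have two DataFrames, df1 and df2. df2 is a subset of df1 that I extracted in order to do some cleaning. The two DataFrames can be matched on their index. I have seen many merge examples on this site, but I don't want to add more columns to df1, and the DataFrames differ in size (df1 has 1000 rows, df2 has 275 rows), so I don't want to replace the whole column. I want to update df1['AgeBin'] with the values of df2['AgeBin'] wherever the indexes of the two DataFrames match.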
indexes = df.loc[df.AgeBin.isin(dfage_test.AgeBin.values)].index
df1.at[indexes,'AgeBin'] = df2['AgeBin'].values
This is what I came up with, but it seems to fail because the DataFrames differ in size:
ValueError: Must have equal len keys and value when setting with an iterable
Below is an oversimplified example. df1 has 26 columns and df2 has 12 columns; AgeBin is the last column in both. In principle, this is what I am aiming for:
df2
AgeBin
0 2
1 3
2 1
3 3
df1
AgeBin
0 NaN
1 NaN
2 NaN
3 NaN
df1 after update
AgeBin
0 2
1 3
2 1
3 3
Here are the DataFrame specs:
RangeIndex: 1309 entries, 0 to 1308
Data columns (total 26 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 PassengerId 1046 non-null float64
1 Survived 714 non-null category
2 Pclass 1046 non-null category
3 Name 1046 non-null object
4 Sex 1046 non-null object
5 Age 1046 non-null float64
6 SibSp 1046 non-null float64
7 Parch 1046 non-null float64
8 Ticket 1046 non-null object
9 Fare 1046 non-null float64
10 Embarked 1046 non-null category
11 FamilySize 1046 non-null float64
12 Surname 1046 non-null object
13 Title 1046 non-null object
14 IsChild 1046 non-null float64
15 isMale 1046 non-null category
16 GroupID 1046 non-null float64
17 GroupSize 1046 non-null float64
18 GroupType 1046 non-null object
19 GroupNumSurvived 1046 non-null float64
20 GroupNumPerished 1046 non-null float64
21 LargeGroup 1046 non-null float64
22 SplitFare 1046 non-null float64
23 log10Fare 1046 non-null float64
24 log10SplitFare 1046 non-null float64
25 AgeBin 1046 non-null category
dtypes: category(5), float64(15), object(6)
memory usage: 221.9+ KB
dfageResults.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 263 entries, 5 to 1308
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 AgeBin 263 non-null category
dtypes: category(1)
memory usage: 12.4 KB
These are the categories:
[67] dfageResults.groupby(["AgeBin"])["AgeBin"].count()
AgeBin
0-14 25
15-29 192
30-44 46
Name: AgeBin, dtype: int64
[68] df.groupby(["AgeBin"])["AgeBin"].count()
AgeBin
0-14 107
15-29 462
30-44 301
45-59 136
60+ 40
Name: AgeBin, dtype: int64
Answer 0 (score: 1)
Assuming all index labels in df2 exist in df1 (which, as far as I can tell, is the case), the following is enough:
df1.loc[df2.index,:]=df2
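If the above assumption about the index does not hold, here is an alternative (same result, but only the index labels that already exist in df1 are updated):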
df1.loc[df1.index.intersection(df2.index),:]=df2
Sample output (with more representative sample data):
import pandas as pd
import numpy as np
df1=pd.DataFrame({"AgeBin": [1,2,3,'x', np.nan,np.nan,'a']})
df2=pd.DataFrame({"AgeBin": ['new1', 'new2', 123]}, index=[5,2,3])
print(df1)
print(df2)
df1.loc[df2.index,:]=df2
print(df1)
Output:
AgeBin
0 1
1 2
2 3
3 x
4 NaN
5 NaN
6 a
AgeBin
5 new1
2 new2
3 123
AgeBin
0 1
1 2
2 new2
3 123
4 NaN
5 new1
6 a
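One caveat for the real data in the question, where df1 has 26 columns: as far as I understand, assigning a DataFrame through `.loc[..., :]` aligns on columns as well as rows, so columns of df1 that are missing from df2 could end up as NaN in the updated rows. A minimal sketch that restricts the assignment to the AgeBin column follows. The `Fare` values and the name `common` are made up for illustration, and plain strings are used instead of the categorical dtype from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's data: df1 has an extra column
# ("Fare") that must stay untouched; df2 covers only some of df1's rows.
df1 = pd.DataFrame(
    {
        "Fare": [7.25, 71.28, 7.92, 53.10, 8.05],
        "AgeBin": [np.nan, "15-29", np.nan, "30-44", np.nan],
    }
)
df2 = pd.DataFrame({"AgeBin": ["0-14", "15-29", "30-44"]}, index=[4, 0, 2])

# Keep only the labels present in both frames, so a stray label in df2
# cannot raise a KeyError.
common = df1.index.intersection(df2.index)

# Assigning a Series to a single column: pandas aligns the values by index
# label, so the row order of df2 does not matter.
df1.loc[common, "AgeBin"] = df2.loc[common, "AgeBin"]

print(df1)
```

Rows 1 and 3 keep their existing AgeBin values because df2 has no entry for them, and the Fare column is not touched.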
Answer 1 (score: 0)
Try:
print('df2')
print(df2)
print('\ndf1')
print(df1)
df1.update(df2)
print('\ndf1 after update')
print(df1)
Output:
df2
AgeBin
0 2
1 3
2 1
3 3
df1
AgeBin
0 NaN
1 NaN
2 NaN
3 NaN
df1 after update
AgeBin
0 2
1 3
2 1
3 3
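The example above uses frames of equal size. Here is a small sketch of how `DataFrame.update` behaves when df2 covers only some rows and some columns of df1, which is closer to the situation in the question. The sample values and the name `result` are assumptions for illustration, and plain strings stand in for the categorical dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: df1 has an extra column and more rows than df2.
df1 = pd.DataFrame(
    {
        "Fare": [7.25, 71.28, 7.92, 53.10],
        "AgeBin": [np.nan, "15-29", np.nan, "30-44"],
    }
)
# df2 holds a real value for row 2 and a NaN for row 1.
df2 = pd.DataFrame({"AgeBin": ["0-14", np.nan]}, index=[2, 1])

# update() modifies df1 in place and returns None. It aligns on index and
# column labels and, by default, only copies non-NaN values from df2.
result = df1.update(df2)

print(df1)
```

Row 2 receives "0-14", row 1 keeps "15-29" because the matching value in df2 is NaN, and row 0 stays NaN because df2 has no entry for it. Since `update` returns None, writing `df1 = df1.update(df2)` would discard the frame.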