I am trying to add two pandas DataFrames of different lengths together:
   fruit rating  count
0  apple      A      2
1   pear      B      2
2  peach      A      1
3  apple      B      2
4   pear      C      1

   fruit rating  count
0  apple      A      0
1  apple      B      0
2  apple      C      0
3   pear      A      0
4   pear      B      0
5   pear      C      0
6  peach      A      0
7  peach      B      0
8  peach      C      0
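Basically, I want to add the integer counts from the first DataFrame to the second one below it, wherever the fruit and rating are the same. For example, index 1 of dataframe1 should add its count of 2 to index 4 of dataframe2, because both rows are "pear" and "B".
I have tried the update function, but it seems to mess up the index and only replaces the fruit and rating. Apologies if I am not good at explaining; I am still learning. Thank you very much for your help.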
Answer 0 (score: 2)
You can align the two frames on the relevant columns by setting them as the index, and then add them:
temp1=first_df.set_index(['fruit','rating'])
temp2=second_df.set_index(['fruit','rating'])
result = temp1.add(temp2,fill_value=0)
This gives you a MultiIndex DataFrame:
              count
fruit rating
apple A         2.0
      B         2.0
      C         0.0
peach A         1.0
      B         0.0
      C         0.0
pear  A         0.0
      B         2.0
      C         1.0
If you want to get rid of the MultiIndex, just reset the index:
result.reset_index()
Out[182]:
fruit rating count
0 apple A 2.0
1 apple B 2.0
2 apple C 0.0
3 peach A 1.0
4 peach B 0.0
5 peach C 0.0
6 pear A 0.0
7 pear B 2.0
8 pear C 1.0
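For reference, here is a minimal self-contained sketch of the approach above. The frames are rebuilt from the data in the question, and the final cast with astype("int64") is an addition of mine (not part of the answer) to turn the float counts back into integers; it assumes every count is a whole number.

```python
import pandas as pd

# Frames rebuilt from the question's data.
first_df = pd.DataFrame({
    "fruit": ["apple", "pear", "peach", "apple", "pear"],
    "rating": ["A", "B", "A", "B", "C"],
    "count": [2, 2, 1, 2, 1],
})
second_df = pd.DataFrame({
    "fruit": ["apple"] * 3 + ["pear"] * 3 + ["peach"] * 3,
    "rating": ["A", "B", "C"] * 3,
    "count": [0] * 9,
})

keys = ["fruit", "rating"]

# Align both frames on (fruit, rating) and add; pairs missing from one
# frame are treated as 0 instead of producing NaN.
result = (
    first_df.set_index(keys)
    .add(second_df.set_index(keys), fill_value=0)
    .astype("int64")  # assumption: counts are whole numbers, so the cast is safe
    .reset_index()
)
print(result)
```

The alignment step is what introduces floats (missing pairs pass through NaN before fill_value is applied), which is why the cast comes after the addition.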
Answer 1 (score: 1)
I find SQL the most intuitive tool for this purpose:
import pandasql
import pandas as pd
pysqldf = lambda q: pandasql.sqldf(q, globals())
Table1 = pd.DataFrame()
Table1['x'] = [x for x in range(10)]
Table2 = pd.DataFrame()
Table2['x'] = [x for x in range(10)]
# print() is a function in Python 3
print(pysqldf('''
SELECT
*,
1 as ID
FROM Table1
UNION
SELECT *,2 as ID
FROM Table2
'''))
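The snippet above only stacks two unrelated tables, so here is a hedged sketch of how the SQL idea might be applied to the frames in the question. It swaps pandasql for the standard-library sqlite3 module (my substitution, not the answer's method), and the table names t1 and t2 are hypothetical.

```python
import sqlite3

import pandas as pd

# Frames rebuilt from the question's data.
df1 = pd.DataFrame({
    "fruit": ["apple", "pear", "peach", "apple", "pear"],
    "rating": ["A", "B", "A", "B", "C"],
    "count": [2, 2, 1, 2, 1],
})
df2 = pd.DataFrame({
    "fruit": ["apple"] * 3 + ["pear"] * 3 + ["peach"] * 3,
    "rating": ["A", "B", "C"] * 3,
    "count": [0] * 9,
})

conn = sqlite3.connect(":memory:")
df1.to_sql("t1", conn, index=False)  # t1 and t2 are hypothetical table names
df2.to_sql("t2", conn, index=False)

# Keep every row of t2 and add the matching t1 count, or 0 if there is none.
query = '''
SELECT t2.fruit,
       t2.rating,
       t2."count" + COALESCE(t1."count", 0) AS "count"
FROM t2
LEFT JOIN t1
  ON t1.fruit = t2.fruit AND t1.rating = t2.rating
ORDER BY t2.fruit, t2.rating
'''
result = pd.read_sql_query(query, conn)
conn.close()
print(result)
```

A LEFT JOIN is used rather than UNION because the goal is to combine counts per (fruit, rating) pair, not to stack rows.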
Answer 2 (score: 0)
Assuming your DataFrames are df1 and df2 respectively,
df3 = pd.merge(df2, df1, how = 'outer', on = ['fruit', 'rating'])
df3 = df3.drop('count_x', axis = 1).fillna(0)
df3.columns = ['fruit', 'rating', 'count']
will give you the desired DataFrame:
fruit rating count
0 apple A 2.0
1 apple B 2.0
2 apple C 0.0
3 pear A 0.0
4 pear B 2.0
5 pear C 1.0
6 peach A 1.0
7 peach B 0.0
8 peach C 0.0
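Note that the merge above discards df2's own counts (count_x), which is only safe because they are all zero here. As a sketch of a variant that actually sums both sides and keeps integers, assuming the frames from the question (the suffixes _base and _new are names I made up for illustration):

```python
import pandas as pd

# Frames rebuilt from the question's data.
df1 = pd.DataFrame({
    "fruit": ["apple", "pear", "peach", "apple", "pear"],
    "rating": ["A", "B", "A", "B", "C"],
    "count": [2, 2, 1, 2, 1],
})
df2 = pd.DataFrame({
    "fruit": ["apple"] * 3 + ["pear"] * 3 + ["peach"] * 3,
    "rating": ["A", "B", "C"] * 3,
    "count": [0] * 9,
})

# Left merge keeps every row of df2; rows without a match in df1
# get NaN in count_new.
merged = df2.merge(
    df1, how="left", on=["fruit", "rating"], suffixes=("_base", "_new")
)

# Treat missing matches as 0, restore the integer dtype, then sum both sides.
merged["count"] = merged["count_base"] + merged["count_new"].fillna(0).astype("int64")
result = merged[["fruit", "rating", "count"]]
print(result)
```

This version would still give the right totals if the second frame held non-zero counts.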
Answer 3 (score: 0)
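Focusing on updating only the rows that need to be added to, and preserving the integer dtype (pd.concat is used here because DataFrame.append was removed in pandas 2.0):
pd.concat([df1, df2]).groupby(['fruit', 'rating']).sum().reset_index()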
fruit rating count
0 apple A 2
1 apple B 2
2 apple C 0
3 peach A 1
4 peach B 0
5 peach C 0
6 pear A 0
7 pear B 2
8 pear C 1
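To check the dtype claim, here is a small self-contained sketch of the same stack-then-group idea, with the frames rebuilt from the question. It assumes a recent pandas, so it uses pd.concat for the stacking step.

```python
import pandas as pd

# Frames rebuilt from the question's data.
df1 = pd.DataFrame({
    "fruit": ["apple", "pear", "peach", "apple", "pear"],
    "rating": ["A", "B", "A", "B", "C"],
    "count": [2, 2, 1, 2, 1],
})
df2 = pd.DataFrame({
    "fruit": ["apple"] * 3 + ["pear"] * 3 + ["peach"] * 3,
    "rating": ["A", "B", "C"] * 3,
    "count": [0] * 9,
})

# Stack the rows of both frames, then sum the counts of each
# (fruit, rating) pair. No NaN is ever created, so the dtype stays integer.
result = (
    pd.concat([df1, df2], ignore_index=True)
    .groupby(["fruit", "rating"], as_index=False)["count"]
    .sum()
)
print(result)
print(result["count"].dtype)
```

Because nothing is aligned against missing keys, this route avoids the float conversion seen in the index-based and merge-based answers.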