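I have 2 dataframes. Both have several columns that share the same names, and each has one column that acts as a unique identifier. I have made a copy of one dataframe, and what I want to do is multiply the identically named columns wherever the unique identifiers are equal.
I'm fairly new to Python, and I'm sure many people will find this simple, but reading through the documentation I'm finding it difficult.
The 2 original dataframes are created by reading an Excel file into pandas.
I then create a copy of the first dataframe and want to multiply the contents of the copied dataframe by the contents of the second dataframe, matching each row on its unique identifier and each column on its name.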
import pandas as pd
# read tables from excel to create dictionary of dataframes where the key is the tab name
all_sheets_df = pd.read_excel("xl_file_name.xlsx", sheet_name=None)
print(all_sheets_df)
# These are the dataframes created from the excel sheets
OrderedDict([('sheet1',
cola colb colc uni-id 201801 201802 201803 201804 201805
0 strings strings strings unique-a 4 3 2 9 10
1 strings strings strings unique-b 8 1 9 1 6
2 strings strings strings unique-c 4 3 4 4 3
3 strings strings strings unique-d 3 9 8 4 4
4 strings strings strings unique-e 5 4 7 9 10
5 strings strings strings unique-f 2 3 8 2 1
6 strings strings strings unique-g 2 4 2 6 8
7 strings strings strings unique-h 6 2 5 4 10
8 strings strings strings unique-i 7 1 3 10 8),
('sheet2',
cola colb colc uni-id 201801 201802 201803 \
0 strings strings strings unique-d 0.052935 0.928645 0.505045
1 strings strings strings unique-f 0.776922 0.338918 0.932535
2 strings strings strings unique-c 0.799160 0.343798 0.145575
3 strings strings strings unique-a 0.659975 0.308475 0.588496
4 strings strings strings unique-i 0.450931 0.667722 0.831734
5 strings strings strings unique-e 0.791060 0.801188 0.781400
6 strings strings strings unique-b 0.653861 0.649786 0.545784
7 strings strings strings unique-h 0.849901 0.327025 0.874650
8 strings strings strings unique-g 0.812554 0.995710 0.042272
201804 201805
0 0.011463 0.980985
1 0.743247 0.715230
2 0.313438 0.882728
3 0.656984 0.864108
4 0.236997 0.422303
5 0.603261 0.083762
6 0.722503 0.170563
7 0.608704 0.263881
8 0.702862 0.760257 )])
# create new dataframe as a copy of the first sheet in excel
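# (indexing with [] raises a clear KeyError if the sheet is missing; .get("sheet1", "") would return "" and "".copy() fails)
calculated_dataframe = all_sheets_df["sheet1"].copy()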
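# get list of columns to update (all columns to be updated start with the characters '20');
# read_excel may parse headers such as 201801 as integers, so convert to str before testing
update_cols = [col for col in calculated_dataframe.columns if str(col).startswith('20')]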
# for each row in calculated_dataframe, find the row in all_sheets_df["sheet2"] whose 'uni-id' matches,
# then for each col in update_cols: calculated_dataframe value = calculated_dataframe value * sheet2 value
# this is the piece I'm really struggling with.
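A minimal sketch of the row-by-row approach described in the comment above. The small frames stand in for the Excel sheets and their values are made up; the `sheet2_by_id` lookup is a hypothetical helper, and `uni-id` is assumed to be unique in sheet2:

```python
import pandas as pd

# Hypothetical miniature versions of the two sheets (values made up for illustration)
sheet1 = pd.DataFrame({
    "uni-id": ["unique-a", "unique-b", "unique-c"],
    "201801": [4, 8, 4],
    "201802": [3, 1, 3],
})
sheet2 = pd.DataFrame({
    "uni-id": ["unique-c", "unique-a", "unique-b"],  # same ids, different row order
    "201801": [0.5, 2.0, 1.0],
    "201802": [1.0, 0.5, 3.0],
})

calculated_dataframe = sheet1.copy()
update_cols = [col for col in calculated_dataframe.columns if str(col).startswith("20")]

# Products are floats, so convert the target columns up front to avoid dtype-upcast warnings
calculated_dataframe[update_cols] = calculated_dataframe[update_cols].astype(float)

# Lookup table from uni-id to the matching sheet2 row (assumes uni-id is unique in sheet2)
sheet2_by_id = sheet2.set_index("uni-id")

for idx, row in calculated_dataframe.iterrows():
    uid = row["uni-id"]
    if uid not in sheet2_by_id.index:
        continue  # no matching row in sheet2: leave this row unchanged
    for col in update_cols:
        calculated_dataframe.at[idx, col] = row[col] * sheet2_by_id.at[uid, col]

print(calculated_dataframe)
```

`iterrows` is slow on large frames; the loop is shown only because it mirrors the steps in the comment one-to-one.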
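I want to iterate over each row of calculated_dataframe, find the corresponding row in all_sheets_df.get("sheet2","") where sheet2's uni-id column equals calculated_dataframe's uni-id column, and multiply each column listed in update_cols (these columns exist in both dataframes).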
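A sketch of a loop-free alternative that swaps in pandas index alignment instead of the explicit row lookup: with `uni-id` set as the index of both frames, `DataFrame.mul` matches rows by identifier and columns by name. The miniature frames are made up for illustration, and any `uni-id` present in sheet1 but missing from sheet2 would come out as NaN:

```python
import pandas as pd

# Hypothetical miniature sheets; rows in sheet2 are deliberately in a different order
sheet1 = pd.DataFrame({
    "cola": ["x", "y", "z"],
    "uni-id": ["unique-a", "unique-b", "unique-c"],
    "201801": [4, 8, 4],
    "201802": [3, 1, 3],
})
sheet2 = pd.DataFrame({
    "cola": ["p", "q", "r"],
    "uni-id": ["unique-c", "unique-a", "unique-b"],
    "201801": [0.5, 2.0, 1.0],
    "201802": [1.0, 0.5, 3.0],
})

update_cols = [col for col in sheet1.columns if str(col).startswith("20")]

# Use uni-id as the index so rows are matched by identifier, not by position
left = sheet1.set_index("uni-id")
right = sheet2.set_index("uni-id")

# mul aligns on both the index (uni-id) and the column labels before multiplying;
# reindex puts the result back in sheet1's row order
products = left[update_cols].mul(right[update_cols]).reindex(left.index)

calculated_dataframe = left.copy()
for col in update_cols:
    # whole-column assignment replaces the int column with the float products
    calculated_dataframe[col] = products[col]

calculated_dataframe = calculated_dataframe.reset_index()
print(calculated_dataframe)
```

Non-numeric columns such as `cola` are left as they were in sheet1; only the columns in `update_cols` are multiplied.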
Any guidance you can provide would be great!
Answer 0 (score: 0)
You should first use the merge function on the identifier column so that you have a single df containing all the columns.
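df1 = df1.merge(df2, how='left', on=['identifier_column'])
After the merge, rows with the same identifier sit on the same row of the merged df, and you just multiply the columns as usual. Column names that appear in both frames get the default suffixes _x (from df1) and _y (from df2), e.g. df1['201801_x'] * df1['201801_y'].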
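A minimal sketch of the merge approach from the answer, assuming `uni-id` is the identifier column and using small made-up frames. Only the identifier and the columns to multiply are taken from df2, so the `_x`/`_y` suffixes appear only on those columns; `validate="many_to_one"` makes pandas raise if an identifier is duplicated in df2:

```python
import pandas as pd

# Hypothetical miniature versions of df1 / df2; "uni-id" plays the role of identifier_column
df1 = pd.DataFrame({
    "uni-id": ["unique-a", "unique-b", "unique-c"],
    "201801": [4, 8, 4],
    "201802": [3, 1, 3],
})
df2 = pd.DataFrame({
    "uni-id": ["unique-c", "unique-a", "unique-b"],
    "201801": [0.5, 2.0, 1.0],
    "201802": [1.0, 0.5, 3.0],
})

update_cols = [col for col in df1.columns if str(col).startswith("20")]

# Left merge keeps every df1 row in df1's order; overlapping names get the default _x / _y suffixes
merged = df1.merge(
    df2[["uni-id"] + update_cols],
    how="left",
    on="uni-id",
    validate="many_to_one",  # raise if a uni-id appears more than once in df2
)

result = df1.copy()
for col in update_cols:
    # f-string so the suffixed name is built correctly even for integer column labels;
    # to_numpy() assigns by position, which is safe because the left merge preserves row order
    result[col] = (merged[f"{col}_x"] * merged[f"{col}_y"]).to_numpy()

print(result)
```

Writing the products back into a copy of df1 keeps the original column names, instead of leaving the suffixed columns in the merged frame.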