pd_one
name Rep
0 Bob-Name 29
1 Bill-Name 23
2 Sam-Name 46
3 Jeff-Name 17
4 Red-Name 20
pd_two
name sumT total
0 Bob-0021 10 = (pd_one.Bob-Name.value)-1/sumT*100
1 Bill-0220 20 = (pd_one.Bill-Name.value)-1/sumT*100
2 Sam-0020 23 = (pd_one.Sam-Name.value)-1/sumT*100
3 Jeff-0125 25 = (pd_one.Jeff-Name.value)-1/sumT*100
4 Red-1234 99 = (pd_one.Red-Name.value)-1/sumT*100
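The total formula above can be read two ways. Taken literally, (x)-1/sumT*100 binds as x - (1/sumT)*100. The answer below instead subtracts 1 from Rep first and then divides by sumT. Under that reading, the formula and the Bob row work out as:

```latex
\text{total}_i = \frac{\text{Rep}_i - 1}{\text{sumT}_i} \times 100,
\qquad
\text{total}_{\text{Bob}} = \frac{29 - 1}{10} \times 100 = 280
```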
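I want to create a total column in pd_two, using a value from pd_one as part of the calculation. The value should be taken from the pd_one row whose name corresponds to the name in pd_two (for example, Bob-0021 → Bob-Name).
Here is what I have so far: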
self.pd_one = pd.DataFrame(pd_one_data)
self.pd_two = pd.DataFrame(pd_two_data)
self.pd_one.sort_values(by=['name'], ascending = True, inplace = True)
self.pd_two.sort_values(by=['name'], ascending = True, inplace = True)
name_ftr = self.pd_two.name[:].str.partition('-')[0]+'-Name'
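At this point I just need to look up the row in pd_one whose name equals name_ftr, take its Rep value, use it in the calculation, and fill the new column in pd_two with the result.
I'm struggling to work out the syntax for this. I've been trying to use .loc, but I can't seem to get the data out of pd_one['Rep'] without running into errors.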
Thanks
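For comparison with the merge in the answer below, the lookup the question describes (turn each pd_two name into a pd_one name with name_ftr, then pull that row's Rep) can also be done directly with .loc on a name-indexed Series. This is a minimal sketch that assumes the sample data above, unique names in pd_one, and the (Rep - 1) / sumT * 100 grouping the answer uses. The variable names rep_by_name and rep are illustrative, not from the question.

```python
import pandas as pd

# Sample data copied from the question
pd_one = pd.DataFrame({'name': ['Bob-Name', 'Bill-Name', 'Sam-Name', 'Jeff-Name', 'Red-Name'],
                       'Rep': [29, 23, 46, 17, 20]})
pd_two = pd.DataFrame({'name': ['Bob-0021', 'Bill-0220', 'Sam-0020', 'Jeff-0125', 'Red-1234'],
                       'sumT': [10, 20, 23, 25, 99]})

# Build the lookup key as in the question: "Bob-0021" -> "Bob-Name"
name_ftr = pd_two['name'].str.partition('-')[0] + '-Name'

# Index pd_one by name so .loc can select Rep by label (assumes names are unique)
rep_by_name = pd_one.set_index('name')['Rep']

# .loc with a list-like of labels returns Rep values in name_ftr order;
# .to_numpy() drops the index so the values line up with pd_two's rows by position
rep = rep_by_name.loc[name_ftr].to_numpy()

# Assumed grouping: (Rep - 1) / sumT * 100, as in the answer below
pd_two['total'] = (rep - 1) / pd_two['sumT'] * 100
print(pd_two)
```

If a name_ftr label has no row in pd_one, .loc raises a KeyError. Using name_ftr.map(rep_by_name) instead returns NaN for unmatched rows and keeps pd_two's index, so the result can be assigned to pd_two directly.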
Answer 0 (score: 0)
Try this:
import pandas as pd

pd_one = pd.DataFrame(
[['Bob-Name', 29],
['Bill-Name',23],
['Sam-Name',46],
['Jeff-Name',17],
['Red-Name',20]], columns = ['name', 'Rep'])
pd_two = pd.DataFrame([['Bob-0021', 10],
['Bill-0220', 20],
['Sam-0020', 23],
['Jeff-0125', 25],
['Red-1234', 99]], columns = ['name', 'sumT'])
pd_one['name_unique'] = pd_one['name'].apply(lambda x:x.split('-')[0])
pd_two['name_unique'] = pd_two['name'].apply(lambda x:x.split('-')[0])
merged_df = pd.merge(pd_one, pd_two, on = 'name_unique')
merged_df['total'] = (merged_df['Rep'] - 1) / merged_df['sumT'] * 100
merged_df
The output is:
name_x Rep name_unique name_y sumT total
0 Bob-Name 29 Bob Bob-0021 10 280.000000
1 Bill-Name 23 Bill Bill-0220 20 110.000000
2 Sam-Name 46 Sam Sam-0020 23 195.652174
3 Jeff-Name 17 Jeff Jeff-0125 25 64.000000
4 Red-Name 20 Red Red-1234 99 19.191919
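Here, name_x is the name from pd_one and name_y is the name from pd_two. pandas adds the default suffixes _x and _y because both frames have a column called name.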
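A variant of the same merge, sketched under the assumption that pd_two's rows are what should be kept in the result. It swaps the apply/lambda split for the vectorized .str accessor and sets three pd.merge parameters the answer leaves at their defaults. how='left' keeps every pd_two row, so unmatched names get NaN in Rep and total instead of being dropped by the default inner join. suffixes replaces the _x/_y labels; the names '_two' and '_one' are illustrative. validate='many_to_one' raises a MergeError if a name key repeats in pd_one.

```python
import pandas as pd

pd_one = pd.DataFrame({'name': ['Bob-Name', 'Bill-Name', 'Sam-Name', 'Jeff-Name', 'Red-Name'],
                       'Rep': [29, 23, 46, 17, 20]})
pd_two = pd.DataFrame({'name': ['Bob-0021', 'Bill-0220', 'Sam-0020', 'Jeff-0125', 'Red-1234'],
                       'sumT': [10, 20, 23, 25, 99]})

# Shared key: the part of each name before the first '-'
pd_one['name_unique'] = pd_one['name'].str.split('-').str[0]
pd_two['name_unique'] = pd_two['name'].str.split('-').str[0]

# Left join on pd_two; explicit suffixes instead of _x/_y;
# validate checks that each key appears at most once in pd_one
merged_df = pd.merge(pd_two, pd_one, on='name_unique', how='left',
                     suffixes=('_two', '_one'), validate='many_to_one')

merged_df['total'] = (merged_df['Rep'] - 1) / merged_df['sumT'] * 100
print(merged_df[['name_two', 'sumT', 'Rep', 'total']])
```

Putting pd_two on the left also keeps its row order in the result.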