Using Pandas Joins to count records that meet conditions

Time: 2015-09-14 15:51:30

Tags: python python-2.7 pandas

I want to count how many rows in my project_timeline dataframe fall in May and have Project Type 'Type 1'. The Month and Project Type are stored in another dataframe called project_cost. The two tables are related through the index of the project_cost dataframe and the Project column of the project_timeline dataframe. I am trying to do this calculation by first joining the tables with pandas.merge and then applying a sum. I expect a result of 2, but get 0.

import pandas as pd
project_cost = pd.DataFrame(data = [['Type 1', 'May', 3000], ['Type 3', 'April', 2000], ['Type 2', 'April', 1000]], columns=['Project Type', 'Month', 'Cost'], index=['Project 1', 'Project 2', 'Project 3'])
project_timeline = pd.DataFrame(data = [['Project 1', 30], ['Project 2', 30], ['Project 1', 20]], columns=['Project', 'Days'])

merged_pds =  pd.merge(project_cost, project_timeline, left_index=True, right_on='Project', how='right')
print merged_pds
print sum(['May' in i for i in merged_pds[merged_pds['Project Type']=='Type 1']['Project Type'].tolist()])

1 answer:

Answer 0 (score: 1)

Filtering on 'Type 2' gives 0 because the merged frame has no rows whose 'Project Type' is 'Type 2':

In [77]:
merged_pds[merged_pds['Project Type']=='Type 2']

Out[77]:
Empty DataFrame
Columns: [Project Type, Month, Cost, Project, Days]
Index: []

Your question asked for 'Type 1', so filter on that value instead.
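One quick way to see which 'Project Type' values actually survive the merge is value_counts. The following is a minimal sketch that rebuilds the frames from the question; the name type_counts is only illustrative:

```python
import pandas as pd

# Same data as in the question.
project_cost = pd.DataFrame(
    data=[['Type 1', 'May', 3000], ['Type 3', 'April', 2000], ['Type 2', 'April', 1000]],
    columns=['Project Type', 'Month', 'Cost'],
    index=['Project 1', 'Project 2', 'Project 3'])
project_timeline = pd.DataFrame(
    data=[['Project 1', 30], ['Project 2', 30], ['Project 1', 20]],
    columns=['Project', 'Days'])

# Right join: keep every timeline row and look up its cost record by project name.
merged_pds = pd.merge(project_cost, project_timeline,
                      left_index=True, right_on='Project', how='right')

# Count the merged rows per 'Project Type' (illustrative variable name).
type_counts = merged_pds['Project Type'].value_counts()
print(type_counts)
```

'Project 3' never appears in project_timeline, so its 'Type 2' record is dropped by the right join and does not show up in the counts.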

Also, you're testing whether 'May' occurs in the 'Project Type' values, and none of them contain it, so even with the filter fixed the count is still 0:

In [79]:
['May' in i for i in merged_pds[merged_pds['Project Type']=='Type 1']['Project Type'].tolist()]

Out[79]:
[False, False]

What you wanted was to test the 'Month' column instead:

In [75]:
print(sum(['May' in i for i in merged_pds[merged_pds['Project Type']=='Type 1']['Month'].tolist()]))
2

However, iterating is not required here; you can combine both conditions in a boolean mask:

In [76]:
merged_pds.loc[(merged_pds['Project Type'] == 'Type 1') & (merged_pds['Month'] == 'May'), 'Month'].count()

Out[76]:
2
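An equivalent way to count, sketched here end to end, is to sum the boolean mask directly, since each True counts as 1. The helper count_matching is a hypothetical name introduced only for this illustration, not part of pandas:

```python
import pandas as pd

# Same data as in the question.
project_cost = pd.DataFrame(
    data=[['Type 1', 'May', 3000], ['Type 3', 'April', 2000], ['Type 2', 'April', 1000]],
    columns=['Project Type', 'Month', 'Cost'],
    index=['Project 1', 'Project 2', 'Project 3'])
project_timeline = pd.DataFrame(
    data=[['Project 1', 30], ['Project 2', 30], ['Project 1', 20]],
    columns=['Project', 'Days'])

merged_pds = pd.merge(project_cost, project_timeline,
                      left_index=True, right_on='Project', how='right')


def count_matching(df, project_type, month):
    """Hypothetical helper: count rows matching both a project type and a month."""
    mask = (df['Project Type'] == project_type) & (df['Month'] == month)
    # Summing a boolean Series counts its True entries.
    return int(mask.sum())


may_type1 = count_matching(merged_pds, 'Type 1', 'May')
print(may_type1)
```

For the data in the question this prints 2, the same result as the .loc[...].count() version. Unlike count(), the sum does not depend on whether the selected column contains missing values.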