Store grouped data with variable

Time: 2016-12-02 05:01:20

Tags: python pandas for-loop grouping

I have a general question about pandas. I have a DataFrame named d with a lot of information on parks. All unique park names are stored in an array called parks. Another column holds a location ID, and I want to iterate through the parks array and print the number of unique location IDs associated with each park name.

d[d['Park']=='AKRO']
len(d['Location'].unique()) 

gives me a count of 24824.

x = d[d['Park']=='AKRO']
print(len(x['Location'].unique()))

gives me a location count of 1. Why? I thought these were the same, except that I am storing the result in a variable.

So naturally the loop I was trying doesn't work. Does anyone have any tips?

counts=[]
for p in parks:
    x= d[d['Park']==p]
    y= (len(x['Location'].unique()))
    counts.append([p,y])

3 answers:

Answer 0 (score: 1)

When you subset the first time, you're not assigning d[d['Park'] == 'AKRO'] to anything, so you haven't actually changed the data; you only viewed that section of it. The next line, len(d['Location'].unique()), therefore still counts unique locations across all of d.

When you assign x = d[d['Park']=='AKRO'], x holds only the section you viewed with the first command, so the count on x covers that one park. That's why you get the difference you are observing.
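As a minimal sketch of that difference, the snippet below uses a small made-up DataFrame (the sample values and the names whole_count and akro_count are assumptions for illustration, not the asker's data):

```python
import pandas as pd

# Hypothetical sample data; the real `d` has many more rows and columns.
d = pd.DataFrame({
    "Park": ["AKRO", "AKRO", "BISC", "BISC", "BISC"],
    "Location": [1, 1, 2, 3, 3],
})

# The filtered result is computed and then discarded; d itself is unchanged.
d[d["Park"] == "AKRO"]
whole_count = len(d["Location"].unique())  # counts over all of d

# Storing the filtered result keeps the subset around for the next line.
x = d[d["Park"] == "AKRO"]
akro_count = len(x["Location"].unique())   # counts within AKRO only

print(whole_count, akro_count)
```

With this sample the two counts differ for the same reason as in the question: the first is taken over every park, the second over one.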

Note that iterating directly over a DataFrame (for col in d) loops through its column labels, not its rows. If you wish to loop through the rows, you can use the following.

for idx, row in d.iterrows():
    print(idx, row)

However, if you want to count the number of locations with a for loop, you have to loop through each park. Something like the following.

for park in d['Park'].unique():
    print(park, d.loc[d['Park'] == park, 'Location'].nunique())

You can accomplish your goal without iteration, however. This sort of approach is preferred.

d.groupby('Park')['Location'].nunique()
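A short sketch of the groupby approach on made-up data (the sample values and the name counts_list are assumptions for illustration). It also shows one way to get the [park, count] pairs the question was building:

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's DataFrame.
d = pd.DataFrame({
    "Park": ["AKRO", "AKRO", "BISC", "BISC", "BISC"],
    "Location": [1, 1, 2, 3, 3],
})

# One Series indexed by park name, holding the number of distinct locations.
counts = d.groupby("Park")["Location"].nunique()
print(counts)

# Convert to a list of [park, count] pairs if that shape is needed.
counts_list = [[park, int(n)] for park, n in counts.items()]
print(counts_list)
```

groupby sorts the group keys by default, so the parks come out in alphabetical order rather than in order of appearance.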

Answer 1 (score: 1)

You can try something like:

d.groupby('Park')['Location'].nunique()

Answer 2 (score: 1)

Be careful about which pandas DataFrame operations modify the DataFrame in place and which return a new object. For example, d[d['Park']=='AKRO'] doesn't actually change the DataFrame d. However, x = d[d['Park']=='AKRO'] assigns the result of d[d['Park']=='AKRO'] to x, so x contains only the 'AKRO' rows and therefore has only 1 unique Location.

Have you manually checked how many unique Location IDs exist for 'AKRO'? The for loop looks correct, apart from the redundant parentheses around len(x['Location'].unique()) in the assignment to y.
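One way to check the loop is to run it on a small made-up DataFrame and compare it with a groupby result. This is only a sketch; the sample values and the name expected are assumptions for illustration:

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's DataFrame.
d = pd.DataFrame({
    "Park": ["AKRO", "AKRO", "BISC", "BISC", "BISC"],
    "Location": [1, 1, 2, 3, 3],
})
parks = d["Park"].unique()

# The loop from the question, without the redundant parentheses.
counts = []
for p in parks:
    x = d[d["Park"] == p]              # rows for this park only
    y = len(x["Location"].unique())    # distinct location IDs in that subset
    counts.append([p, y])

# Cross-check against the vectorised equivalent.
expected = d.groupby("Park")["Location"].nunique().to_dict()
print(counts, dict(counts) == expected)
```

If the loop and the groupby agree on your real data as well, a count of 1 for 'AKRO' simply reflects the data.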