使用pandas进行非常具体的任务

时间:2016-04-25 21:06:39

标签: python pandas grouping frequency-analysis

我有这个数据集代表有关火星陨石坑的信息。我只对纬度,经度和可见图层列数感兴趣。我试图将纬度分为10度组:-90到-80,-80到-70等,同时获取图层列值(0,1,2,3,4,5)并将它们转换为列本身获取每个10度组的每个图层列值的value_counts表。

What I want

What I have

把头发拉出来,我似乎尝试了一切我能理解的东西。

2 个答案:

答案 0 :(得分:3)

这对你有用吗?

import pandas as pd
import random

# generate random data
N = 100
longitudes = [random.randint(-20, 89) for _ in xrange(N)]
layers = [random.randint(0, 5) for _ in xrange(N)]
data = pd.DataFrame({'LONGITUDE_CIRCLE_IMAGE': longitudes, 'NUMBER_LAYERS': layers})

def get_degree_group(longitude, mn=-20, mx=90, delta_deg=10):
    """calculate the layer from the given longitude"""
    return (longitude - mn) / delta_deg

def make_table(df):
    # make a new column by calculating the degree group from longitude column
    df['degree_group'] = df.LONGITUDE_CIRCLE_IMAGE.apply(get_degree_group)
    # count the number of craters with properties (deg_grp, num_lyr)
    s = df.groupby(['degree_group', 'NUMBER_LAYERS']).size()
    # s is a pandas Series where the index is in the form: (deg_grp, num_lyr)
    # and the values are the counts of crates in that category
    #
    # We want to convert the series into a table where num_lyr values are columns
    # This task is done with unstack method
    table = s.unstack('NUMBER_LAYERS')
    # there are some (deg_grp, num_lyr) cases for which there are no existing craters
    # Pandas put NaN for those cases. It might be better to put 0 into those cells
    table.fillna(0, inplace = True)
    return table

make_table(data)

答案 1 :(得分:1)

使用pd.cut制作小组,pivot_table计算。

数据样本:

lat=rand(3000)*180-90
layers=randint(0,6,3000)
data=pd.DataFrame({'lat':lat,'layers':layers})

组成18个小组:

data['groups'] = pd.cut(lat,linspace(-90,90,19))

还有一张桌子:

data.pivot_table(index='groups',columns='layers',aggfunc='count',fill_value=0)

           lat               
layers       0  1  2  3  4  5
groups                       
(-90, -80]   4  1  2  1  1  0
(-80, -70]   1  0  0  2  2  3
(-70, -60]   4  3  2  4  3  4
(-60, -50]   6  2  1  1  2  3
(-50, -40]   2  3  4  2  2  4
(-40, -30]   4  3  4  2  4  4
(-30, -20]   2  5  2  2  3  2
(-20, -10]   4  2  6  3  5  2
(-10, 0]     3  4  2  3  2  1
(0, 10]      5  3  4  3  4  7
(10, 20]     3  3  2  2  2  3
(20, 30]     2  1  1  4  3  5
(30, 40]     1  2  0  2  2  3
(40, 50]     1  3  3  2  3  4
(50, 60]     6  0  2  4  1  6
(60, 70]     3  3  2  5  1  5
(70, 80]     1  4  5  3  2  2
(80, 90]     1  7  3  2  4  2