Pivot columns and merge based on the first three columns

Time: 2016-05-06 16:19:00

Tags: numpy pandas arcpy

My summary statistics output contains the following columns:

  1. TOWN
  2. SETTLEMENTNAME
  3. NAME
  4. TIME
  5. FREQUENCY

I wrote the following code using Python Pandas and NumPy:

    '''
    Created on April 6, 2016

    Summarise Number of Buildings per Time Interval
    (5, 10, 15, 25, 30, 60)

    @author: PeterW
    '''
    # import site-packages and modules
    from pathlib import Path
    import numpy.lib.recfunctions as rfn
    import pandas as pd  # Pandas version 0.13.0
    import arcpy
    
    # set arguments
    saa_stats_table = r"E:\Projects\2016\G112224\Models\Schools\Schools_Combined_160505.gdb\Botrivier_Prim_SAA_Stats"
    
    # environment settings
    arcpy.env.overwriteOutput = True
    fgdb = Path(saa_stats_table).parents[0]
    
    
    def pivot_table(saa_stats_table, fgdb):
        fields = [f.name for f in arcpy.ListFields(saa_stats_table)]
        table_recarray = arcpy.da.TableToNumPyArray(saa_stats_table, fields)  # @UndefinedVariable
        print(table_recarray)
        df = pd.DataFrame(table_recarray[fields])
        pivot = df.pivot(index="OBJECTID",
                         columns="TIME",
                         values="FREQUENCY").fillna(0, downcast="infer")
        pivot_fields = pivot.columns.values
        # rename pivot fields with prefix "TIME"
        pivot.columns = [("{0}{1}".format("TIME", field)) for field in pivot_fields]
        # convert pandas dataframe to record array
        pivot_recarray = pivot.to_records(index=False)
        pivot_type = pivot_recarray.dtype.descr
        pivot_type_new = [(x[0], x[1].replace(x[1], "<i2")) for x in pivot_type]
        # change pivot record array data type to short integer
        pivot_recarray = pivot_recarray.astype(pivot_type_new)
        fields2 = ["TOWN", "SETTLEMENTNAME", "NAME"]
        table_type_new = [(str(x), "<U25") for x in fields2]
        # change table array data type to unicode 25 characters
        table_recarray = table_recarray[fields2].astype(table_type_new)
        recarray_list = [table_recarray, pivot_recarray]
        # merge table and pivot record array
        summary_array = rfn.merge_arrays(recarray_list, flatten=True, usemask=False)
        summary_table = str(Path(fgdb, "SAA_Stats_Test"))
        # convert merged record array to file geodatabase table
        # delete any existing output table, then write the new one
        if arcpy.Exists(summary_table):
            arcpy.Delete_management(summary_table)
        arcpy.da.NumPyArrayToTable(summary_array, summary_table)  # @UndefinedVariable
    
    pivot_table(saa_stats_table, fgdb)
    

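The result I get is:

(image not available)

The result I am looking for has the first three columns as the case fields, with TIME5 through TIME60 set out as new columns:

(image not available)

I am not sure how to collapse the first three fields, "TOWN", "SETTLEMENTNAME" and "NAME", and have the "TIME" fields set out as columns. Any suggestions would be appreciated.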

1 answer:

Answer 0 (score: 0)

You can reshape the DataFrame as needed using .stack() and .unstack().

Starting with df:

            TOWN  SETTLEMENT            NAME  TIME5  TIME10  TIME15  TIME20  \
    0  Botrivier  New France  Botrivier Prim      0       0       0       0   
    1  Botrivier  New France  Botrivier Prim      0       0       0     100   

       TIME25  TIME30  TIME60  
    0     200       0       0  
    1       0       0       0 
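To follow along, the starting frame shown above can be rebuilt by hand. This is only a sketch that copies the printed values. The column name SETTLEMENT follows the answer's output rather than the SETTLEMENTNAME field in the question.

```python
import pandas as pd

# Sample data copied from the frame printed above (two rows, same case fields)
df = pd.DataFrame({
    "TOWN": ["Botrivier", "Botrivier"],
    "SETTLEMENT": ["New France", "New France"],
    "NAME": ["Botrivier Prim", "Botrivier Prim"],
    "TIME5": [0, 0],
    "TIME10": [0, 0],
    "TIME15": [0, 0],
    "TIME20": [0, 100],
    "TIME25": [200, 0],
    "TIME30": [0, 0],
    "TIME60": [0, 0],
})
print(df)
```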

You can use .stack():

    df = df.set_index(['TOWN', 'SETTLEMENT', 'NAME']).stack()

which produces:

    TOWN       SETTLEMENT  NAME                  
    Botrivier  New France  Botrivier Prim  TIME5       0
                                           TIME10      0
                                           TIME15      0
                                           TIME20      0
                                           TIME25    200
                                           TIME30      0
                                           TIME60      0
                                           TIME5       0
                                           TIME10      0
                                           TIME15      0
                                           TIME20    100
                                           TIME25      0
                                           TIME30      0
                                           TIME60      0

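At this point you need to decide how to handle the multiple 0 values for each case, because .unstack() does not work with duplicate index values.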
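To make the duplicate-index problem concrete, the sketch below rebuilds the sample frame, shows that a plain .unstack() raises a ValueError, and then aggregates the duplicates first. Summing the duplicates is an assumption made for illustration, not something the original answer prescribes. It only makes sense if the per-row frequencies for a case are meant to be added together. The names `stacked`, `summed`, `wide` and `time_cols` are hypothetical.

```python
import pandas as pd

# Sample frame copied from the answer's starting df
df = pd.DataFrame({
    "TOWN": ["Botrivier", "Botrivier"],
    "SETTLEMENT": ["New France", "New France"],
    "NAME": ["Botrivier Prim", "Botrivier Prim"],
    "TIME5": [0, 0], "TIME10": [0, 0], "TIME15": [0, 0],
    "TIME20": [0, 100], "TIME25": [200, 0],
    "TIME30": [0, 0], "TIME60": [0, 0],
})
time_cols = [c for c in df.columns if c.startswith("TIME")]

stacked = df.set_index(["TOWN", "SETTLEMENT", "NAME"]).stack()

# Every (case, TIME) pair occurs twice, so unstack cannot reshape the Series
try:
    stacked.unstack()
    unstack_failed = False
except ValueError as exc:
    unstack_failed = True
    print("unstack failed:", exc)

# Assumption: duplicates should be summed per case and TIME label
summed = stacked.groupby(level=[0, 1, 2, 3]).sum()

# After aggregation the index is unique, so unstack works;
# selecting time_cols restores the original column order
wide = summed.unstack()[time_cols].reset_index()
print(wide)
```

This keeps all seven TIME columns, including those that are 0 for every row.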

One simple approach is to drop the 0 values and, if necessary, add the all-zero TIME columns back afterwards.

    df[df!=0].unstack().reset_index()

which then produces:

            TOWN  SETTLEMENT            NAME  TIME20  TIME25
    0  Botrivier  New France  Botrivier Prim   100.0   200.0
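The step of adding the missing all-zero TIME columns back is not shown in the answer. One possible sketch uses reindex with a fill value. It assumes that, once zeros are dropped, each case has at most one value per TIME label, since otherwise .unstack() would still fail on duplicates. The names `nonzero`, `wide` and `time_cols` are hypothetical.

```python
import pandas as pd

# Sample frame copied from the answer's starting df
df = pd.DataFrame({
    "TOWN": ["Botrivier", "Botrivier"],
    "SETTLEMENT": ["New France", "New France"],
    "NAME": ["Botrivier Prim", "Botrivier Prim"],
    "TIME5": [0, 0], "TIME10": [0, 0], "TIME15": [0, 0],
    "TIME20": [0, 100], "TIME25": [200, 0],
    "TIME30": [0, 0], "TIME60": [0, 0],
})
# Remember the full set of TIME columns before reshaping
time_cols = [c for c in df.columns if c.startswith("TIME")]

stacked = df.set_index(["TOWN", "SETTLEMENT", "NAME"]).stack()

# Drop the zero entries so the remaining index values are unique
nonzero = stacked[stacked != 0]

wide = (
    nonzero.unstack()
    # re-create TIME columns that vanished, filling them with 0
    .reindex(columns=time_cols, fill_value=0)
    # cases lacking a value for an existing column get 0 as well
    .fillna(0)
    .astype(int)
    .reset_index()
)
print(wide)
```

Converting back to int also avoids the float columns seen in the output above.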

Hope this helps.