My summary statistics table contains the following columns:
I wrote the following code using Python, Pandas and NumPy:
'''
Created on April 6, 2016
Summarise Number of Buildings
per Time Interval
(5, 10, 15, 25, 30, 60)
@author: PeterW
'''
# import site-packages and modules
from pathlib import Path
import numpy.lib.recfunctions as rfn
import pandas as pd # Pandas version 0.13.0
import arcpy
# set arguments
saa_stats_table = r"E:\Projects\2016\G112224\Models\Schools\Schools_Combined_160505.gdb\Botrivier_Prim_SAA_Stats"
# environment settings
arcpy.env.overwriteOutput = True
fgdb = Path(saa_stats_table).parents[0]


def pivot_table(saa_stats_table, fgdb):
    fields = [f.name for f in arcpy.ListFields(saa_stats_table)]
    table_recarray = arcpy.da.TableToNumPyArray(saa_stats_table, fields)  # @UndefinedVariable
    print(table_recarray)
    df = pd.DataFrame(table_recarray[fields])
    pivot = df.pivot(index="OBJECTID",
                     columns="TIME",
                     values="FREQUENCY").fillna(0, downcast="infer")
    pivot_fields = pivot.columns.values
    # rename pivot fields with prefix "TIME"
    pivot.columns = [("{0}{1}".format("TIME", field)) for field in pivot_fields]
    # convert pandas dataframe to record array
    pivot_recarray = pivot.to_records(index=False)
    pivot_type = pivot_recarray.dtype.descr
    pivot_type_new = [(x[0], x[1].replace(x[1], "<i2")) for x in pivot_type]
    # change pivot record array data type to short integer
    pivot_recarray = pivot_recarray.astype(pivot_type_new)
    fields2 = ["TOWN", "SETTLEMENTNAME", "NAME"]
    table_type_new = [(str(x), "<U25") for x in fields2]
    # change table array data type to unicode (25 characters)
    table_recarray = table_recarray[fields2].astype(table_type_new)
    recarray_list = [table_recarray, pivot_recarray]
    # merge table and pivot record array
    summary_array = rfn.merge_arrays(recarray_list, flatten=True, usemask=False)
    summary_table = str(Path(fgdb, "SAA_Stats_Test"))
    # convert merged record array to file geodatabase table
    if arcpy.Exists(summary_table):
        arcpy.Delete_management(summary_table)
        arcpy.da.NumPyArrayToTable(summary_array, summary_table)  # @UndefinedVariable
    else:
        arcpy.da.NumPyArrayToTable(summary_array, summary_table)  # @UndefinedVariable


pivot_table(saa_stats_table, fgdb)
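The result I get is:
The result I am looking for has the three case fields as the first three columns, followed by TIME5 through TIME60 as new columns:
I am not sure how to collapse the first three fields, "TOWN", "SETTLEMENTNAME" and "NAME", into one row per case while keeping the "TIME" fields as columns. Any suggestions would be appreciated.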
Answer 0 (score: 0)
You can reshape the DataFrame as needed using .stack() and .unstack().
Starting with df:
        TOWN  SETTLEMENT            NAME  TIME5  TIME10  TIME15  TIME20  \
0  Botrivier  New France  Botrivier Prim      0       0       0       0
1  Botrivier  New France  Botrivier Prim      0       0       0     100

   TIME25  TIME30  TIME60
0     200       0       0
1       0       0       0
You can use .stack():
df = df.set_index(['TOWN', 'SETTLEMENT', 'NAME']).stack()
which yields:
TOWN       SETTLEMENT  NAME
Botrivier  New France  Botrivier Prim  TIME5       0
                                       TIME10      0
                                       TIME15      0
                                       TIME20      0
                                       TIME25    200
                                       TIME30      0
                                       TIME60      0
                                       TIME5       0
                                       TIME10      0
                                       TIME15      0
                                       TIME20    100
                                       TIME25      0
                                       TIME30      0
                                       TIME60      0
At this point you need to decide how to handle the multiple 0 values per case, because .unstack() does not work with duplicate index values.
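To see why that decision is needed, here is a minimal sketch. It uses a trimmed, hypothetical version of the sample data above (only two TIME columns, my own choice to keep it short) and shows that the stacked index contains duplicates and that .unstack() raises a ValueError on it:

```python
import pandas as pd

# Trimmed, hypothetical copy of the sample data (only two TIME columns).
df = pd.DataFrame({
    "TOWN": ["Botrivier", "Botrivier"],
    "SETTLEMENT": ["New France", "New France"],
    "NAME": ["Botrivier Prim", "Botrivier Prim"],
    "TIME20": [0, 100],
    "TIME25": [200, 0],
})

stacked = df.set_index(["TOWN", "SETTLEMENT", "NAME"]).stack()

# Both source rows share the same case fields, so every
# (TOWN, SETTLEMENT, NAME, TIMExx) key appears twice after stacking.
has_duplicates = bool(stacked.index.duplicated().any())

try:
    stacked.unstack()
    unstack_failed = False
except ValueError as exc:
    # pandas refuses to reshape an index with duplicate entries
    unstack_failed = True
    print(exc)
```

Any approach that leaves exactly one value per case and TIME column (dropping the zeros, summing, taking the maximum) removes the duplicates and lets .unstack() work.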
A simple approach is to drop the 0 values and, where necessary, add the missing TIME columns back with 0 values.
df[df!=0].unstack().reset_index()
which then yields:
        TOWN  SETTLEMENT            NAME  TIME20  TIME25
0  Botrivier  New France  Botrivier Prim   100.0   200.0
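Adding the missing TIME columns back could be sketched roughly as follows. The names keys, time_cols, wide and summed are my own, and the .reindex() / .fillna() step is just one way to restore the dropped columns. The groupby variant at the end is a different technique from the stack/unstack one above. I include it only as an alternative, on the assumption that summing the values per case is what you want:

```python
import pandas as pd

keys = ["TOWN", "SETTLEMENT", "NAME"]
time_cols = ["TIME5", "TIME10", "TIME15", "TIME20", "TIME25", "TIME30", "TIME60"]

# Rebuild the sample data shown above.
df = pd.DataFrame({
    "TOWN": ["Botrivier", "Botrivier"],
    "SETTLEMENT": ["New France", "New France"],
    "NAME": ["Botrivier Prim", "Botrivier Prim"],
    "TIME5": [0, 0],
    "TIME10": [0, 0],
    "TIME15": [0, 0],
    "TIME20": [0, 100],
    "TIME25": [200, 0],
    "TIME30": [0, 0],
    "TIME60": [0, 0],
})

# Stack, drop the zeros, then unstack back to one row per case.
stacked = df.set_index(keys).stack()
wide = stacked[stacked != 0].unstack()

# Restore every TIME column in the original order; columns that were
# dropped entirely (and any gaps) become 0 again.
wide = wide.reindex(columns=time_cols).fillna(0).astype(int).reset_index()

# Alternative (not the stack/unstack approach): sum the TIME columns per case.
summed = df.groupby(keys, as_index=False)[time_cols].sum()

print(wide)
print(summed)
```

Note that dropping the zeros only works while each case has at most one non-zero value per TIME column. If a case can have several, the index still contains duplicates after filtering, and an aggregation such as the groupby sum is the safer choice.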
Hope this helps.