Question

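I have an ESRI point shapefile with (among others) an nMSLINK field and a DIAMETER field. Because of a spatial join, MSLINK is not unique. What I want to achieve is to keep only the features in the shapefile that have a unique MSLINK and the smallest DIAMETER value, together with the corresponding values in the other fields. I can do this with a search cursor (looping through all features and deleting each one that does not comply), but this takes ages (> 75,000 features). I was wondering whether e.g. numpy could do the trick faster in ArcMap / arcpy.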

Answer 1

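I think this kind of processing will definitely be much faster if you work in memory instead of interacting with ArcGIS, for example by putting all rows into a Python object (a namedtuple is probably a good choice here). Then you can work out which rows to delete or which to insert.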
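
Since the question mentions numpy, here is a minimal sketch of the in-memory step with a structured array. It assumes the attributes have already been read into an array (with ArcGIS 10.1 or later, arcpy.da.FeatureClassToNumPyArray can produce one). The sample values and the field names 'OID', 'mslink' and 'diam' are made up for illustration.

```python
import numpy as np

# Made-up sample data standing in for the attribute table. With arcpy this
# could come from arcpy.da.FeatureClassToNumPyArray (assumption: not called
# here, so the sketch runs without ArcGIS).
rows = np.array(
    [(1, 'A', 300.0), (2, 'B', 150.0), (3, 'A', 100.0),
     (4, 'C', 500.0), (5, 'B', 200.0)],
    dtype=[('OID', 'i4'), ('mslink', 'U10'), ('diam', 'f8')])

# Sort by mslink first and by diameter second; lexsort takes the
# primary key last.
order = np.lexsort((rows['diam'], rows['mslink']))
sorted_rows = rows[order]

# return_index gives the first occurrence of each unique mslink in the
# sorted array, which is the row with the smallest diameter.
_, first = np.unique(sorted_rows['mslink'], return_index=True)
keep = sorted_rows[first]

keep_oids = sorted(int(oid) for oid in keep['OID'])
print(keep_oids)
```

The resulting IDs can then be used either to insert the kept rows into a new feature class or to decide which rows to delete.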

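Which approach is fastest depends on the data: a) if you have a lot of duplicate (MSLINK) rows, the fastest way is to insert just the rows you need into a new layer; b) if the rows to be deleted are only a few compared to the total number of rows, deleting them is faster.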

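For a) you need to fetch all fields into a tuple, including the point coordinates, so that you can create a new feature class and insert the new rows.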

# Example of Variant a:

from collections import namedtuple

import arcpy

# assuming the following variables are already defined:
#   source_fc   contains name of the fclass
#   the_path    contains path to the shape
#   cleaned_fc  the name of the cleaned fclass


# use all fields of source_fc plus the shape token to get a tuple with xy
# coordinates (using 'mslink' and 'diam' here to simplify the example)
fields = ['mslink', 'diam', 'field3', ... ]
all_fields = fields + ['SHAPE@XY']

# define a namedtuple to hold and work with the rows, use the name 'point' to
# hold the coordinates-tuple
Row = namedtuple('Row', fields + ['point'])
data = []
# read all_fields (not just fields) so that SHAPE@XY fills Row.point
with arcpy.da.SearchCursor(source_fc, all_fields) as sc:
    for r in sc:
        # unpack the values from each row into a new Row (namedtuple) and
        # append it to data
        data.append(Row(*r))

# now just drop the rows we don't want; the easiest way is probably to sort
# the tuples first by MSLINK and then by diameter...
data = sorted(data, key=lambda x: (x.mslink, x.diam))

# ... now just keep the first ones for each mslink
to_keep = []
last_mslink = None
for d in data:
    if last_mslink != d.mslink:
        last_mslink = d.mslink
        to_keep.append(d)

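# create a new point feature class with the same fields as the source_fc;
# geometry_type must be given because it defaults to POLYGON
result = arcpy.CreateFeatureclass_management(
        out_path=the_path, out_name=cleaned_fc, geometry_type='POINT',
        template=source_fc,
        spatial_reference=arcpy.Describe(source_fc).spatialReference)
# insert the kept rows; insertRow takes the whole row as one sequence
with arcpy.da.InsertCursor(result.getOutput(0), all_fields) as ic:
    for r in to_keep:
        ic.insertRow(r)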
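
The sort-and-keep-first step can be tried without ArcGIS on a handful of rows. The following is only a sketch with made-up data, and it swaps the explicit loop for itertools.groupby, which gives the same result on sorted input.

```python
from collections import namedtuple
from itertools import groupby

# Hypothetical rows; 'point' holds the (x, y) tuple that SHAPE@XY returns.
Row = namedtuple('Row', ['mslink', 'diam', 'point'])
data = [
    Row('A', 300.0, (1.0, 1.0)),
    Row('B', 150.0, (2.0, 2.0)),
    Row('A', 100.0, (3.0, 3.0)),
    Row('B', 150.0, (4.0, 4.0)),
]

# Sort by mslink, then diameter, so the smallest diameter leads each group.
data.sort(key=lambda r: (r.mslink, r.diam))

# groupby yields one group per run of equal mslink values; keep its first row.
to_keep = [next(group) for _, group in groupby(data, key=lambda r: r.mslink)]

for row in to_keep:
    print(row)
```

When two rows share both the MSLINK and the minimum diameter, as the two 'B' rows do here, Python's stable sort keeps the one that came first in the input.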

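For alternative b) I would fetch only 3 fields: a unique ID, MSLINK and the diameter. Then build a delete list (only the unique IDs are needed here). Then loop through the feature class again and delete the rows whose ID is in the delete list. Just to be safe, I would copy the feature class first and work on the copy.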
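
A rough sketch of variant b could look like this. The helper names ids_to_delete and delete_rows are hypothetical, the sample tuples are made up, and the arcpy part is only defined, not called, so the selection logic can be checked without ArcGIS.

```python
def ids_to_delete(rows):
    """Return the IDs of all rows that are not the minimum-diameter row
    of their mslink. rows is an iterable of (oid, mslink, diam) tuples."""
    best = {}        # mslink -> (diam, oid) of the row to keep
    doomed = set()   # IDs of the rows to delete
    for oid, mslink, diam in rows:
        if mslink not in best:
            best[mslink] = (diam, oid)
        elif diam < best[mslink][0]:
            doomed.add(best[mslink][1])   # the previous best row loses
            best[mslink] = (diam, oid)
        else:
            doomed.add(oid)
    return doomed


def delete_rows(copy_fc, doomed):
    """Delete the doomed rows from a copy of the feature class.
    Needs ArcGIS 10.1+; arcpy is imported here so the rest runs without it."""
    import arcpy
    with arcpy.da.UpdateCursor(copy_fc, ['OID@']) as uc:
        for (oid,) in uc:
            if oid in doomed:
                uc.deleteRow()


# Made-up sample rows: (oid, mslink, diam)
sample = [(1, 'A', 300.0), (2, 'B', 150.0), (3, 'A', 100.0), (4, 'A', 100.0)]
doomed = ids_to_delete(sample)
print(sorted(doomed))
```

With real data, the tuples would presumably come from an arcpy.da.SearchCursor over the copy with the fields 'OID@', the MSLINK field and the diameter field.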

Answer 2

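There are a few steps you can take to accomplish this task more efficiently. First, using the data access cursors (arcpy.da) instead of the older cursors will speed up the process. This assumes you are working in 10.1 or later. Second, you can use Summary Statistics, specifically its ability to find a minimum value based on a case field. For you, the case field would be nMSLINK.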

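The code below first creates a statistics table with all unique 'nMSLINK' values and their corresponding minimum 'DIAMETER' value. I then use Table Select to keep only the rows of that table whose 'FREQUENCY' field is not 1. From there I iterate through the new table and build a list of strings that will make up the final SQL statement. After this iteration, I use the Python join function to create an SQL string that looks something like this: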

("nMSLINK" = 'value1' AND "DIAMETER" <> 624.0) OR ("nMSLINK" = 'value2' AND "DIAMETER" <> 1302.0) OR ("nMSLINK" = 'value3' AND "DIAMETER" <> 1036.0) ...

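The SQL selects rows where the nMSLINK value is not unique and the DIAMETER value is not the minimum. Using this SQL, I select by attribute and delete the selected rows.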
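
To see how such a where clause comes together, here is a small sketch in plain Python. It imitates the statistics table with a dictionary instead of calling the geoprocessing tools; the function name build_where and the sample values are made up for illustration.

```python
def build_where(rows, key_field, value_field):
    """rows: iterable of (key, value) pairs, one per feature.
    Returns a where clause matching every feature whose key is duplicated
    and whose value is not the minimum for that key."""
    stats = {}   # key -> [frequency, minimum], like a statistics table
    for key, value in rows:
        if key in stats:
            stats[key][0] += 1
            stats[key][1] = min(stats[key][1], value)
        else:
            stats[key] = [1, value]

    parts = []
    for key in sorted(stats):
        freq, minimum = stats[key]
        if freq != 1:   # keys that occur once need no clean-up
            parts.append('("{0}" = \'{1}\' AND "{2}" <> {3})'.format(
                key_field, key, value_field, minimum))
    return " OR ".join(parts)


rows = [('v1', 700.0), ('v1', 624.0), ('v2', 50.0),
        ('v3', 1302.0), ('v3', 1400.0)]
where = build_where(rows, "nMSLINK", "DIAMETER")
print(where)
```

An empty result means no key is duplicated, so there is nothing to select or delete.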

This SQL statement is written assuming that your feature class is in a file geodatabase and that the ...

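The script below has the following inputs:

Feature: the feature class to analyze

Workspace: a folder in which a couple of intermediate tables are stored temporarily

TempTableName1: the name of the first temp table

TempTableName2: the name of the second temp table

Field1 = the nonunique field

Field2 = the field with the numeric values among which you want to find the lowest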

The code:

# Import modules
from arcpy import *
import os
# Local variables

#Feature to analyze
Feature = r"C:\E1B8\ScriptTesting\Workspace\Workspace.gdb\testfeatureclass"
#Workspace to export table of identicals
Workspace = r"C:\E1B8\ScriptTesting\Workspace"
#Name of temp DBF table file
TempTableName1 = "Table1"
TempTableName2 = "Table2"

#Field names
Field1 = "nMSLINK" #nonunique
Field2 = "DIAMETER" #field with numeric values

#Make layer to allow selection
MakeFeatureLayer_management (Feature, "lyr")

#Path for first temp table
Table = os.path.join (Workspace, TempTableName1)

#Create statistics table with min value
Statistics_analysis (Feature, Table, [[Field2, "MIN"]], [Field1])

#SQL Select rows with frequency not equal to one
sql = '"FREQUENCY" <> 1'
# Path for second temp table
Table2 = os.path.join (Workspace, TempTableName2)
# Select rows with Frequency not equal to one
TableSelect_analysis (Table, Table2, sql)

#Empty list for sql bits
li = []

# Iterate through second table; the with block releases the cursor and,
# unlike "del row", does not fail when the table is empty
with da.SearchCursor (Table2, [Field1, "MIN_" + Field2]) as cursor:
    for row in cursor:
        # Add SQL bit to list
        sqlbit = '("' + Field1 + '" = \'' + row[0] + '\' AND "' + Field2 + '" <> ' + str(row[1]) + ")"
        li.append (sqlbit)

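#Create SQL for selection of unwanted features
sql = " OR ".join (li)
print (sql)

#Only select and delete when there is something to remove; with an empty
#where clause and no selection, DeleteFeatures would remove every feature
if li:
    #Select based on SQL
    SelectLayerByAttribute_management ("lyr", "NEW_SELECTION", sql)

    #Delete selected features
    DeleteFeatures_management ("lyr")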

#delete temp files
Delete_management ("lyr")
Delete_management (Table)
Delete_management (Table2)

This should be faster than a straight-up cursor. Let me know if this makes sense. Good luck!

Keep the minimum value for each unique ID using arcpy / numpy
