Question

I have a dbf table like the one below, which is the result of a one-to-many join of two tables. I want to get the unique zone values for each Taxlot id.

Table name: input table
tid ----- zone
1 ------ A
1 ------ A
1 ------ B
1 ------ C
2 ------ D
2 ------ E
3 ------ C

Desired output table (table name: input table)
tid ----- zone
1 ------ A,B,C
2 ------ D,E
3 ------ C

I got some help but could not get it to work.

inputTbl = r"C:\temp\input.dbf"
taxIdZoningDict = {}
searchRows = gp.searchcursor(inputTbl)
searchRow = searchRows.next()
while searchRow:
   if searchRow.TID in taxIdZoningDict:
      taxIdZoningDict[searchRow.TID].add(searchRow.ZONE)
   else:
      taxIdZoningDict[searchRow.TID] = set() #a set prevents duplicates!
      taxIdZoningDict[searchRow.TID].add(searchRow.ZONE)
   searchRow = searchRows.next()

outputTbl = r"C:\temp\output.dbf"
gp.CreateTable_management(r"C:\temp", "output.dbf")
gp.AddField_management(outputTbl, "TID", "LONG")
gp.AddField_management(outputTbl, "ZONES", "TEXT", "", "", "20")
tidList = taxIdZoningDict.keys()
tidList.sort() #sorts in ascending order
insertRows = gp.insertcursor(outputTbl)
for tid in tidList:
   concatString = ""
   for zone in taxIdZoningDict[tid]:
      concatString = concatString + zone + ","
   insertRow = insertRows.newrow()
   insertRow.TID = tid
   insertRow.ZONES = concatString[:-1]
   insertRows.insertrow(insertRow)
del insertRow
del insertRows

Answer 1

I would use my dbf module and defaultdict to greatly simplify that code:

import dbf
from collections import defaultdict

inputTbl = dbf.Table(r'c:\temp\input.dbf')
taxIdZoning = defaultdict(set)

for record in inputTbl:
    taxIdZoning[record.tid].add(record.zone)
inputTbl.close()

outputTbl = dbf.Table(r'c:\temp\output.dbf', 'tid N(17.0), zones C(20)')
for tid in sorted(taxIdZoning):
    record = outputTbl.append()
    record.tid = tid
    record.zones = ','.join(sorted(taxIdZoning[tid]))
outputTbl.close()

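Note: the field names are lowercase, and I'm not sure how to represent LONG, but hopefully 17 digits is enough. :) I apologize for any bugs; it's hard to test without the input file.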
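Because the code above is hard to test without the input file, the grouping step can be checked on its own against the sample rows from the question. The following is only a minimal sketch: `SAMPLE_ROWS` and `group_zones` are hypothetical names introduced for illustration, and no dbf file is read or written.

```python
from collections import defaultdict

# Sample (tid, zone) pairs copied from the question's input table.
SAMPLE_ROWS = [(1, "A"), (1, "A"), (1, "B"), (1, "C"),
               (2, "D"), (2, "E"), (3, "C")]


def group_zones(rows):
    """Return sorted (tid, 'zone,zone,...') pairs with duplicate zones removed."""
    zones_by_tid = defaultdict(set)
    for tid, zone in rows:
        # A set silently ignores a zone that was already added for this tid.
        zones_by_tid[tid].add(zone)
    return [(tid, ",".join(sorted(zones_by_tid[tid])))
            for tid in sorted(zones_by_tid)]


for tid, zones in group_zones(SAMPLE_ROWS):
    print("%s ------ %s" % (tid, zones))
```

Once the grouping behaves as expected on the sample, the same loop body can be fed records from the real table.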

Answer 2

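This worked for me in both Microsoft Access VBA and Microsoft Excel VBA. It is not very efficient code, but it works. I was able to open the resulting file in both Access and Excel.

Set the sDBF* and sOutDBF* variables to suit your own paths.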

Sub VBASolution()
    Dim oRS
    Dim sConn
    Dim sDBFPath, sOutDBFPath
    Dim sDBFName, sOutDBFName
    Dim oDict
    Dim curTID, curZone, sZones
    Dim oConn
    Dim oFS
    Dim sTableName

    sDBFPath = "C:\Path\To\DBFs\"
    sDBFName = "Input.dbf"

    sOutDBFPath = "C:\Path\To\DBFs\"
    sOutDBFName = "RESULTS.dbf"

    sConn = "Driver={Microsoft dBASE Driver (*.dbf)}; DriverID=277; Dbq="
    Set oRS = CreateObject("ADODB.Recordset")


    oRS.Open "SELECT DISTINCT tid, zone FROM " & sDBFName, sConn & sDBFPath

    Set oDict = CreateObject("Scripting.Dictionary")

    Do While Not oRS.EOF
        curTID = oRS.Fields("tid").Value
        curZone = oRS.Fields("zone").Value
        If Not oDict.Exists(curTID) Then
            oDict.Add curTID, CreateObject("Scripting.Dictionary")
        End If
        If Not oDict(curTID).Exists(curZone) Then
            oDict(curTID).Add curZone, curZone
        End If
        oRS.MoveNext
    Loop
    oRS.Close

    Set oRS = Nothing

    Set oConn = CreateObject("ADODB.Connection")
    oConn.Open sConn & sOutDBFPath

    'Delete the resultant DBF file if it already exists.
    Set oFS = CreateObject("Scripting.FileSystemObject")
    With oFS
        If .FileExists(sOutDBFPath & "\" & sOutDBFName) Then
            .DeleteFile sOutDBFPath & "\" & sOutDBFName
        End If
    End With

    sTableName = oFS.GetBaseName(sOutDBFName)

    oConn.Execute "CREATE TABLE " & sTableName & " (tid int, zone varchar(80))"

    Dim i, j
    For Each i In oDict.Keys
        curTID = i
        sZones = ""
        For Each j In oDict(i)
            sZones = sZones & "," & j
        Next
        sZones = Mid(sZones, 2)
        oConn.Execute "INSERT INTO " & sTableName & " (tid, zone) VALUES ('" & curTID & "','" & sZones & "')"
    Next
    oConn.Close

    Set oConn = Nothing
    Set oDict = Nothing
    Set oFS = Nothing
End Sub

Edit: For what it's worth, this also worked for me when I pasted it into a VBScript .VBS (text) file on Windows XP and added this line at the bottom of the file:

Call VBASolution()

I don't know whether Office needs to be installed or whether Windows ships with the appropriate dbf driver.

Answer 3

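I don't think Morlock's answer meets the requirement to remove duplicates. I would use defaultdict(set), which automatically omits dups, instead of defaultdict(list), and therefore .add() instead of .append().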
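To make the difference concrete, here is a small sketch that feeds the same pairs into both containers. The `pairs` data is a made-up subset of the question's table, and the variable names are only illustrative.

```python
from collections import defaultdict

# Taxlot 1 appears with zone "A" twice, as in the question's input table.
pairs = [(1, "A"), (1, "A"), (1, "B")]

as_list = defaultdict(list)
as_set = defaultdict(set)
for tid, zone in pairs:
    as_list[tid].append(zone)  # keeps every occurrence, duplicates included
    as_set[tid].add(zone)      # stores each distinct zone only once

print(as_list[1])
print(sorted(as_set[1]))
```

The list version still holds "A" twice, so it needs a separate de-duplication pass before output, while the set version does not.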

Answer 4

Instead of:

taxIdZoningDict = {}
searchRows = gp.searchcursor(inputTbl)
searchRow = searchRows.next()
while searchRow:
   if searchRow.TID in taxIdZoningDict:
      taxIdZoningDict[searchRow.TID].add(searchRow.ZONE)
   else:
      taxIdZoningDict[searchRow.TID] = set() #a set prevents duplicates!
      taxIdZoningDict[searchRow.TID].add(searchRow.ZONE)
   searchRow = searchRows.next()

do this:

zones = {}
for row in gp.searchcursor(inputTbl):
    zones.setdefault(row.TID, set())
    zones[row.TID].add(row.ZONE)

More pythonic, same result ;-)

Then for the output:

for k, v in zones.items():
    print k, ", ".join(v)
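Since `dict.setdefault` returns the stored value, the two lines in the loop can also be collapsed into one. The sketch below assumes a plain list of (TID, ZONE) tuples named `rows` as a hypothetical stand-in for the cursor, so that it runs without ArcGIS; it also sorts the keys and zones, which the snippet above does not do.

```python
# Hypothetical stand-in for rows read from the cursor: (TID, ZONE) pairs.
rows = [(1, "A"), (1, "A"), (1, "B"), (1, "C"),
        (2, "D"), (2, "E"), (3, "C")]

zones = {}
for tid, zone in rows:
    # setdefault returns the existing set, or stores and returns a new one.
    zones.setdefault(tid, set()).add(zone)

# Iterating a dict directly yields only its keys, so request key/value pairs.
lines = ["%s %s" % (tid, ", ".join(sorted(found)))
         for tid, found in sorted(zones.items())]
for line in lines:
    print(line)
```

Sorting is optional, but sets have no guaranteed order, so without it the zones may come out in a different order from run to run.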

Answer 5

Here is some quickly written Python code that does what you need with very little manipulation.

import collections

d = collections.defaultdict(list)

with open("input_file.txt") as f:   
    for line in f:
        parsed = line.strip().split()
        print parsed
        k = parsed[0]
        v = parsed[2]
        d[k].append(v)

for k, v in sorted(d.iteritems()):
    s = " ----- "
    v = list(set(v)) # Must be a library function to do this
    v.sort()
    print k, s,
    for j in v:
        print j,
    print

Hope this helps.

Answer 6

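The OP wants commas in the zone column. You could slightly change the output part of Morlock's code to produce those commas, and arguably make it clearer, by using this one-line output instead of the explicit loop over v: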

    print k, s, ",".join(v)

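This packs more into one line (possibly a downside). Using join this way is very common in Python and, IMHO, expresses the intent better (and is easier to digest when reading) than an explicit loop.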
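As a rough illustration of the join approach, the sketch below wraps the output step in a small function. `format_row` is a hypothetical helper, not part of Morlock's code, and it builds the line as a string rather than printing the pieces separately, so the spacing differs slightly from the print statement above.

```python
def format_row(tid, zones, sep=" ----- "):
    """Build one output line such as '1 ----- A,B,C' (hypothetical helper)."""
    # set() drops duplicates and sorted() fixes the order; str() guards
    # against non-string zone values, which join would reject.
    unique_zones = sorted(set(str(z) for z in zones))
    return "%s%s%s" % (tid, sep, ",".join(unique_zones))


print(format_row("1", ["A", "A", "B", "C"]))
```

Note that join only accepts strings, which is why the zone values are converted first.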

Concatenate multiple values in one record without duplication