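I have a dbf table like the one below, which is the result of a one-to-many join between two tables. I want to collect the unique zone values for each Taxlot id.
Table name: input table
tid ----- zone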
1 ------ A
1 ------ A
1 ------ B
1 ------ C
2 ------ D
2 ------ E
3 ------ C
Desired output table
Table name: output table
tid ----- zone
1 ------ A,B,C
2 ------ D,E
3 ------ C
I got some help with this but could not get it to work.
# Assumes gp is an ArcGIS geoprocessor object created earlier,
# e.g. via arcgisscripting.create(); this is Python 2 code.
inputTbl = r"C:\temp\input.dbf"
taxIdZoningDict = {}
searchRows = gp.searchcursor(inputTbl)
searchRow = searchRows.next()
while searchRow:
    if searchRow.TID in taxIdZoningDict:
        taxIdZoningDict[searchRow.TID].add(searchRow.ZONE)
    else:
        taxIdZoningDict[searchRow.TID] = set() #a set prevents duplicates!
        taxIdZoningDict[searchRow.TID].add(searchRow.ZONE)
    searchRow = searchRows.next()
outputTbl = r"C:\temp\output.dbf"
gp.CreateTable_management(r"C:\temp", "output.dbf")
gp.AddField_management(outputTbl, "TID", "LONG")
gp.AddField_management(outputTbl, "ZONES", "TEXT", "", "", "20")
tidList = taxIdZoningDict.keys()
tidList.sort() #sorts in ascending order
insertRows = gp.insertcursor(outputTbl)
for tid in tidList:
    concatString = ""
    for zone in taxIdZoningDict[tid]:
        concatString = concatString + zone + ","
    insertRow = insertRows.newrow()
    insertRow.TID = tid
    insertRow.ZONES = concatString[:-1]
    insertRows.insertrow(insertRow)
del insertRow
del insertRows
Answer 0 (score: 3)
I would use my dbf module and defaultdict to greatly simplify that code:
import dbf
from collections import defaultdict
inputTbl = dbf.Table(r'c:\temp\input.dbf')
taxIdZoning = defaultdict(set)
for record in inputTbl:
    taxIdZoning[record.tid].add(record.zone)
inputTbl.close()
outputTbl = dbf.Table(r'c:\temp\output.dbf', 'tid N(17.0), zones C(20)')
for tid in sorted(taxIdZoning):
    record = outputTbl.append()
    record.tid = tid
    record.zones = ','.join(sorted(taxIdZoning[tid]))
outputTbl.close()
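Note: the field names are lowercase, and I wasn't sure how to represent LONG, but hopefully 17 digits will be enough. :) My apologies for any errors; it's hard to test without the input file.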
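The grouping step itself does not depend on the dbf module, so it can be tried on plain tuples first. The following is a minimal Python 3 sketch, assuming the rows have already been read into (tid, zone) pairs; `sample_rows` and `group_zones` are illustrative names, not part of any library.

```python
from collections import defaultdict

# Sample data copied from the input table in the question.
sample_rows = [
    (1, "A"), (1, "A"), (1, "B"), (1, "C"),
    (2, "D"), (2, "E"),
    (3, "C"),
]

def group_zones(rows):
    """Return (tid, 'zone1,zone2,...') pairs with duplicate zones removed."""
    zones_by_tid = defaultdict(set)   # a set silently drops repeated zones
    for tid, zone in rows:
        zones_by_tid[tid].add(zone)
    # Sort both the ids and the zones so the output order is deterministic.
    return [(tid, ",".join(sorted(zones_by_tid[tid])))
            for tid in sorted(zones_by_tid)]

for tid, zones in group_zones(sample_rows):
    print(tid, "------", zones)
```

Sorting the zones before joining matters because the iteration order of a set is not guaranteed.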
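Answer 1 (score: 2)
This worked for me in both Microsoft Access VBA and Microsoft Excel VBA. It is not very efficient code, but it works. I was able to open the resulting file in both Access and Excel.
Set the sDBF* and sOutDBF* variables to suit your own paths.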
Sub VBASolution()
    Dim oRS
    Dim sConn
    Dim sDBFPath, sOutDBFPath
    Dim sDBFName, sOutDBFName
    Dim oDict
    Dim curTID, curZone, sZones
    Dim oConn
    Dim oFS
    Dim sTableName
    sDBFPath = "C:\Path\To\DBFs\"
    sDBFName = "Input.dbf"
    sOutDBFPath = "C:\Path\To\DBFs\"
    sOutDBFName = "RESULTS.dbf"
    sConn = "Driver={Microsoft dBASE Driver (*.dbf)}; DriverID=277; Dbq="
    Set oRS = CreateObject("ADODB.Recordset")
    oRS.Open "SELECT DISTINCT tid, zone FROM " & sDBFName, sConn & sDBFPath
    Set oDict = CreateObject("Scripting.Dictionary")
    Do While Not oRS.EOF
        curTID = oRS.Fields("tid").Value
        curZone = oRS.Fields("zone").Value
        If Not oDict.Exists(curTID) Then
            oDict.Add curTID, CreateObject("Scripting.Dictionary")
        End If
        If Not oDict(curTID).Exists(curZone) Then
            oDict(curTID).Add curZone, curZone
        End If
        oRS.MoveNext
    Loop
    oRS.Close
    Set oRS = Nothing
    Set oConn = CreateObject("ADODB.Connection")
    oConn.Open sConn & sOutDBFPath
    'Delete the resultant DBF file if it already exists.
    Set oFS = CreateObject("Scripting.FileSystemObject")
    With oFS
        If .FileExists(sOutDBFPath & "\" & sOutDBFName) Then
            .DeleteFile sOutDBFPath & "\" & sOutDBFName
        End If
    End With
    sTableName = oFS.GetBaseName(sOutDBFName)
    oConn.Execute "CREATE TABLE " & sTableName & " (tid int, zone varchar(80))"
    Dim i, j
    For Each i In oDict.Keys
        curTID = i
        sZones = ""
        For Each j In oDict(i)
            sZones = sZones & "," & j
        Next
        sZones = Mid(sZones, 2)
        oConn.Execute "INSERT INTO " & sTableName & " (tid, zone) VALUES ('" & curTID & "','" & sZones & "')"
    Next
    oConn.Close
    Set oConn = Nothing
    Set oDict = Nothing
    Set oFS = Nothing
End Sub
Edit: For what it's worth, this also worked for me when I pasted it into a VBScript .VBS (text) file on Windows XP and added this line to the bottom of the file:
Call VBASolution()
I don't know whether Office needs to be installed or whether Windows ships with the appropriate dbf driver.
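Answer 2 (score: 1)
I don't think Morlock's answer meets the requirement to remove duplicates. I would use defaultdict(set), which automatically omits dups, instead of defaultdict(list), and therefore .add() instead of .append().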
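The difference is easy to see on the rows for tid 1 from the question. A small Python 3 sketch (the variable names are illustrative only):

```python
from collections import defaultdict

rows = [(1, "A"), (1, "A"), (1, "B"), (1, "C")]

as_list = defaultdict(list)
as_set = defaultdict(set)
for tid, zone in rows:
    as_list[tid].append(zone)   # keeps every occurrence, including the repeated "A"
    as_set[tid].add(zone)       # stores each distinct zone once

print(as_list[1])          # ['A', 'A', 'B', 'C']
print(sorted(as_set[1]))   # ['A', 'B', 'C']
```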
Answer 3 (score: 1)
Instead of:
taxIdZoningDict = {}
searchRows = gp.searchcursor(inputTbl)
searchRow = searchRows.next()
while searchRow:
    if searchRow.TID in taxIdZoningDict:
        taxIdZoningDict[searchRow.TID].add(searchRow.ZONE)
    else:
        taxIdZoningDict[searchRow.TID] = set() #a set prevents duplicates!
        taxIdZoningDict[searchRow.TID].add(searchRow.ZONE)
    searchRow = searchRows.next()
do this:
zones = {}
for row in gp.searchcursor(inputTbl):
    zones.setdefault(row.TID, set())
    zones[row.TID].add(row.ZONE)
More pythonic, same result ;-)
Then to output:
for k, v in zones.items():
    print k, ", ".join(v)
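Because dict.setdefault returns the value stored under the key, the lookup and the add can also be chained into one call. A minimal Python 3 sketch on plain tuples standing in for cursor rows (the `rows` list is an assumption for illustration; no ArcGIS objects are involved):

```python
rows = [(1, "A"), (1, "A"), (1, "B"), (1, "C"), (2, "D"), (2, "E"), (3, "C")]

zones = {}
for tid, zone in rows:
    # setdefault inserts an empty set the first time a tid is seen
    # and returns the set stored under that key either way.
    zones.setdefault(tid, set()).add(zone)

# Iterating a dict directly yields keys only, so use .items() for pairs.
for tid, tid_zones in sorted(zones.items()):
    print(tid, ", ".join(sorted(tid_zones)))
```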
Answer 4 (score: 0)
Here is some quickly written Python code that does what you need with very little fuss.
import collections
d = collections.defaultdict(list)
with open("input_file.txt") as f:
    for line in f:
        parsed = line.strip().split()
        print parsed
        k = parsed[0]
        v = parsed[2]
        d[k].append(v)
for k, v in sorted(d.iteritems()):
    s = " ----- "
    v = sorted(set(v))  # drop duplicates, then sort
    print k, s,
    for j in v:
        print j,
    print
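Hope this helps.
Answer 5 (score: 0)
The OP wants commas in the zone column. The output section of Morlock's code could be changed slightly to produce those commas, and arguably made cleaner, by using this one-line output instead of the explicit loop over v: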
print k, s, ",".join(v)
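This packs more into a single line (possibly a negative). Using join this way is very common in Python, and IMHO it expresses the intent better (and is easier to digest when reading) than an explicit loop.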
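To make the comparison concrete, here is a short Python 3 sketch of both styles producing the same string; `zones` is sample data, not taken from the original code.

```python
zones = ["A", "B", "C"]

# Explicit loop: a trailing separator is appended and has to be trimmed.
looped = ""
for zone in zones:
    looped = looped + zone + ","
looped = looped[:-1]

# str.join places the separator only between items.
joined = ",".join(zones)

print(looped, joined)
```

Note that str.join only accepts strings, so numeric values would need converting with str() first.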