我正在创建一个倒置索引来获取单词字典,其中包含单词出现的行号的相关列表(开始行号和出现在该行内给定单元格中的单词列表)。 / p>
我已经设法为此工作了一些代码,但我发现处理添加到数组(字典中的值)有点麻烦,我想知道是否有更高效或更优雅的方式来处理此
我愿意使用数组,集合或任何其他可以轻松搜索的数据类型来存储字典值中的行号列表。我已经粘贴了我的代码的缩减版本来演示下面的核心问题,问题实际上只是关于BuildInvertedIndex
过程,但其余部分是为了尝试更容易重新创建场景:
Sub Test()
' minimum included here to demonstrate use of buildInvertedIndex procedure
Dim vRange As Range
Dim vDict As Dictionary
Set vRange = ActiveSheet.Range("F2:F20585")
Set vDict = New Dictionary
BuildInvertedIndex vDict, vRange
' test values returned in dictionary (word: [line 1, ..., line n])
Dim k As Variant, vCounter As Long
vCounter = 0
For Each k In vDict.Keys
Debug.Print k & ": " & ArrayToString(vDict.Item(k))
vCounter = vCounter + 1
If vCounter >= 10 Then
Exit For
End If
Next
End Sub
Sub BuildInvertedIndex(pDict As Dictionary, pRange As Range)
Dim cell As Range
Dim words As Variant, word As Variant, val As Variant
Dim tmpArr() As Long
Dim newLen As Long, i As Long
' loop through cells (one col wide so same as looping through lines)
For Each cell In pRange.Cells
' loop through words in line
words = Split(cell.Value)
For Each word In words
If Not pDict.exists(word) Then
' start line array with first row number
pDict.Add word, Array(cell.Row())
Else
i = 0
If Not InArray(cell.Row(), pDict.Item(word)) Then
newLen = UBound(pDict.Item(word)) + 1
ReDim tmpArr(newLen)
For Each val In tmpArr
If i < newLen Then
tmpArr(i) = pDict.Item(word)(i)
Else
tmpArr(i) = cell.Row()
End If
i = i + 1
Next val
pDict.Item(word) = tmpArr
End If
End If
Next word
Next cell
End Sub
Function ArrayToString(vArray As Variant, _
Optional vDelim As String = ",") As String
' only included to support test (be able to see what is in the arrays)
Dim vDelimString As String
Dim i As Long
For i = LBound(vArray) To UBound(vArray)
vDelimString = vDelimString & CStr(vArray(i)) & _
IIf(vCounter < UBound(vArray), vDelim, "")
Next
ArrayToString = vDelimString
End Function
要运行此项,您将需要活动工作表(句子)的F列中的值,如果您还没有它,则还需要在VBA环境中为字典数据类型添加对Microsoft Scripting Runtime的引用可用(工具 - &gt;参考 - &gt; Microsoft Scripting Runtime)。
正如您将从代码中看到的那样,我需要在现有数组中插入新的行号(在字典中存储为值)时会有点混乱。由于我不知道只是扩展此数组的方法(不清除现有值),我使用变量tmpArr创建适当大小的数组,然后从字典中的现有数组中逐个复制值然后将当前行号添加到结尾。然后使用临时数组替换该键的现有值(当前单词)。
对此有任何建议将不胜感激。
答案 0 :(得分:1)
我愿意使用数组,集合或任何其他数据类型
正如我所见,使用collection而不是数组会更简单:
Sub BuildInvertedIndex(pDict As Dictionary, pRange As Range)
Dim cell As Range
Dim words, word
Dim i As Long
' loop through cells (one col wide so same as looping through lines)
For Each cell In pRange.Cells
' loop through words in line
words = Split(cell.Value)
For Each word In words
If Not pDict.Exists(word) Then
' initialize collection
pDict.Add word, New Collection
End If
'try to add to collection. If row is already in collecton, nothing happend. Storing key makes you sure there're only unique rows
On Error Resume Next
pDict.Item(word).Add Item:=cell.Row, Key:=CStr(cell.Row)
On Error GoTo 0
Next word
Next cell
End Sub
下一步,是将ArrayToString
略微修改为ColToString
:
Function ColToString(vCol As Collection, _
Optional vDelim As String = ",") As String
' only included to support test (be able to see what is in the arrays)
Dim vDelimString As String
Dim i As Long
For i = 1 To vCol.Count
vDelimString = vDelimString & CStr(vCol.Item(i)) & _
IIf(i < vCol.Count, vDelim, "")
Next
ColToString = vDelimString
End Function
和测试子程序(仅更改了一行 - Debug.Print k & ": " & ColToString(vDict.Item(k))
,目标范围更改为"F2:F5"
):
Sub Test()
' minimum included here to demonstrate use of buildInvertedIndex procedure
Dim vRange As Range
Dim vDict As Dictionary
Set vRange = ActiveSheet.Range("F2:F5")
Set vDict = New Dictionary
BuildInvertedIndex vDict, vRange
' test values returned in dictionary (word: [line 1, ..., line n])
Dim k As Variant, vCounter As Long
vCounter = 0
For Each k In vDict.Keys
Debug.Print k & ": " & ColToString(vDict.Item(k))
vCounter = vCounter + 1
If vCounter >= 10 Then
Exit For
End If
Next
'clean up memory
Set vDict = Nothing
End Sub
<强>结果:强>
<强>更新强>
为了提高代码的速度,您可以在数组中存储范围(下一个方法仅适用于单列范围,但您可以轻松修改它):
测试子:
Sub TestWirhArray()
' minimum included here to demonstrate use of buildInvertedIndex procedure
Dim vRange As Range
Dim vDict As Dictionary
Dim myArr As Variant
Set vDict = New Dictionary
Set vRange = ActiveSheet.Range("F2:F20585")
myArr = vRange.Value
BuildInvertedIndexWithArr vDict, myArr, vRange.Row
' test values returned in dictionary (word: [line 1, ..., line n])
Dim k As Variant, vCounter As Long
vCounter = 0
For Each k In vDict.Keys
Debug.Print k & ": " & ColToString(vDict.Item(k))
vCounter = vCounter + 1
If vCounter >= 10 Then
Exit For
End If
Next
'clean up memory
Set vDict = Nothing
End Sub
新版BuildInvertedIndexWithArr
:
Sub BuildInvertedIndexWithArr(pDict As Dictionary, pArr, firstRow As Long)
Dim cell, words, word
Dim i As Long, j As Long
j = firstRow
' loop through cells (one col wide so same as looping through lines)
For Each cell In pArr
' loop through words in line
words = Split(cell)
For Each word In words
If Not pDict.exists(word) Then
' initialize collection
pDict.Add word, New Collection
End If
On Error Resume Next
pDict.Item(word).Add Item:=j, Key:=CStr(j)
On Error GoTo 0
Next word
j = j + 1
Next cell
End Sub