我需要突出文档中的单词并显示它们之间距离的一些线索。我有这个程序,我是从不同的部分组装的。
Sub RepeatedWordsShow()
Const maxwords = 9000 'Maximum words allowed
Dim SingleWordA As String 'Raw word pulled from doc
Dim SingleWordR As String 'Raw word to compare
Dim RepeatedWords(maxwords) As String 'Array to hold repeated words
Dim RepeatedWordsDistances(maxwords) As Integer 'Distances of words
Dim rwnum As Integer 'counter for repeated word list
Dim worddistance As Integer 'Temporary variable
Dim countWord As Integer 'Actual word
Dim thisWord As Integer 'Actual comparing word
Dim ttlwds As Long 'Total words in the document
Dim Excludes As String 'Words to be excluded
Dim Found As Boolean 'Temporary flag
Dim j, k As Integer 'Temporary variables
' Set up excluded words
Excludes = "[a][an][and][at][for][from][he][her][his][in][of][on][she][the][to][was]"
Selection.HomeKey Unit:=wdStory
System.Cursor = wdCursorWait
ttlwds = ActiveDocument.Words.Count
rwnum = 0
countWord = 0
thisWord = 0
For Each aword In ActiveDocument.Words
SingleWordA = Trim(LCase(aword))
'Out of range?
If SingleWordA < "a" Or SingleWordA > "z" Then
SingleWordA = ""
End If
'On exclude list?
If InStr(Excludes, "[" & SingleWordA & "]") Then
SingleWordA = ""
End If
If Len(SingleWordA) > 0 Then
For Each rword In ActiveDocument.Words
SingleWordR = Trim(LCase(rword))
If SingleWordR < "a" Or SingleWordR > "z" Then
SingleWordR = ""
End If
If InStr(Excludes, "[" & SingleWordR & "]") Then
SingleWordR = ""
End If
If Len(SingleWordR) > 0 Then
If SingleWordR = SingleWordA Then
If thisWord <> countWord Then
RepeatedWords(rwnum) = SingleWordR
worddistance = thisWord - countWord
RepeatedWordsDistances(rwnum) = worddistance
rwnum = rwnum + 1
End If
End If
End If
thisWord = thisWord + 1
Next rword
thisWord = 0
End If
ttlwds = ttlwds - 1
StatusBar = "Remaining: " & ttlwds
countWord = countWord + 1
Next aword
'Highlights and Underlines words
k = 0
For Each itm In RepeatedWords
Selection.Find.ClearFormatting
Selection.HomeKey wdStory, wdMove
Selection.Find.Execute itm
Do Until Selection.Find.Found = False
'Selection.Font.Underline = wdUnderlineDotted
If RepeatedWordsDistances(k) > 0 And RepeatedWordsDistances(k) < 6 Then
Selection.Font.Underline = wdUnderlineDouble
End If
If RepeatedWordsDistances(k) > 5 And RepeatedWordsDistances(k) < 11 Then
Selection.Font.Underline = wdUnderlineSingle
End If
If RepeatedWordsDistances(k) > 10 And RepeatedWordsDistances(k) < 21 Then
Selection.Font.Underline = wdUnderlineDash
End If
If RepeatedWordsDistances(k) > 20 And RepeatedWordsDistances(k) < 51 Then
Selection.Font.Underline = wdUnderlineDotted
End If
Selection.Range.HighlightColorIndex = wdTurquoise
Selection.Find.Execute
Loop
k = k + 1
Next
'Now report the results
'tmpName = ActiveDocument.AttachedTemplate.FullName
'Documents.Add Template:=tmpName, NewTemplate:=False
'Selection.ParagraphFormat.TabStops.ClearAll
'With Selection
' For j = 1 To rwnum
' .TypeText Text:=Trim(Str(RepeatedWordsDistances(j))) _
' & vbTab & RepeatedWords(j) & vbCrLf
' Next j
'End With
System.Cursor = wdCursorNormal
End Sub
目标是突出显示重复的单词(现在为绿松石色),并根据出现的各种距离范围(双线到最近的单词,点到远)强调单词。
此解决方案存在一些问题:
1)最后一个块:'突出显示并强调单词'处理单词内的单词。例如,如果在所获得的列表中将光标记为重复的单词(重复词),则在“突出显示”中标记“亮”。我怎么能避免这种情况?
2)距离编号会计算文本中的逗号,点和分号,即使它们没有出现在列表中也是如此。最后一个评论部分会生成一个报告,以便查看该内容。
3)完成该宏之后,用户如何对标记的单词执行操作(双击,中键,三击,类似的东西,然后相同的单词突出显示另一个颜色(黄色)来识别它们?
在行动
答案 0 :(得分:0)
1)Find.Execute
方法有many options。 MatchWholeWord
可能就是你要找的东西。我建议您在执行之前设置选项,例如在此示例中从MSDN站点
With Selection.Find
.ClearFormatting
.MatchWholeWord = True
.MatchCase = False
.Execute FindText:="library"
End With
2)Punctuation and paragraph marks in a document are included in the Words collection. 由于您手动计数,您可以手动排除标点符号。它有点乱,但我能想到最快的解决方案。像
这样的东西If Instr(".,;:!?...", SingleWordR) = 0 Then thisWord = thisWord + 1 'would have to put all punctuation character inside the string
或
If SingleWordR Like "[!a-zA-Z]" Then thisWord = thisWord + 1 'this would not include special characters though
编辑:这也必须使用countword
3)显然是选择更改事件is possible但更简单的方法可能是编写一个宏来检查当前选择是否突出显示并突出显示其他事件,然后为其分配一个键命令(通过单词选项) )。同样的宏也可用于将单词重新着色为绿松石(必须由用户再次运行)。
我不确定你的突出显示和距离计数。您存储出现多次的每个单词以及与其他每个单词出现的距离。因此,像“a b b a b”这样的文本的数组看起来像这样
a 3
b 1
b 3
a -3
b -1
b 2
b -3
b -2
然后根据第二列突出显示每个条目的所有出现。这意味着唯一的相关值是最后一个(正数),在这种情况下,2表示a,2表示b。在您的示例中可以看到“day”。前两次出现少于21次,下划线加点。
你也在循环遍历每个单词的所有单词。这意味着(单词数)*(单词数)迭代。你应该至少停止一次thisword-countword&gt; 50。
也许另一种方法是用5,10,20,50个单词向前看每个单词,并突出显示所有其他出现的单词(以及单词本身)。
答案 1 :(得分:0)
我来了这个:
Sub RepeatedWordsShow()
Const maxwords = 9000 'Maximum unique words allowed
Dim SingleWordA As String 'Raw word pulled from doc
Dim SingleWordR As String
Dim RepeatedWords(maxwords) As String
Dim RepeatedWordsDistances(maxwords) As Integer
Dim rwnum As Integer
Dim worddistance As Integer
Dim countWord As Integer
Dim thisWord As Integer
Dim ttlwds As Long 'Total words in the document
Dim Excludes As String 'Words to be excluded
Dim Found As Boolean 'Temporary flag
Dim j, k As Integer 'Temporary variables
' Set up excluded words
Excludes = "[a][an][and][at][for][from][he][her][his][in][of][on][she][the][to][was][with]"
Selection.HomeKey Unit:=wdStory
System.Cursor = wdCursorWait
ttlwds = ActiveDocument.Words.Count
rwnum = 0
countWord = 0
thisWord = 0
For Each aword In ActiveDocument.Words
SingleWordA = Trim(LCase(aword))
'Out of range?
If SingleWordA < "a" Or SingleWordA > "z" Then
SingleWordA = ""
End If
'On exclude list?
If InStr(Excludes, "[" & SingleWordA & "]") Then
SingleWordA = ""
countWord = countWord + 1
End If
'If word is valid, compare with all words
If Len(SingleWordA) > 0 And SingleWordA Like "[a-zA-Z]*" Then
'Comparing
countWord = countWord + 1
For Each rword In ActiveDocument.Words
SingleWordR = Trim(LCase(rword))
If SingleWordR < "a" Or SingleWordR > "z" Then
SingleWordR = ""
End If
If InStr(Excludes, "[" & SingleWordR & "]") Then
SingleWordR = ""
thisWord = thisWord + 1
End If
If Len(SingleWordR) > 0 And SingleWordR Like "[a-zA-Z]*" Then
thisWord = thisWord + 1
If SingleWordR = SingleWordA Then
If thisWord <> countWord Then
RepeatedWords(rwnum) = SingleWordR
worddistance = thisWord - countWord
RepeatedWordsDistances(rwnum) = Abs(worddistance)
rwnum = rwnum + 1
'Debug.Print "Repeated Word:" + RepeatedWords(rwnum) + " Distance:" + Str(RepeatedWordsDistances(rwnum))
End If
End If
End If
Next rword
thisWord = 0
End If
ttlwds = ttlwds - 1
StatusBar = "Remaining: " & ttlwds
Next aword
'Highlighs and Underlines words
k = 0
For Each itm In RepeatedWords
Selection.Find.ClearFormatting
Selection.Find.MatchWholeWord = True
Selection.HomeKey wdStory, wdMove
Selection.Find.Execute itm
Do Until Selection.Find.Found = False
If RepeatedWordsDistances(k) > 0 And RepeatedWordsDistances(k) < 6 Then
Selection.Font.Underline = wdUnderlineDouble
End If
If RepeatedWordsDistances(k) > 5 And RepeatedWordsDistances(k) < 11 Then
Selection.Font.Underline = wdUnderlineSingle
End If
If RepeatedWordsDistances(k) > 10 And RepeatedWordsDistances(k) < 21 Then
Selection.Font.Underline = wdUnderlineDash
End If
If RepeatedWordsDistances(k) > 20 Then
Selection.Font.Underline = wdUnderlineDotted
End If
Selection.Range.HighlightColorIndex = wdTurquoise
Selection.Find.Execute
Loop
k = k + 1
Next
'Now write out the results
'tmpName = ActiveDocument.AttachedTemplate.FullName
'Documents.Add Template:=tmpName, NewTemplate:=False
'Selection.ParagraphFormat.TabStops.ClearAll
'With Selection
' For j = 1 To rwnum
' .TypeText Text:=Trim(Str(RepeatedWordsDistances(j))) _
' & vbTab & RepeatedWords(j) & vbCrLf
' Next j
'End With
'Write results in immediate window
For j = 1 To rwnum
Debug.Print "Repeated Word:" + RepeatedWords(j) + " Distance:" + Str(RepeatedWordsDistances(j))
Next j
System.Cursor = wdCursorNormal
End Sub
你的答案,arcadeprecinct:
1)我已实现了这一点,这正是我所需要的(抱歉,我不是一位经验丰富的程序员).MatchWholeWord = True
按预期工作。
2)我也实现了这一点。表达式Like "[a-zA-Z]*"
非常有效。
所以我期待:
a)通过特别点击一个单词突出显示相同的单词。突出显示告诉我&#34;这个词重复,2次或更多次&#34; (为出现范围实现颜色代码会很好)。下划线功能的工作方式如下:最下划线是最近的重复,使用该方法更容易发现它。这取决于我作为一个作家来保留这个词。
b)提供使用的上下文:文档,部分或段落。显然,我的宏将难以在更多单词的文档中使用,因此,正如您所说,我将尝试仅限制部分或有限数量的单词。如果我在一个很长的长文件中使用它,它可能会挂起或崩溃系统。
我解决了&#34;负距离&#34;与Abs(worddistance)
。
无论如何,我会尝试另一种方法,正如你所说,因为我同意我的代码效率低下。我的兴趣是强调最近的单词。远程重复是最不重要的,但它最终必须标记每个重复的单词。你能帮助我改进主For-Each块中的代码吗?目标是:两个相同单词之间的任何给定距离,重要的一个是较小的单词,因此它必须用最重的下划线(Double)标记单词,即使其他出现的单词是远的。
我是否可以继续在此主题中发布我的未来版本?我是新手。
非常感谢您提供的宝贵帮助!
编辑你说
&#34;也许另一种方法是先看看每个单词5,10, 20,50个单词并突出显示所有其他事件(和单词 如果找到的话。&#34;
我同意这一点,但展望未来直至文本结束,突出显示所有出现的单词本身,并仅存储下划线的较小距离。那会很好但是此刻我不知道如何从实际的单词中搜索到前进。我会弄清楚,或者你可以给我一个线索。