突出显示找到的文本,如果它是一个完整的单词,计算重复单词的分离,处理所选文本

时间:2016-09-28 11:30:34

标签: vba ms-word highlight

我需要突出文档中的单词并显示它们之间距离的一些线索。我有这个程序,我是从不同的部分组装的。

Sub RepeatedWordsShow()

    Const maxwords = 9000                               'Maximum words allowed
    Dim SingleWordA As String                       'Raw word pulled from doc
    Dim SingleWordR As String               'Raw word to compare
    Dim RepeatedWords(maxwords) As String       'Array to hold repeated words
    Dim RepeatedWordsDistances(maxwords) As Integer 'Distances of words
    Dim rwnum As Integer                'counter for repeated word list
    Dim worddistance As Integer             'Temporary variable
    Dim countWord As Integer                'Actual word
    Dim thisWord As Integer             'Actual comparing word
    Dim ttlwds As Long                                  'Total words in the document
    Dim Excludes As String                              'Words to be excluded
    Dim Found As Boolean                'Temporary flag
    Dim j, k As Integer                         'Temporary variables

    ' Set up excluded words
    Excludes = "[a][an][and][at][for][from][he][her][his][in][of][on][she][the][to][was]"

    Selection.HomeKey Unit:=wdStory
    System.Cursor = wdCursorWait
    ttlwds = ActiveDocument.Words.Count

    rwnum = 0
    countWord = 0
    thisWord = 0

    For Each aword In ActiveDocument.Words
        SingleWordA = Trim(LCase(aword))
        'Out of range?
        If SingleWordA < "a" Or SingleWordA > "z" Then
            SingleWordA = ""
        End If
        'On exclude list?
        If InStr(Excludes, "[" & SingleWordA & "]") Then
            SingleWordA = ""
        End If
        If Len(SingleWordA) > 0 Then
            For Each rword In ActiveDocument.Words
                SingleWordR = Trim(LCase(rword))
                If SingleWordR < "a" Or SingleWordR > "z" Then
                    SingleWordR = ""
                End If
                If InStr(Excludes, "[" & SingleWordR & "]") Then
                    SingleWordR = ""
                End If
                If Len(SingleWordR) > 0 Then
                    If SingleWordR = SingleWordA Then
                        If thisWord <> countWord Then
                           RepeatedWords(rwnum) = SingleWordR
                           worddistance = thisWord - countWord
                           RepeatedWordsDistances(rwnum) = worddistance
                           rwnum = rwnum + 1
                        End If
                    End If
                End If
            thisWord = thisWord + 1
            Next rword
            thisWord = 0

        End If
        ttlwds = ttlwds - 1
        StatusBar = "Remaining: " & ttlwds
        countWord = countWord + 1
    Next aword

    'Highlights and Underlines words

    k = 0
    For Each itm In RepeatedWords
        Selection.Find.ClearFormatting
        Selection.HomeKey wdStory, wdMove
        Selection.Find.Execute itm
        Do Until Selection.Find.Found = False
            'Selection.Font.Underline = wdUnderlineDotted
            If RepeatedWordsDistances(k) > 0 And RepeatedWordsDistances(k) < 6 Then
            Selection.Font.Underline = wdUnderlineDouble
            End If
            If RepeatedWordsDistances(k) > 5 And RepeatedWordsDistances(k) < 11 Then
            Selection.Font.Underline = wdUnderlineSingle
            End If
            If RepeatedWordsDistances(k) > 10 And RepeatedWordsDistances(k) < 21 Then
            Selection.Font.Underline = wdUnderlineDash
            End If
            If RepeatedWordsDistances(k) > 20 And RepeatedWordsDistances(k) < 51 Then
            Selection.Font.Underline = wdUnderlineDotted
            End If
            Selection.Range.HighlightColorIndex = wdTurquoise
            Selection.Find.Execute
        Loop
    k = k + 1
    Next

    'Now report the results
    'tmpName = ActiveDocument.AttachedTemplate.FullName
    'Documents.Add Template:=tmpName, NewTemplate:=False
    'Selection.ParagraphFormat.TabStops.ClearAll
    'With Selection
    '    For j = 1 To rwnum
    '        .TypeText Text:=Trim(Str(RepeatedWordsDistances(j))) _
    '          & vbTab & RepeatedWords(j) & vbCrLf
    '    Next j
    'End With

    System.Cursor = wdCursorNormal

End Sub

目标是突出显示重复的单词(现在为绿松石色),并根据出现的各种距离范围(双线到最近的单词,点到远)强调单词。

此解决方案存在一些问题:

1)最后一个块:'突出显示并强调单词'处理单词内的单词。例如,如果在所获得的列表中将光标记为重复的单词(重复词),则在“突出显示”中标记“亮”。我怎么能避免这种情况?

2)距离编号会计算文本中的逗号,点和分号,即使它们没有出现在列表中也是如此。最后一个评论部分会生成一个报告,以便查看该内容。

3)完成该宏之后,用户如何对标记的单词执行操作(双击,中键,三击,类似的东西,然后相同的单词突出显示另一个颜色(黄色)来识别它们?

在行动See it in action

中查看

2 个答案:

答案 0 :(得分:0)

1)Find.Execute方法有many optionsMatchWholeWord可能就是你要找的东西。我建议您在执行之前设置选项,例如在此示例中从MSDN站点

获取
With Selection.Find 
    .ClearFormatting 
    .MatchWholeWord = True 
    .MatchCase = False 
    .Execute FindText:="library" 
End With

2)Punctuation and paragraph marks in a document are included in the Words collection. 由于您手动计数,您可以手动排除标点符号。它有点乱,但我能想到最快的解决方案。像

这样的东西
If Instr(".,;:!?...", SingleWordR) = 0 Then thisWord = thisWord + 1 'would have to put all punctuation character inside the string

If SingleWordR Like "[!a-zA-Z]" Then thisWord = thisWord + 1 'this would not include special characters though

编辑:这也必须使用countword

3)显然是选择更改事件is possible但更简单的方法可能是编写一个宏来检查当前选择是否突出显示并突出显示其他事件,然后为其分配一个键命令(通过单词选项) )。同样的宏也可用于将单词重新着色为绿松石(必须由用户再次运行)。

我不确定你的突出显示和距离计数。您存储出现多次的每个单词以及与其他每个单词出现的距离。因此,像“a b b a b”这样的文本的数组看起来像这样

a  3
b  1
b  3
a -3
b -1
b  2
b -3
b -2

然后根据第二列突出显示每个条目的所有出现。这意味着唯一的相关值是最后一个(正数),在这种情况下,2表示a,2表示b。在您的示例中可以看到“day”。前两次出现少于21次,下划线加点。

你也在循环遍历每个单词的所有单词。这意味着(单词数)*(单词数)迭代。你应该至少停止一次thisword-countword&gt; 50。

也许另一种方法是用5,10,20,50个单词向前看每个单词,并突出显示所有其他出现的单词(以及单词本身)。

答案 1 :(得分:0)

我来了这个:

Sub RepeatedWordsShow()
    Const maxwords = 9000                               'Maximum unique words allowed
    Dim SingleWordA As String                           'Raw word pulled from doc
    Dim SingleWordR As String
    Dim RepeatedWords(maxwords) As String
    Dim RepeatedWordsDistances(maxwords) As Integer
    Dim rwnum As Integer
    Dim worddistance As Integer
    Dim countWord As Integer
    Dim thisWord As Integer
    Dim ttlwds As Long                      'Total words in the document
    Dim Excludes As String                  'Words to be excluded
    Dim Found As Boolean                    'Temporary flag
    Dim j, k As Integer                     'Temporary variables

    ' Set up excluded words
    Excludes = "[a][an][and][at][for][from][he][her][his][in][of][on][she][the][to][was][with]"


    Selection.HomeKey Unit:=wdStory
    System.Cursor = wdCursorWait
    ttlwds = ActiveDocument.Words.Count

    rwnum = 0
    countWord = 0
    thisWord = 0

    For Each aword In ActiveDocument.Words
        SingleWordA = Trim(LCase(aword))
        'Out of range?
        If SingleWordA < "a" Or SingleWordA > "z" Then
            SingleWordA = ""
        End If
        'On exclude list?
        If InStr(Excludes, "[" & SingleWordA & "]") Then
            SingleWordA = ""
            countWord = countWord + 1
        End If
        'If word is valid, compare with all words
        If Len(SingleWordA) > 0 And SingleWordA Like "[a-zA-Z]*" Then
            'Comparing
            countWord = countWord + 1
            For Each rword In ActiveDocument.Words
                SingleWordR = Trim(LCase(rword))
                If SingleWordR < "a" Or SingleWordR > "z" Then
                    SingleWordR = ""
                End If
                If InStr(Excludes, "[" & SingleWordR & "]") Then
                    SingleWordR = ""
                    thisWord = thisWord + 1
                End If
                If Len(SingleWordR) > 0 And SingleWordR Like "[a-zA-Z]*" Then
                    thisWord = thisWord + 1
                    If SingleWordR = SingleWordA Then
                        If thisWord <> countWord Then
                           RepeatedWords(rwnum) = SingleWordR
                           worddistance = thisWord - countWord
                           RepeatedWordsDistances(rwnum) = Abs(worddistance)
                           rwnum = rwnum + 1
                           'Debug.Print "Repeated Word:" + RepeatedWords(rwnum) + "  Distance:" + Str(RepeatedWordsDistances(rwnum))
                        End If
                    End If
                End If
            Next rword
            thisWord = 0

        End If
       ttlwds = ttlwds - 1
       StatusBar = "Remaining: " & ttlwds

    Next aword


    'Highlighs and Underlines words

    k = 0
    For Each itm In RepeatedWords
        Selection.Find.ClearFormatting
        Selection.Find.MatchWholeWord = True
        Selection.HomeKey wdStory, wdMove
        Selection.Find.Execute itm
        Do Until Selection.Find.Found = False
            If RepeatedWordsDistances(k) > 0 And RepeatedWordsDistances(k) < 6 Then
            Selection.Font.Underline = wdUnderlineDouble
            End If
            If RepeatedWordsDistances(k) > 5 And RepeatedWordsDistances(k) < 11 Then
            Selection.Font.Underline = wdUnderlineSingle
            End If
            If RepeatedWordsDistances(k) > 10 And RepeatedWordsDistances(k) < 21 Then
            Selection.Font.Underline = wdUnderlineDash
            End If
            If RepeatedWordsDistances(k) > 20 Then
            Selection.Font.Underline = wdUnderlineDotted
            End If
            Selection.Range.HighlightColorIndex = wdTurquoise
            Selection.Find.Execute
        Loop
    k = k + 1
    Next

    'Now write out the results
    'tmpName = ActiveDocument.AttachedTemplate.FullName
    'Documents.Add Template:=tmpName, NewTemplate:=False
    'Selection.ParagraphFormat.TabStops.ClearAll
    'With Selection
    '    For j = 1 To rwnum
    '        .TypeText Text:=Trim(Str(RepeatedWordsDistances(j))) _
    '          & vbTab & RepeatedWords(j) & vbCrLf
    '    Next j
    'End With

    'Write results in immediate window
    For j = 1 To rwnum
     Debug.Print "Repeated Word:" + RepeatedWords(j) + "  Distance:" + Str(RepeatedWordsDistances(j))
    Next j

    System.Cursor = wdCursorNormal
End Sub

你的答案,arcadeprecinct:

1)我已实现了这一点,这正是我所需要的(抱歉,我不是一位经验丰富的程序员).MatchWholeWord = True按预期工作。

2)我也实现了这一点。表达式Like "[a-zA-Z]*"非常有效。

3)让我解释一下,我这样做的原因是因为我用西班牙语写作,而且我发现这是写作的常见问题。我必须检查并重新检查我的文本,有时很难发现重复的单词。有些重复是故意的,有些则不是。我尝试了一些软件,Scrivener只报告文本统计信息,但我必须关闭报告并手动搜索所有事件。 MSWord只计算单词。 ProWritingAid做得非常好,但文本仍然必须粘贴在他们的盒子里。 Notepad ++有一个功能:当您单击任何单词时,它会突出显示同一文本中的所有相同单词,但仍然必须在原始文本中进行更正,从而丢失所有格式。我的宏帮助我直接处理文本,ProWritingAid和Notepad ++的组合。

所以我期待:

a)通过特别点击一个单词突出显示相同的单词。突出显示告诉我&#34;这个词重复,2次或更多次&#34; (为出现范围实现颜色代码会很好)。下划线功能的工作方式如下:最下划线是最近的重复,使用该方法更容易发现它。这取决于我作为一个作家来保留这个词。

b)提供使用的上下文:文档,部分或段落。显然,我的宏将难以在更多单词的文档中使用,因此,正如您所说,我将尝试仅限制部分或有限数量的单词。如果我在一个很长的长文件中使用它,它可能会挂起或崩溃系统。

我解决了&#34;负距离&#34;与Abs(worddistance)

无论如何,我会尝试另一种方法,正如你所说,因为我同意我的代码效率低下。我的兴趣是强调最近的单词。远程重复是最不重要的,但它最终必须标记每个重复的单词。你能帮助我改进主For-Each块中的代码吗?目标是:两个相同单词之间的任何给定距离,重要的一个是较小的单词,因此它必须用最重的下划线(Double)标记单词,即使其他出现的单词是远的。

我是否可以继续在此主题中发布我的未来版本?我是新手。

非常感谢您提供的宝贵帮助!

编辑你说

  

&#34;也许另一种方法是先看看每个单词5,10,   20,50个单词并突出显示所有其他事件(和单词   如果找到的话。&#34;

我同意这一点,但展望未来直至文本结束,突出显示所有出现的单词本身,并仅存储下划线的较小距离。那会很好但是此刻我不知道如何从实际的单词中搜索到前进。我会弄清楚,或者你可以给我一个线索。