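I am looking for a macro (preferably a function) that takes the contents of a cell, splits it into individual words, compares the words with each other, and deletes the shorter ones (those contained in a longer word).
Here is an image of what I want the output to look like (showing the words I need removed):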
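I tried writing a macro myself, but it does not work correctly: it does not handle the last word, and it sometimes deletes things that should not be deleted. Also, I have to run this on about 50k cells, so the macro takes a long time to run, which is why I would prefer a function. I suspect I should not be using Replace, but I could not get anything else to work.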
Sub clean_words_containing_eachother()
    Dim sht1 As Worksheet
    Dim LastRow As Long
    Dim Cell As Range
    Dim cell_value As String
    Dim word, word2 As Variant

    Set sht1 = ActiveSheet
    col = InputBox("Which column do you want to clear?")
    LastRow = sht1.Cells(sht1.Rows.Count, col).End(xlUp).Row
    Let to_clean = col & "2:" & col & LastRow

    For i = 2 To LastRow
        For Each Cell In sht1.Range(to_clean)
            cell_value = Cell.Value
            cell_split = Split(cell_value, " ")
            For Each word In cell_split
                For Each word2 In cell_split
                    If word <> word2 Then
                        If InStr(word2, word) > 0 Then
                            If Len(word) < Len(word2) Then
                                word = word & " "
                                Cell = Replace(Cell, word, " ")
                            ElseIf Len(word) > Len(word2) Then
                                word2 = word2 & " "
                                Cell = Replace(Cell, word2, " ")
                            End If
                        End If
                    End If
                Next word2
            Next word
        Next Cell
    Next i
End Sub
Answer (score: 1)
Assuming that keeping the third word (books) in your first example is a mistake, since books is contained later in notebooks:
5003886 book books bound case casebound not notebook notebooks office oxford sign signature
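and assuming you also want to remove repeated identical words, even if they are not contained in another word later on, then we can use a regular expression.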
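The regex matches any whole word that occurs again later in the string, either on its own or as part of a longer word, and deletes it.
Since VBA regular expressions do not support lookbehind, we work around that limitation by running the same regex a second time on the reversed string, which catches shorter words that appear after the longer word containing them.
Then we remove the extraneous spaces, and we're done.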
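To see why the second, reversed pass is needed, here is a small sketch (not part of the original answer; ForwardPass and DemoReversePass are hypothetical names) that applies the same pattern to "notebook book". In the forward pass "book" has nothing after it, so it survives; in the reversed string "koob koobeton" it does have a later match and is removed.

```vba
' Hypothetical helper: deletes every whole word that occurs again later in s
' (same pattern as in the answer below).
Function ForwardPass(ByVal s As String) As String
    Dim RE As Object
    Set RE = CreateObject("vbscript.regexp")
    RE.Global = True
    RE.IgnoreCase = True
    RE.Pattern = "\b(\w+)\b(?=.*\1)"
    ForwardPass = RE.Replace(s, "")
End Function

' Hypothetical demo: the shorter word comes AFTER the longer word containing it.
Sub DemoReversePass()
    Dim s As String

    ' Forward pass: "book" is the last word, so nothing is removed
    s = ForwardPass("notebook book")
    Debug.Print s                                ' notebook book

    ' Reverse pass: "koob koobeton" -> "koob" is found inside "koobeton"
    s = StrReverse(ForwardPass(StrReverse(s)))
    Debug.Print Trim(s)                          ' notebook
End Sub
```

Because the lookahead is zero-width, the engine only consumes the deleted word itself, so every word in the string is still checked within a single Replace call.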
Option Explicit

Function cleanWords(S As String) As String
    Dim RE As Object
    Dim sTemp As String

    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Global = True
        .Pattern = "\b(\w+)\b(?=.*\1)"
        .IgnoreCase = True

        'replace looking forward
        sTemp = .Replace(S, "")

        ' check in reverse
        sTemp = .Replace(StrReverse(sTemp), "")

        'return to normal
        sTemp = StrReverse(sTemp)

        'Remove extraneous spaces
        cleanWords = WorksheetFunction.Trim(sTemp)
    End With
End Function
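Since the question mentions about 50k cells, one possible way to apply the same two-pass idea in bulk is sketched below. It is an assumption-laden sketch, not part of the original answer: CleanColumnInBulk is a hypothetical name, the text is assumed to be in column A from row 2 down, and the results are written to column B of the active sheet. It creates the RegExp object once and reads and writes the whole column as an array, which should avoid much of the per-cell overhead, though no timing has been measured here.

```vba
Option Explicit

' Hypothetical bulk version of the two-pass regex approach.
' Assumes: input in column A starting at row 2, output to column B.
Sub CleanColumnInBulk()
    Dim ws As Worksheet
    Dim RE As Object
    Dim data As Variant
    Dim lastRow As Long, r As Long
    Dim s As String

    Set ws = ActiveSheet
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    If lastRow < 2 Then Exit Sub

    ' Create the RegExp object once instead of once per cell
    Set RE = CreateObject("vbscript.regexp")
    RE.Global = True
    RE.IgnoreCase = True
    RE.Pattern = "\b(\w+)\b(?=.*\1)"

    ' Read all values in one call; a single cell's .Value is not an array, so build one
    If lastRow = 2 Then
        ReDim data(1 To 1, 1 To 1)
        data(1, 1) = ws.Range("A2").Value
    Else
        data = ws.Range("A2:A" & lastRow).Value
    End If

    For r = 1 To UBound(data, 1)
        If Not IsError(data(r, 1)) Then                      ' skip #N/A and other error cells
            s = CStr(data(r, 1))
            s = RE.Replace(s, "")                            ' forward pass
            s = StrReverse(RE.Replace(StrReverse(s), ""))    ' reverse pass (stands in for lookbehind)
            data(r, 1) = Application.WorksheetFunction.Trim(s) ' collapse leftover spaces
        End If
    Next r

    ' Write all results back in one call
    ws.Range("B2").Resize(UBound(data, 1), 1).Value = data
End Sub
```

Alternatively, the cleanWords function from the answer can be used directly as a worksheet formula, for example =cleanWords(A2) filled down the column.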
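Limitations:
Words may only consist of the characters in [_A-Za-z0-9] (letters, digits, and underscore), because that is what \w and \b match in VBA regular expressions.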