Potential duplicate detection with 3 severity levels

Time: 2014-01-20 01:15:27

Tags: excel vba excel-vba duplicates

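I want to write a program that detects potential duplicates at 3 severity levels. Assume my data has only two columns but several thousand rows. The items in the second column are separated only by commas. Example data: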

Number | Material
1      | helmet,valros,42
2      | helmet,iron,knight
3      | valros,helmet,42
4      | knight,helmet
5      | valros,helmet,42
6      | plain,helmet
7      | helmet, leather

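My 3 levels are:

Very high: A,B,C vs A,B,C (same items, same order)

High: A,B,C vs B,C,A (same items, different order)

Medium: A,B,C vs A,B (one list contains the other)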
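The three levels can be expressed as pairwise tests on the comma-separated items. The following is a minimal sketch in Java (the language of the friend's code quoted further down) rather than VBA, since the logic ports directly. The class name `DuplicateLevel`, the whitespace trimming, and the set-based comparison (which ignores an item repeated within one row) are all assumptions:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: classifies a pair of comma-separated material lists.
class DuplicateLevel {

    // Split "helmet, leather" into [helmet, leather], trimming surrounding spaces.
    static List<String> tokens(String s) {
        List<String> out = new ArrayList<>();
        for (String part : s.split(",")) {
            String t = part.trim();
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // 1 = very high, 2 = high, 3 = medium, 0 = not a potential duplicate.
    static int level(String a, String b) {
        List<String> x = tokens(a);
        List<String> y = tokens(b);
        if (x.isEmpty() || y.isEmpty()) return 0;      // ignore blank cells
        if (x.equals(y)) return 1;                     // same items, same order
        Set<String> sx = new HashSet<>(x);
        Set<String> sy = new HashSet<>(y);
        if (sx.equals(sy)) return 2;                   // same items, different order
        if (sx.containsAll(sy) || sy.containsAll(sx)) {
            return 3;                                  // one list contains the other
        }
        return 0;
    }
}
```

On the sample data, rows 1 and 3 come out as level 2 and rows 4 and 2 as level 3, while rows 6 and 7 match nothing, which is consistent with the expected output. Comparing every pair of several thousand rows means millions of calls, though, so levels 1 and 2 are better handled by grouping on a key.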

So far I can only handle the first level; I don't know how to do the second and third.

What I've tried:

Sub duplicates_separation()

    Dim duplicate() As String, i As Long, found As Long
    Dim delrange As Range, cell As Long
    Dim shtIn As Worksheet, shtOut As Worksheet
    Dim x As Long

    Set shtIn = ThisWorkbook.Sheets("input")
    Set shtOut = ThisWorkbook.Sheets("output")

    x = 2                                       'first output row (row 1 is the header)

    Set delrange = shtIn.Range("b1:b10000")     'set your range here

    ReDim duplicate(0)
    'search for exact (level 1) duplicates in the 2nd column, skipping empty cells
    For cell = 1 To delrange.Cells.Count
        If Len(delrange(cell).Value) > 0 Then
            If Application.CountIf(delrange, delrange(cell)) > 1 Then
                ReDim Preserve duplicate(found)
                duplicate(found) = delrange(cell).Address
                found = found + 1
            End If
        End If
    Next cell

    'print duplicates; stop here if none were found
    If found = 0 Then Exit Sub
    For i = found - 1 To 0 Step -1
        shtOut.Cells(x, 1).EntireRow.Value = shtIn.Range(duplicate(i)).EntireRow.Value
        x = x + 1                               'move to the next output row
    Next i

End Sub

Duplicates detected by the program:

3      | valros,helmet,42
5      | valros,helmet,42

What I expect:

Number | Material
1      | helmet,valros,42
3      | valros,helmet,42
5      | valros,helmet,42
4      | knight,helmet
2      | helmet,iron,knight        

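I have an idea for detecting level 2 duplicates, but I think it would be complicated and make the program slow:

  1. Split column 2 into separate columns with the "Text to Columns" command
  2. Sort the columns from A to Z (alphabetically)
  3. Concatenate the columns back together
  4. Detect duplicates the same way as for level 1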
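The four steps above amount to building an order-independent key for each row. A sketch in Java, with hypothetical names `SortedKeyGroups`, `key` and `groups`: instead of Text to Columns and sorting on the sheet, each row's items are split, trimmed, sorted and joined in memory, then rows are bucketed by key in a hash map, so rows are never compared pairwise:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SortedKeyGroups {

    // Steps 1-3: split on commas, trim, sort alphabetically, join back together.
    static String key(String materials) {
        String[] parts = materials.split(",");
        for (int i = 0; i < parts.length; i++) parts[i] = parts[i].trim();
        Arrays.sort(parts);
        return String.join(",", parts);
    }

    // Step 4: rows sharing a key are level 1 or level 2 duplicates.
    // Returns groups of 1-based row numbers that contain more than one row.
    static List<List<Integer>> groups(String[] rows) {
        Map<String, List<Integer>> byKey = new LinkedHashMap<>();
        for (int i = 0; i < rows.length; i++) {
            if (rows[i].trim().isEmpty()) continue;     // skip blank cells
            byKey.computeIfAbsent(key(rows[i]), k -> new ArrayList<>()).add(i + 1);
        }
        List<List<Integer>> result = new ArrayList<>();
        for (List<Integer> g : byKey.values()) {
            if (g.size() > 1) result.add(g);
        }
        return result;
    }
}
```

For the sample data, rows 1, 3 and 5 all map to the key `42,helmet,valros` and land in one group. This catches levels 1 and 2 together but not level 3. In VBA, a `Scripting.Dictionary` could play the role of the map.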

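Is there a way to detect second- and third-level duplicates?


Update

Yesterday I went to a friend's place to ask about this problem, but his solution is in Java, which I don't understand: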

    public class ali {
    
        static void sPrint(String[] Printed) {
            for (int iC = 0; iC < Printed.length; iC++) {
                System.out.print(String.valueOf(Printed[iC]) + " | ");
            }
            System.out.println();
        }
    
        public static void main(String Args[]) {
            int defaultLength = 10;
            int indexID = 0;
            int indexDesc = 1;
            String[] DETECTORP1 = new String[defaultLength];
            String[] DETECTORP2 = new String[defaultLength];
            String[] DETECTORP3 = new String[defaultLength];
            String[] DETECTORP4 = new String[defaultLength];
            String[][] theString = new String[5][2];
            theString[0] = new String[]{"1", "A, B, C, D"};
            theString[1] = new String[]{"2", "A, B, C, D"};
            theString[2] = new String[]{"3", "A, B, C, D, E"};
            theString[3] = new String[]{"4", "A, B, D, C, E"};
            theString[4] = new String[]{"5", "A, B, D, C, E, F"};
            int P1 = 0;
            int P2 = 0;
            int P3 = 0;
            int P4 = 0;
            for (int iC = 0; iC < theString.length; iC++) {
                System.out.println(theString[iC][indexID] + " -> " + theString[iC][indexDesc]);
            }
            for (int iC = 0; iC < theString.length; iC++) {
                int LEX;
                String theReference[] = theString[iC][indexDesc].replace(",", ";;").split(";;");
                for (int iD = 0; iD < theString.length; iD++) {
                    if (iC != iD) {
                        String theCompare[] = theString[iD][1].replace(",", ";;").split(";;");
                        if (theReference.length == theCompare.length) {
                            LEX=0;
                            int theLength = theReference.length;
                            for (int iE = 0; iE < theLength; iE++) {
                                if (theReference[iE].equals(theCompare[iE])) {
                                    LEX += 1;
                                }
                            }
                            if (LEX == theLength) {
                                DETECTORP1[P1] = theString[iC][indexID] + " WITH " + theString[iD][indexID];
                                P1 += 1;
                            } else {
                                LEX = 0;
                                for (int iF = 0; iF < theReference.length; iF++) {
                                    for (int iG = 0; iG < theCompare.length; iG++) {
                                        if (theReference[iF].equals(theCompare[iG])) {
                                            LEX += 1;
                                            break;
                                        }
                                    }
                                }
                                if (LEX == theReference.length) {
                                    DETECTORP2[P2] = theString[iC][indexID] + " WITH " + theString[iD][indexID];
                                    P2 += 1;
                                }
    
                            }
    
                        } else {
                            LEX = 0;
                            if (theReference.length > theCompare.length) {
                                for (int iF = 0; iF < theReference.length; iF++) {
                                    for (int iG = 0; iG < theCompare.length; iG++) {
                                        if (iG == iF) {
                                            if (theReference[iF].equals(theCompare[iF])) {
                                                LEX += 1;
                                                break;
                                            }
                                        }
                                    }
                                }
                                if (LEX <= theReference.length && LEX >= theCompare.length) {
                                    DETECTORP3[P3] = theString[iC][indexID] + " WITH " + theString[iD][indexID];
                                    P3 += 1;
                                }
                            } else {
                                LEX =0;
                                for (int iF = 0; iF < theCompare.length; iF++) {
                                    for (int iG = 0; iG < theReference.length; iG++) {
                                        if (iG == iF) {
                                            if (theCompare[iF].equals(theReference[iF])) {
                                                LEX += 1;
                                            //    System.out.println(theReference[iG] + "==" + theCompare[iG]);
                                                break;
                                            }
                                        }
                                    }
                                }
                                if (LEX <= theCompare.length && LEX >= theReference.length) {
                                    DETECTORP3[P3] = theString[iC][indexID] + " WITH " + theString[iD][indexID];
                                    P3 += 1;
                                }
                            }
    
                        }
                    }
    
                }
    
            }
            sPrint(DETECTORP1);
            sPrint(DETECTORP2);
            sPrint(DETECTORP3);
        }
    }
    

How can I do this in VBA?

1 answer:

Answer 0 (score: 1)

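Really, it depends on how you want to define "severity level". Here is one way to do it, not necessarily the best: use the Levenshtein distance.

Represent each item by a single-character symbol, e.g.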

H    helmet
K    knight
I    iron
$    Leather
^    Valros
¢    Plain
╔    Whatever
etc.

Then convert each material list into a string made of the characters that represent its items:

HIK = helmet,iron,knight
¢H  = plain,helmet
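Assigning the symbols by hand does not scale to thousands of rows. A minimal sketch of doing it automatically, in Java; the `MaterialEncoder` class is hypothetical, symbols are simply handed out as consecutive characters starting at `A` (not the ones in the table), and names are trimmed and lower-cased so that `Leather` and ` leather` share a symbol:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical encoder: gives each distinct material its own single character.
class MaterialEncoder {
    private final Map<String, Character> symbols = new HashMap<>();
    private char next = 'A';  // assumed scheme: A, B, C, ... in order of first appearance

    // Turn "helmet,iron,knight" into one character per material, e.g. "ABC".
    String encode(String materials) {
        StringBuilder sb = new StringBuilder();
        for (String part : materials.split(",")) {
            String name = part.trim().toLowerCase(Locale.ROOT);
            if (name.isEmpty()) continue;
            Character c = symbols.get(name);
            if (c == null) {           // first time this material is seen
                c = next++;
                symbols.put(name, c);
            }
            sb.append(c.charValue());
        }
        return sb.toString();
    }
}
```

The same encoder instance must be used for all rows so that each material keeps one symbol throughout.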

Then compute the Levenshtein distance between the two strings. That will be your "severity".

Debug.Print LevenshteinDistance("HIK","¢H")
'returns 3

Two implementations of the Levenshtein distance are given on Wikipedia. And you're in luck: someone on StackOverflow ported this to VBA.
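For reference, here is a sketch of the two-row dynamic-programming version of the character-level distance in Java (the class name `Levenshtein` is an assumption). Each cell holds the cheapest way to turn a prefix of `s` into a prefix of `t` using insertions, deletions and substitutions that each cost 1:

```java
// Character-level Levenshtein distance, keeping only two rows of the DP table.
class Levenshtein {
    static int distance(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) prev[j] = j;   // "" -> first j chars of t
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i;                                      // first i chars of s -> ""
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,  // insertion
                                            prev[j] + 1),     // deletion
                                   prev[j - 1] + cost);       // substitution or match
            }
            int[] tmp = prev;                                 // reuse the old row
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }
}
```

`Levenshtein.distance("HIK", "¢H")` returns 3, matching the `Debug.Print` example. Note that the distance is order-sensitive: `HIK` vs `IHK` gives 2, so a reordered (level 2) row does not score 0.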

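In the comments below, you said you don't want to represent every possible attribute with a single-character symbol. Fair enough; I agree that's a bit silly. Workaround: the Levenshtein distance algorithm can be adapted so that, instead of looking at each character of a string, it looks at each element of an array and compares those. I'll show how to make this change in my answer to your follow-up question.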
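A sketch of that adaptation in Java, written independently of the linked answer (so the class `TokenLevenshtein` and its helpers are hypothetical): the dynamic program is unchanged, but it walks lists of trimmed material names and compares whole elements with `equals`, so no symbol table is needed:

```java
import java.util.ArrayList;
import java.util.List;

class TokenLevenshtein {

    // Split a comma-separated cell into trimmed material names.
    static List<String> tokens(String s) {
        List<String> out = new ArrayList<>();
        for (String part : s.split(",")) {
            String t = part.trim();
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Same DP as the character version; each "character" is a whole material name.
    static int distance(List<String> a, List<String> b) {
        int[] prev = new int[b.size() + 1];
        int[] curr = new int[b.size() + 1];
        for (int j = 0; j <= b.size(); j++) prev[j] = j;
        for (int i = 1; i <= a.size(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.size(); j++) {
                int cost = a.get(i - 1).equals(b.get(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.size()];
    }

    // Convenience overload working directly on the cell text.
    static int distance(String a, String b) {
        return distance(tokens(a), tokens(b));
    }
}
```

With this, `helmet,iron,knight` vs `plain,helmet` still gives 3, and the reordered pair `helmet,valros,42` vs `valros,helmet,42` gives 2. How distances map onto severity levels remains your choice, as the answer says.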