VBScript删除具有不同名称和扩展名的重复文件

时间:2013-06-02 00:15:12

标签: vbscript duplicate-removal

我已经广泛搜索了VBscript的答案,但已经放弃并需要帮助。

我想要完成的是找到具有不同文件名的明显重复的文件(对于人类来说显而易见)。我需要删除重复项,保留名称中没有轨道号的那些副本。如果我已经在MP3中使用它,我还需要删除任何M4A版本。

这甚至可能吗?我做了一点VBscripting,但这是我的有限编程能力。我甚至不打算在这里复制我尝试过的代码,因为它都没有用。

这是我正在尝试清理的示例文件夹。我只想留下这里的两首独特歌曲。我只想要MP3版本,我不想要他们名字中的曲目编号。

  

07坠入爱河(难以理解K.mp3)   1-15爱在电梯里.m4a
  1-15爱在Elevator.mp3中   15爱在Elevator.mp3中   2-07坠入爱河(在The.m4a上很难   2-07坠入爱河(很难在The.mp3上   坠入爱河(在膝盖上很难).mp3
  爱在Elevator.mp3中

谢谢!

1 个答案:

答案 0 :(得分:0)

这不是一项简单的任务,因为基本上你想要测量不同文件名的相似性/接近度。我的外行方法是从文件名中提取标题,对其进行标准化,然后使用最短的左对齐进行比较。这样的事情可能有用:

Set fso = CreateObject("Scripting.FileSystemObject")

Set re = New RegExp
re.Pattern = "^\d+(-\d+)?\s+"

Set rs = CreateObject("ADOR.Recordset")
rs.Fields.Append "NormalizedName", 200, 255
rs.Fields.Append "Length", 3
rs.Fields.Append "Path", 200, 255
rs.Open

' Store the full paths of the files and their associated normalized name in
' a disconnected recordset. The "Length" field is used for sorting (see below).
For Each f In fso.GetFolder("C:\some\folder").Files
  normalizedName = LCase(re.Replace(fso.GetBaseName(f.Name), ""))
  rs.AddNew
  rs("NormalizedName").Value = normalizedName
  rs("Length").Value = Len(normalizedName)
  rs("Path").Value = f.Path
  rs.Update
Next

' sort to ensure that the shortest normalized name always comes first
rs.Sort = "NormalizedName, Length ASC"

ref = ""
Set keeplist = CreateObject("Scripting.Dictionary")

rs.MoveFirst
Do Until rs.EOF
  path = rs("Path").Value
  name = rs("NormalizedName").Value
  currentExtension = LCase(fso.GetExtensionName(path))
  If ref <> "" And ref = Left(name, Len(ref)) Then
    ' same title as last file, so check if this one is a better match
    If extension <> "mp3" And currentExtension = "mp3" Then
      ' always pick MP3 version if it exists
      keeplist(ref) = path
      extension = currentExtension
    ElseIf extension = currentExtension _
        And IsNumeric(Left(fso.GetBaseName(keeplist(ref)), 1)) _
        And Not IsNumeric(Left(fso.GetBaseName(path), 1)) Then
      ' prefer file names not starting with a number when they have the
      ' same extension
      keeplist(ref) = path
    End If
  Else
    ' first file or different reference name
    ref = name
    extension = currentExtension
    keeplist.Add ref, path
  End If
  rs.MoveNext
Loop
rs.Close

For Each ref In keeplist
  WScript.Echo keeplist(ref)
Next

我很确定上面的代码没有涵盖边缘情况,所以要小心处理。另请注意,代码只处理单个文件夹。要处理文件夹树,需要其他代码(例如,请参阅here)。