在Ruby 1.9.3上使用大括号进行Globbing

时间:2016-01-05 06:21:18

标签: ruby glob ruby-1.9

如果使用File :: FNM_EXTGLOB选项

,最新版本的Ruby支持在globbing中使用大括号

来自2.2.0 documentation

File.fnmatch('c{at,ub}s', 'cats', File::FNM_EXTGLOB) #=> true  # { } is supported on FNM_EXTGLOB

然而,1.9.3文档说它在1.9.3中不受支持:

File.fnmatch('c{at,ub}s', 'cats')       #=> false # { } isn't supported

(另外,尝试使用File::FNM_EXTGLOB给出了名称错误)

有没有办法在Ruby 1.9.3中使用大括号的glob,比如第三方gem?

我想要匹配的字符串来自S3,而不是本地文件系统,所以我不能只是要求操作系统根据我的知识进行通配。

2 个答案:

答案 0 :(得分:1)

我正在打包Ruby Backport用于括号支持。以下是该解决方案的基本部分:

module File::Constants
  FNM_EXTGLOB = 0x10
end

class << File
  def fnmatch_with_braces_glob(pattern, path, flags =0)
    regex = glob_convert(pattern, flags)

    return regex && path.match(regex).to_s == path
  end

  def fnmatch_with_braces_glob?(pattern, path, flags =0)
    return fnmatch_with_braces_glob(pattern, path, flags)
  end

private
  def glob_convert(pattern, flags)
    brace_exp = (flags & File::FNM_EXTGLOB) != 0
    pathnames = (flags & File::FNM_PATHNAME) != 0
    dot_match = (flags & File::FNM_DOTMATCH) != 0
    no_escape = (flags & File::FNM_NOESCAPE) != 0
    casefold = (flags & File::FNM_CASEFOLD) != 0
    syscase = (flags & File::FNM_SYSCASE) != 0
    special_chars = ".*?\\[\\]{},.+()|$^\\\\" + (pathnames ? "/" : "")
    special_chars_regex = Regexp.new("[#{special_chars}]")

    if pattern.length == 0 || !pattern.index(special_chars_regex)
      return Regexp.new(pattern, casefold || syscase ? Regexp::IGNORECASE : 0)
    end

    # Convert glob to regexp and escape regexp characters
    length = pattern.length
    start = 0
    brace_depth = 0
    new_pattern = ""
    char = "/"

    loop do
      path_start = !dot_match && char[-1] == "/"

      index = pattern.index(special_chars_regex, start)

      if index
        new_pattern += pattern[start...index] if index > start
        char = pattern[index]

        snippet = case char
        when "?"  then path_start ? (pathnames ? "[^./]" : "[^.]") : ( pathnames ? "[^/]" : ".")
        when "."  then "\\."
        when "{"  then (brace_exp && (brace_depth += 1) >= 1) ? "(?:" : "{"
        when "}"  then (brace_exp && (brace_depth -= 1) >= 0) ? ")" : "}"
        when ","  then (brace_exp && brace_depth >= 0) ? "|" : ","
        when "/"  then "/"
        when "\\"
          if !no_escape && index < length
            next_char = pattern[index += 1]
            special_chars.include?(next_char) ? "\\#{next_char}" : next_char
          else
            "\\\\"
          end
        when "*"
          if index+1 < length && pattern[index+1] == "*"
            char += "*"
            if pathnames && index+2 < length && pattern[index+2] == "/"
              char += "/"
              index += 2
              "(?:(?:#{path_start ? '[^.]' : ''}[^\/]*?\\#{File::SEPARATOR})(?:#{!dot_match ? '[^.]' : ''}[^\/]*?\\#{File::SEPARATOR})*?)?"
            else
              index += 1
              "(?:#{path_start ? '[^.]' : ''}(?:[^\\#{File::SEPARATOR}]*?\\#{File::SEPARATOR}?)*?)?"
            end
          else
            path_start ? (pathnames ? "(?:[^./][^/]*?)?" : "(?:[^.].*?)?") : (pathnames ? "[^/]*?" : ".*?")
          end
        when "["
          # Handle character set inclusion / exclusion
          start_index = index
          end_index = pattern.index(']', start_index+1)
          while end_index && pattern[end_index-1] == "\\"
            end_index = pattern.index(']', end_index+1)
          end
          if end_index
            index = end_index
            char_set = pattern[start_index..end_index]
            char_set.delete!('/') if pathnames
            char_set[1] = '^' if char_set[1] == '!'
            (char_set == "[]" || char_set == "[^]") ? "" : char_set
          else
            "\\["
          end
        else
          "\\#{char}"
        end

        new_pattern += snippet
      else
        if start < length
          snippet = pattern[start..-1]
          new_pattern += snippet
        end
      end

      break if !index
      start = index + 1
    end

    begin
      return Regexp.new("\\A#{new_pattern}\\z", casefold || syscase ? Regexp::IGNORECASE : 0)
    rescue
      return nil
    end
  end
end

此解决方案考虑了File::fnmatch函数可用的各种标志,并使用glob模式构建合适的Regexp以匹配这些功能。使用此解决方案,可以成功运行这些测试:

File.fnmatch('c{at,ub}s', 'cats', File::FNM_EXTGLOB)
#=> true
File.fnmatch('file{*.doc,*.pdf}', 'filename.doc')
#=> false
File.fnmatch('file{*.doc,*.pdf}', 'filename.doc', File::FNM_EXTGLOB)
#=> true
File.fnmatch('f*l?{[a-z].doc,[0-9].pdf}', 'filex.doc', File::FNM_EXTGLOB)
#=> true
File.fnmatch('**/.{pro,}f?l*', 'home/.profile', File::FNM_EXTGLOB | File::FNM_DOTMATCH)
#=> true

fnmatch_with_braces_glob(和?变体)将被修补以代替fnmatch,因此符合Ruby 2.0.0的代码也适用于早期的Ruby版本。为清楚起见,上面显示的代码不包括一些性能改进,参数检查或Backports功能检测和补丁代码;这些显然将包含在项目的实际提交中。

我仍在测试一些边缘情况并大力优化性能;它应该准备很快提交。一旦在官方的Backports版本中提供,我就会在此处更新状态。

请注意,Dir::glob支持也会同时出现。

答案 1 :(得分:0)

这是一个有趣的Ruby练习! 不知道这个解决方案是否足够强大,但这里有:

class File
  class << self
    def fnmatch_extglob(pattern, path, flags=0)
      explode_extglob(pattern).any?{|exploded_pattern|
        fnmatch(exploded_pattern,path,flags)
      }
    end

    def explode_extglob(pattern)
      if match=pattern.match(/\{([^{}]+)}/) then
        subpatterns = match[1].split(',',-1)
        subpatterns.map{|subpattern| explode_extglob(match.pre_match+subpattern+match.post_match)}.flatten
      else
        [pattern]
      end
    end
  end
end

需要更好的测试,但它似乎适用于简单的情况:

[2] pry(main)> File.explode_extglob('c{at,ub}s')
=> ["cats", "cubs"]
[3] pry(main)> File.explode_extglob('c{at,ub}{s,}')
=> ["cats", "cat", "cubs", "cub"]
[4] pry(main)> File.explode_extglob('{a,b,c}{d,e,f}{g,h,i}')
=> ["adg", "adh", "adi", "aeg", "aeh", "aei", "afg", "afh", "afi", "bdg", "bdh", "bdi", "beg", "beh", "bei", "bfg", "bfh", "bfi", "cdg", "cdh", "cdi", "ceg", "ceh", "cei", "cfg", "cfh", "cfi"]
[5] pry(main)> File.explode_extglob('{a,b}c*')
=> ["ac*", "bc*"]
[6] pry(main)> File.fnmatch('c{at,ub}s', 'cats')
=> false
[7] pry(main)> File.fnmatch_extglob('c{at,ub}s', 'cats')
=> true
[8] pry(main)> File.fnmatch_extglob('c{at,ub}s*', 'catsssss')
=> true

使用Ruby 1.9.3和Ruby 2.1.5和2.2.1进行测试。