Question

I'm writing a personal wiki-style program in Python that stores text files in a user-configurable directory.

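The program should be able to take a string from the user (e.g. foo) and create the filename foo.txt. The user may only create files inside the wiki directory, and a slash creates a subdirectory (e.g. foo/bar becomes (path-to-wiki)/foo/bar.txt).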

What is the best way to check that the input is as safe as possible? What do I need to watch out for? I know some common pitfalls are:

Directory traversal: ../
Null bytes: \0

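I realize that taking user input for filenames is never 100% safe, but the program will only be run locally, and I just want to guard against the common errors/glitches.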

Answer 1

You can force users to create files/directories inside the wiki by normalizing the path with os.path.normpath and then checking that the result starts with '(path-to-wiki)':

os.path.normpath('(path-to-wiki)/foo/bar.txt').startswith('(path-to-wiki)')
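One caveat with a plain prefix test: a sibling directory whose name merely begins with the wiki directory's name (say /srv/wiki-old next to /srv/wiki) also passes startswith. A minimal sketch of a component-wise check follows; the helper name is_inside and the example paths are made up for illustration, and resolving symlinks with realpath is my assumption about what you would want.

```python
import os

def is_inside(base_dir, candidate):
    """Return True if `candidate`, joined onto `base_dir`, stays inside it."""
    try:
        base = os.path.realpath(base_dir)
        target = os.path.realpath(os.path.join(base, candidate))
        # commonpath compares whole path components, so "/srv/wiki-old"
        # is not mistaken for a child of "/srv/wiki".
        return os.path.commonpath([base, target]) == base
    except ValueError:
        # Raised e.g. for paths on different Windows drives, or (depending
        # on the Python version) for an embedded null byte.
        return False
```

With this, is_inside('/srv/wiki', 'foo/bar.txt') is accepted, while '../wiki-old/x.txt' and '/etc/passwd' are rejected.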

To make sure the path/filename the user enters doesn't contain anything nasty, you can restrict the input to lower/upper-case letters, digits, hyphens and underscores.

Then you can always check the normalized filename with a regular expression like this:

import os, re
userpath = os.path.normpath('(path-to-wiki)/foo/bar.txt')
# every character returned here is outside the allowed set
re.findall(r'[^A-Za-z0-9_\-\\]', userpath)

To sum up

If userpath = os.path.normpath('(path-to-wiki)/foo/bar.txt'), then

if (not userpath.startswith('(path-to-wiki)')
        or re.search(r'[^A-Za-z0-9_\-\\]', userpath)):
    ...  # do whatever you want with an invalid path
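Note that the character class above contains neither '.' nor '/', so run against a full path ending in .txt it will always report a match. One way around that, sketched below, is to apply the whitelist to the user's string only, one path component at a time, and add the directory and the .txt suffix afterwards. The function name page_path and the exact whitelist are my assumptions, not something from the question.

```python
import os
import re

# Assumption: a single path component may contain only these characters.
_COMPONENT = re.compile(r'[A-Za-z0-9_-]+')

def page_path(wiki_root, name):
    """Map a page name such as 'foo/bar' to '<wiki_root>/foo/bar.txt'.

    Raises ValueError if any component is empty or contains a character
    outside the whitelist (this also rules out '..', dots and null bytes).
    """
    parts = name.split('/')
    for part in parts:
        if not _COMPONENT.fullmatch(part):
            raise ValueError('invalid page name: %r' % name)
    return os.path.join(wiki_root, *parts) + '.txt'
```

Because '.' is not in the whitelist, '..' can never appear as a component, and a leading or doubled slash produces an empty component that is rejected as well.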

Answer 2

Armin Ronacher has a blog post on this topic (and others): http://lucumr.pocoo.org/2010/12/24/common-mistakes-as-web-developer/

These ideas are implemented as the safe_join() function in Flask:

def safe_join(directory, filename):
    """Safely join `directory` and `filename`.

    Example usage::

        @app.route('/wiki/<path:filename>')
        def wiki_page(filename):
            filename = safe_join(app.config['WIKI_FOLDER'], filename)
            with open(filename, 'rb') as fd:
                content = fd.read()  # Read and process the file content...

    :param directory: the base directory.
    :param filename: the untrusted filename relative to that directory.
    :raises: :class:`~werkzeug.exceptions.NotFound` if the resulting path
             would fall out of `directory`.
    """
    filename = posixpath.normpath(filename)
    for sep in _os_alt_seps:
        if sep in filename:
            raise NotFound()
    if os.path.isabs(filename) or filename.startswith('../'):
        raise NotFound()
    return os.path.join(directory, filename)
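The snippet above relies on names defined elsewhere in Flask (posixpath, _os_alt_seps, NotFound), so it won't run when pasted on its own. Here is a self-contained sketch of the same idea for a local script. It is my adaptation, not Flask's code: it raises ValueError instead of NotFound, the _os_alt_seps definition is my assumption of what that list holds, and it also rejects a bare '..' and a leading '/', neither of which the quoted version catches in every case.

```python
import os
import posixpath

# Assumption: separators other than "/" that this platform also accepts
# (for example "\\" on Windows); the list is empty on POSIX systems.
_os_alt_seps = [sep for sep in (os.sep, os.altsep) if sep not in (None, '/')]

def safe_join(directory, filename):
    """Join `directory` and the untrusted `filename`, or raise ValueError."""
    filename = posixpath.normpath(filename)
    if any(sep in filename for sep in _os_alt_seps):
        raise ValueError('alternative path separator in filename')
    if (os.path.isabs(filename)
            or filename.startswith('/')
            or filename == '..'
            or filename.startswith('../')):
        raise ValueError('filename escapes the base directory')
    return os.path.join(directory, filename)
```

Since normpath collapses inner '..' segments first, something like 'a/../../x' becomes '../x' and is caught by the startswith test.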

Answer 3

There is now a full library for validating and sanitizing such strings, pathvalidate:

from pathvalidate import sanitize_filepath

fpath = "fi:l*e/p\"a?t>h|.t<xt"
print("{} -> {}".format(fpath, sanitize_filepath(fpath)))

fpath = "\0_a*b:c<d>e%f/(g)h+i_0.txt"
print("{} -> {}".format(fpath, sanitize_filepath(fpath)))

Output:

fi:l*e/p"a?t>h|.t<xt -> file/path.txt
_a*b:c<d>e%f/(g)h+i_0.txt -> _abcde%f/(g)h+i_0.txt

Answer 4

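You can just verify that all characters are printable alphanumeric ASCII, apart from the ' ', '.' and '/' characters, and then remove all instances of bad combinations...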

safe_string = str()
for c in user_supplied_string:
    if (c.isascii() and c.isalnum()) or c in [' ','.','/']:
        safe_string = safe_string + c

while safe_string.count("../"):
    # I use a loop because only replacing once would 
    # leave a hole in that a bad guy could enter ".../"
    # which would be replaced to "../" so the loop 
    # prevents tricks like this!
    safe_string = safe_string.replace("../","./")
# Get rid of leading "./" combinations...
safe_string = safe_string.lstrip("./")

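That's what I would do. I don't know how Pythonic it is, but it should leave you pretty safe. If you want to validate rather than convert, you can do an equality test afterwards: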

valid = safe_string == user_supplied_string
if not valid:
    raise Exception("Sorry the string %s contains invalid characters" % user_supplied_string)

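In the end both approaches will probably work; I find this one more explicit, and it should also screen out any odd/inappropriate characters such as '\t', '\r' or '\n'. Cheers!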
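For anyone who wants to try this, here is the filtering logic from this answer wrapped in a function, as a rough sketch. The name sanitize is made up, and the isascii() test (Python 3.7+) is my reading of "alphanumeric ASCII" above, since isalnum() on its own also accepts non-ASCII letters and digits.

```python
def sanitize(user_supplied_string):
    """Apply the character filter and "../" removal described above."""
    kept = [c for c in user_supplied_string
            if (c.isascii() and c.isalnum()) or c in (' ', '.', '/')]
    safe_string = ''.join(kept)
    # Repeat until no "../" is left, since one pass can create a new one
    # (".../" becomes "../" after a single replacement).
    while '../' in safe_string:
        safe_string = safe_string.replace('../', './')
    # lstrip removes every leading "." and "/" character, not only the
    # exact prefix "./".
    return safe_string.lstrip('./')

print(sanitize('../../etc/passwd'))  # etc/passwd
print(sanitize('.../x'))             # x
print(sanitize('foo\tbar.txt'))      # foobar.txt
```

One side effect worth knowing: because lstrip works on characters rather than on a prefix, a name like '.hidden' comes back as 'hidden' and an absolute path like '/etc/passwd' comes back as 'etc/passwd'. The validate-only variant is then simply sanitize(s) == s.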

Validating filenames in Python
