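I'm writing a personal wiki-style program in Python that stores text files in a user-configurable directory.
The program should take a string from the user (e.g. foo) and create a file named foo.txt. The user may only create files inside the wiki directory, and a slash creates a subdirectory (e.g. foo/bar becomes (path-to-wiki)/foo/bar.txt).
What is the best way to check that the input is as safe as possible? What do I need to watch out for? I know some common pitfalls are:
../
\0
I realize that taking user input for filenames is never 100% safe, but the program will only be run locally, and I just want to guard against common errors and glitches.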
Answer 0 (score: 8)
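You can force the user to create files and directories inside the wiki by normalizing the path with os.path.normpath and then checking that the result starts with '(path-to-wiki)'. Include the trailing separator in the prefix, otherwise a sibling directory such as '(path-to-wiki)-old' would also pass:
os.path.normpath('(path-to-wiki)/foo/bar.txt').startswith('(path-to-wiki)' + os.sep)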
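To make sure the path or filename the user enters contains nothing nasty, you can restrict the input to upper- and lower-case letters, digits, hyphens, and underscores (plus the slash that separates subdirectories).
You can then check it with a regular expression like the one below. Run it on the user-supplied part rather than on the full path, since the wiki directory itself may legitimately contain other characters; an empty result means nothing disallowed was found:
username = 'foo/bar'
re.findall(r'[^A-Za-z0-9_\-/]', username)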
To summarize, with the user's input in username:
import os, re
username = 'foo/bar'
wiki = os.path.normpath('(path-to-wiki)')
userpath = os.path.normpath(os.path.join(wiki, username + '.txt'))
if not userpath.startswith(wiki + os.sep) or re.search(r'[^A-Za-z0-9_\-/]', username):
    ...  # do whatever you want with an invalid path
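A string prefix test can be fooled by sibling directories unless the separator is included, and normpath() does not follow symlinks. The sketch below is one possible alternative, not part of the original answer: it resolves both paths with os.path.realpath and compares whole path components with os.path.commonpath. The function name is_inside and the /srv/wiki root are made up for illustration.

```python
import os


def is_inside(base_dir, candidate):
    """Return True if `candidate` resolves to `base_dir` or a path below it."""
    # realpath() collapses ".." segments and follows symlinks.
    base = os.path.realpath(base_dir)
    target = os.path.realpath(candidate)
    try:
        # commonpath() compares whole components, so "/srv/wiki-old" is not
        # mistaken for a child of "/srv/wiki" the way a bare startswith() is.
        return os.path.commonpath([base, target]) == base
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False


wiki = os.path.join(os.sep, "srv", "wiki")  # hypothetical wiki root
print(is_inside(wiki, os.path.join(wiki, "foo", "bar.txt")))
print(is_inside(wiki, os.path.join(wiki, "..", "wiki-old", "x")))
```

The first call should report that the path is inside the wiki and the second that it is not, because the ".." segment moves the target into a sibling directory.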
Answer 1 (score: 5)
Armin Ronacher has a blog post on this topic (among others): http://lucumr.pocoo.org/2010/12/24/common-mistakes-as-web-developer/
These ideas are implemented as the safe_join() function in Flask:
import os
import posixpath

from werkzeug.exceptions import NotFound

# Separators other than '/' that the current OS accepts (e.g. '\\' on Windows).
_os_alt_seps = [sep for sep in (os.path.sep, os.path.altsep) if sep not in (None, '/')]


def safe_join(directory, filename):
    """Safely join `directory` and `filename`.

    Example usage::

        @app.route('/wiki/<path:filename>')
        def wiki_page(filename):
            filename = safe_join(app.config['WIKI_FOLDER'], filename)
            with open(filename, 'rb') as fd:
                content = fd.read()  # Read and process the file content...

    :param directory: the base directory.
    :param filename: the untrusted filename relative to that directory.
    :raises: :class:`~werkzeug.exceptions.NotFound` if the resulting path
             would fall out of `directory`.
    """
    filename = posixpath.normpath(filename)
    for sep in _os_alt_seps:
        if sep in filename:
            raise NotFound()
    # A bare '..' must be rejected too; startswith('../') alone misses it.
    if os.path.isabs(filename) or filename == '..' or filename.startswith('../'):
        raise NotFound()
    return os.path.join(directory, filename)
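For a local program with no web framework, the same idea can be applied with the standard library alone. The following is a minimal sketch under stated assumptions, not Flask's code: the names wiki_page_path and create_page are hypothetical, ValueError stands in for NotFound, and the ".txt" suffix follows the question. It does not resolve symlinks inside the wiki directory.

```python
import os
import posixpath


def wiki_page_path(wiki_root, page_name):
    """Map a page name such as "foo/bar" to <wiki_root>/foo/bar.txt.

    Raises ValueError if the name is empty or would leave `wiki_root`.
    """
    # Reject NUL bytes, backslashes and ":" (Windows drive prefixes) outright.
    if not page_name or any(ch in page_name for ch in "\0\\:"):
        raise ValueError("invalid page name: %r" % page_name)
    # Collapse "a/./b" and "a/x/../b", treating "/" as the only separator.
    name = posixpath.normpath(page_name)
    if name in (".", "..") or name.startswith(("/", "../")):
        raise ValueError("page name escapes the wiki: %r" % page_name)
    return os.path.join(wiki_root, *name.split("/")) + ".txt"


def create_page(wiki_root, page_name, text=""):
    """Create the page file, making subdirectories as needed."""
    path = wiki_page_path(wiki_root, page_name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    # Mode "x" fails if the file already exists instead of overwriting it.
    with open(path, "x", encoding="utf-8") as handle:
        handle.write(text)
    return path


# Hypothetical wiki root; only the path is computed here, nothing is written.
print(wiki_page_path(os.path.join(os.sep, "srv", "wiki"), "foo/bar"))
```

Rejecting bad names with an exception, rather than silently rewriting them, keeps the mapping from page name to file predictable for the user.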
Answer 2 (score: 1)
There is now a complete library for validating and sanitizing such strings, pathvalidate:
from pathvalidate import sanitize_filepath
fpath = "fi:l*e/p\"a?t>h|.t<xt"
print("{} -> {}".format(fpath, sanitize_filepath(fpath)))
fpath = "\0_a*b:c<d>e%f/(g)h+i_0.txt"
print("{} -> {}".format(fpath, sanitize_filepath(fpath)))
Output:
fi:l*e/p"a?t>h|.t<xt -> file/path.txt
_a*b:c<d>e%f/(g)h+i_0.txt -> _abcde%f/(g)h+i_0.txt
Answer 3 (score: 0)
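You can simply check that every character is a printable ASCII alphanumeric, or one of the ' ', '.' and '/' characters, and then remove all instances of bad combinations...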
safe_string = str()
for c in user_supplied_string:
    # isascii() (Python 3.7+) is needed because isalnum() alone
    # also accepts non-ASCII letters and digits.
    if (c.isascii() and c.isalnum()) or c in [' ', '.', '/']:
        safe_string = safe_string + c
while safe_string.count("../"):
    # I use a loop because only replacing once would
    # leave a hole in that a bad guy could enter ".../"
    # which would be replaced to "../" so the loop
    # prevents tricks like this!
    safe_string = safe_string.replace("../", "./")
# Get rid of leading "." and "/" characters
# (lstrip removes a set of characters, not a prefix)...
safe_string = safe_string.lstrip("./")
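That's what I would do. I don't know how pythonic it is, but it should leave you pretty safe. If you want to validate rather than transform, you can do an equality test afterwards: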
valid = safe_string == user_supplied_string
if not valid:
    raise Exception("Sorry the string %s contains invalid characters" % user_supplied_string)
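Either of the previous two approaches would probably work as well; I find this one more explicit, and it should also screen out any weird or inappropriate characters such as '\t', '\r' or '\n'. Cheers!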
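Rewriting the input is easy to get subtly wrong; for example, a trailing ".." with no slash after it survives the replacement loop above. A validate-only variant can be sketched with a whitelist applied to each "/"-separated segment. The policy here is an assumption for illustration (segments start with a letter, digit, "_" or "-", and may then contain spaces and dots), and the name is_valid_page_name is made up.

```python
import re

# Assumed policy: a segment starts with an ASCII letter, digit, "_" or "-"
# and may continue with those characters plus "." and " ".
# Because a segment cannot start with ".", both "." and ".." are rejected.
_SEGMENT = re.compile(r"[A-Za-z0-9_-][A-Za-z0-9_. -]*")


def is_valid_page_name(name):
    """Return True if every "/"-separated segment matches the whitelist."""
    # fullmatch() must consume the whole segment, so control characters
    # such as "\n" or "\0" fail, as do the empty segments produced by
    # a leading, trailing or doubled "/".
    return all(_SEGMENT.fullmatch(part) for part in name.split("/"))


for candidate in ["foo/bar", "notes/2024 plan.v2", "../etc/passwd", "foo//bar"]:
    print(repr(candidate), is_valid_page_name(candidate))
```

With this policy the first two names are accepted and the last two are rejected. Tighten or loosen the character classes to match whatever page names the wiki should allow.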