Question

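A subsystem that I have no control over insists on providing file system paths in the form of a URI. Is there a Python module/function that can convert such a URI into the appropriate form expected by the file system, in a platform-independent manner?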

Answer 1

The urlparse module provides the path from the URI:

import os, urlparse
p = urlparse.urlparse('file://C:/test/doc.txt')
finalPath = os.path.abspath(os.path.join(p.netloc, p.path))

Answer 2

For future readers: the solution from @Jakob Bowyer does not convert URL-encoded characters to ASCII. After a bit of digging I found this solution:

>>> import urllib, urlparse
>>> urllib.url2pathname(urlparse.urlparse('file:///home/user/some%20file.txt').path)
'/home/user/some file.txt'

EDIT:

Here is what I ended up using:

>>> import urllib
>>> urllib.unquote('file:///home/user/some%20file.txt')[7:]
'/home/user/some file.txt'
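
A note on the slicing approach, as a rough Python 3 sketch rather than anything from the original answer: cutting off the first 7 characters only works when the URI starts with exactly `file://` followed by the path. The helper names `strip_prefix` and `parse_path` below are made up for this comparison.

```python
from urllib.parse import urlparse, unquote


def strip_prefix(uri):
    # The slicing approach: assumes the URI begins with exactly "file://"
    return unquote(uri)[7:]


def parse_path(uri):
    # Parse the URI first, then unquote only its path component
    return unquote(urlparse(uri).path)


for uri in (
    "file:///home/user/some%20file.txt",
    "file:/home/user/some%20file.txt",
    "file://localhost/home/user/some%20file.txt",
):
    print(uri, "->", repr(strip_prefix(uri)), "vs", repr(parse_path(uri)))
```

Only the first URI survives the slice intact; the other two forms lose part of the path or keep the host name glued to it, while parsing first yields the same path for all three.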

Answer 3

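To convert a file URI to a path with Python (this is specific to Python 3; I can write a Python 2 version if someone really wants it):

1. Parse the URI with urllib.parse.urlparse.
2. Unquote the path component of the parsed URI with urllib.parse.unquote.
3. Then ...

a. If the path is a Windows path and starts with /: strip the first character of the unquoted path component (the path component of file:///C:/some/file.txt is /C:/some/file.txt, which pathlib.PureWindowsPath does not interpret as equivalent to C:\some\file.txt).

b. Otherwise, just use the unquoted path component as is.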

Here is a function that does this:

import pathlib
import urllib.parse


def file_uri_to_path(file_uri, path_class=pathlib.PurePath):
    """
    This function returns a pathlib.PurePath object for the supplied file URI.

    :param str file_uri: The file URI ...
    :param class path_class: The type of path in the file_uri. By default it uses
        the system specific path pathlib.PurePath, to force a specific type of path
        pass pathlib.PureWindowsPath or pathlib.PurePosixPath
    :returns: the pathlib.PurePath object
    :rtype: pathlib.PurePath
    """
    # True when path_class produces Windows-flavoured paths
    windows_path = isinstance(path_class(), pathlib.PureWindowsPath)
    file_uri_parsed = urllib.parse.urlparse(file_uri)
    file_uri_path_unquoted = urllib.parse.unquote(file_uri_parsed.path)
    if windows_path and file_uri_path_unquoted.startswith("/"):
        # Drop the leading "/" so that "/C:/some/file.txt" becomes "C:/some/file.txt"
        result = path_class(file_uri_path_unquoted[1:])
    else:
        result = path_class(file_uri_path_unquoted)
    if not result.is_absolute():
        raise ValueError("Invalid file uri {} : resulting path {} not absolute".format(
            file_uri, result))
    return result

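This function works for Windows and POSIX file URIs, and it handles file URIs without an authority section. It does NOT, however, validate the URI's authority, so the following will not be honoured:

IETF RFC 8089: The "file" URI Scheme / 2. Syntax

The "host" is the fully qualified domain name of the system on which the file is accessible. This allows a client on another system to know that it cannot access the file system, or perhaps that it needs to use some other local mechanism to access the file.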
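
For illustration only, here is a minimal sketch of what an authority check could look like. It is not part of the function in this answer: `check_local_file_uri` and `LOCAL_AUTHORITIES` are names invented for this sketch, and treating only an empty authority or "localhost" as "this machine" is an assumption that you may need to adapt (for example, to accept the machine's own host name).

```python
import urllib.parse

# Assumption: authorities that are considered to refer to the local machine
LOCAL_AUTHORITIES = {"", "localhost"}


def check_local_file_uri(file_uri):
    """Return the parsed URI, or raise ValueError if it is not a local file URI."""
    parsed = urllib.parse.urlparse(file_uri)
    if parsed.scheme != "file":
        raise ValueError("Not a file URI: {}".format(file_uri))
    if parsed.netloc.lower() not in LOCAL_AUTHORITIES:
        raise ValueError("File URI {} refers to another host: {}".format(
            file_uri, parsed.netloc))
    return parsed
```

A caller could run this check first and only then convert the path. Note that on Windows a non-empty host may be a legitimate UNC server name, so rejecting it outright is a design choice, not a requirement.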

Usage examples (run on Linux):

>>> file_uri_to_path("file:///etc/hosts")
PurePosixPath('/etc/hosts')

>>> file_uri_to_path("file:///etc/hosts", pathlib.PurePosixPath)
PurePosixPath('/etc/hosts')

>>> file_uri_to_path("file:///C:/Program Files/Steam/", pathlib.PureWindowsPath)
PureWindowsPath('C:/Program Files/Steam')

>>> file_uri_to_path("file:/proc/cpuinfo", pathlib.PurePosixPath)
PurePosixPath('/proc/cpuinfo')

>>> file_uri_to_path("file:c:/system32/etc/hosts", pathlib.PureWindowsPath)
PureWindowsPath('c:/system32/etc/hosts')

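This contribution is (in addition to any other licences which may apply) licensed under the Zero-Clause BSD License (0BSD).

Permission to use, copy, modify, and/or distribute this software for any purpose with or without fee is hereby granted.

The software is provided "as is" and the author disclaims all warranties with regard to this software, including all implied warranties of merchantability and fitness. In no event shall the author be liable for any special, direct, indirect, or consequential damages or any damages whatsoever resulting from loss of use, data or profits, whether in an action of contract, negligence or other tortious action, arising out of or in connection with the use or performance of this software.

To the extent possible under law, Iwan Aucamp has waived all copyright and related or neighbouring rights to this stackexchange contribution. This work is published from: Norway.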

Answer 4

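Of all the answers so far, I found none that catches edge cases, requires no branching, is 2/3 compatible, and is cross-platform.

In short, this does the job using only builtins: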

import os

try:
    from urllib.parse import urlparse, unquote
    from urllib.request import url2pathname
except ImportError:
    # backwards compatibility
    from urlparse import urlparse
    from urllib import unquote, url2pathname


def uri_to_path(uri):
    parsed = urlparse(uri)
    # Always prefix the path with the host (netloc), wrapped in path separators
    host = "{0}{0}{mnt}{0}".format(os.path.sep, mnt=parsed.netloc)
    # abspath removes the redundant slashes; normpath is over-zealous on Windows
    return os.path.abspath(
        os.path.join(host, url2pathname(unquote(parsed.path)))
    )

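The tricky bit (I found) was working in Windows with paths that specify a host. This is a non-issue outside of Windows: network locations in *NIX can only be reached via paths after being mounted to the root of the filesystem.

From Wikipedia: A file URI takes the form of file://host/path, where host is the fully qualified domain name of the system on which the path is accessible [...]. If host is omitted, it is taken to be "localhost".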

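With that in mind, I make it a rule to ALWAYS prefix the path with the netloc provided by urlparse before passing it to os.path.abspath, which is necessary because it removes any resulting redundant slashes (os.path.normpath, which also claims to fix the slashes, can get a little over-zealous on Windows, hence the use of abspath).

The other crucial component of the conversion is using unquote to decode the URL percent-encoding, which your filesystem won't otherwise understand. Again, this might be a bigger issue on Windows, which allows things like $ and spaces in paths, and these will have been encoded in the file URI.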
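
To make the percent-encoding point concrete, here is a small Python 3 sketch (the sample path is made up for illustration) showing how `$` and spaces are encoded in a URI path and what unquote gives back:

```python
from urllib.parse import quote, unquote

# Hypothetical path containing characters that get percent-encoded in a URI
plain = "/c$/paths with spaces.txt"

encoded = quote(plain)      # "$" becomes %24, " " becomes %20, "/" is kept
decoded = unquote(encoded)  # reverses the percent-encoding

print(encoded)
print(decoded)
```

Passing the encoded form straight to the filesystem would look for a file literally named with `%20` and `%24` in it, which is why the decode step matters.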

For a demo:

import os
from pathlib import Path   # This demo requires pip install for Python < 3.4
import sys
try:
    from urllib.parse import urlparse, unquote
    from urllib.request import url2pathname
except ImportError:  # backwards compatibility:
    from urlparse import urlparse
    from urllib import unquote, url2pathname

DIVIDER = "-" * 30

if sys.platform == "win32":  # WINDOWS
    filepaths = [
        r"C:\Python27\Scripts\pip.exe",
        r"C:\yikes\paths with spaces.txt",
        r"\\localhost\c$\WINDOWS\clock.avi",
        r"\\networkstorage\homes\rdekleer",
    ]
else:  # *NIX
    filepaths = [
        os.path.expanduser("~/.profile"),
        "/usr/share/python3/py3versions.py",
    ]

for path in filepaths:
    uri = Path(path).as_uri()
    parsed = urlparse(uri)
    host = "{0}{0}{mnt}{0}".format(os.path.sep, mnt=parsed.netloc)
    normpath = os.path.normpath(
        os.path.join(host, url2pathname(unquote(parsed.path)))
    )
    absolutized = os.path.abspath(
        os.path.join(host, url2pathname(unquote(parsed.path)))
    )
    result = ("{DIVIDER}"
              "\norig path: \t{path}"
              "\nconverted to URI:\t{uri}"
              "\nrebuilt normpath:\t{normpath}"
              "\nrebuilt abspath:\t{absolutized}").format(**locals())
    print(result)
    assert path == absolutized

Results (WINDOWS):

------------------------------
orig path:          C:\Python27\Scripts\pip.exe
converted to URI:   file:///C:/Python27/Scripts/pip.exe
rebuilt normpath:   C:\Python27\Scripts\pip.exe
rebuilt abspath:    C:\Python27\Scripts\pip.exe
------------------------------
orig path:          C:\yikes\paths with spaces.txt
converted to URI:   file:///C:/yikes/paths%20with%20spaces.txt
rebuilt normpath:   C:\yikes\paths with spaces.txt
rebuilt abspath:    C:\yikes\paths with spaces.txt
------------------------------
orig path:          \\localhost\c$\WINDOWS\clock.avi
converted to URI:   file://localhost/c%24/WINDOWS/clock.avi
rebuilt normpath:   \localhost\c$\WINDOWS\clock.avi
rebuilt abspath:    \\localhost\c$\WINDOWS\clock.avi
------------------------------
orig path:          \\networkstorage\homes\rdekleer
converted to URI:   file://networkstorage/homes/rdekleer
rebuilt normpath:   \networkstorage\homes\rdekleer
rebuilt abspath:    \\networkstorage\homes\rdekleer

Answer 5

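The solution from @colton7909 is mostly correct and helped me get to this answer, but it causes some import errors in Python 3. Beyond that, I think this is a better way to deal with the 'file://' part of the URL than simply chopping off the first 7 characters. So I feel this is the most idiomatic way to do it using the standard library: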

import urllib.parse
url_data = urllib.parse.urlparse('file:///home/user/some%20file.txt')
path = urllib.parse.unquote(url_data.path)

This example should produce the string '/home/user/some file.txt'.
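
As a side note that is not from the original answer: Python 3.13 added pathlib.Path.from_uri, which performs this conversion directly. The sketch below is a hedged illustration that falls back to the urlparse/unquote approach on older versions; `path_from_file_uri` is a made-up helper name, and the sample URI assumes a POSIX system.

```python
from pathlib import Path
from urllib.parse import urlparse, unquote


def path_from_file_uri(uri):
    # Python 3.13+ provides Path.from_uri; older versions use the manual route
    if hasattr(Path, "from_uri"):
        return Path.from_uri(uri)
    return Path(unquote(urlparse(uri).path))


print(path_from_file_uri("file:///home/user/some%20file.txt"))
```

The fallback branch ignores the host part of the URI, just like the example above it.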

Is there a convenient way to map a file URI to os.path?
