Question

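I decided to learn Python tonight :) I know C pretty well (I wrote an OS in it), so I'm not a noob at programming and everything in Python seems pretty easy, but I don't know how to solve this problem. Let's say I have this address:

http://example.com/random/folder/path.html Now how can I create two strings from it: one containing the "base" name of the server, so in this example it would be http://example.com/, and another containing everything except the last filename, so in this example it would be http://example.com/random/folder/ ? Of course I know I could just find the 3rd and the last slash respectively, but maybe you know a better way :] It would also be cool to have a trailing slash in both cases, but I don't really care since it can easily be added. So does anyone have a good, fast, effective solution for this? Or is there only "my" solution, finding the slashes?

Thanks!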

Answer 1

The urlparse module in Python 2.x (or urllib.parse in Python 3.x) is the way to do this.

>>> from urllib.parse import urlparse
>>> url = 'http://example.com/random/folder/path.html'
>>> parse_object = urlparse(url)
>>> parse_object.netloc
'example.com'
>>> parse_object.path
'/random/folder/path.html'
>>> parse_object.scheme
'http'
>>>

If you want to do more work on the path of the file under the URL, you can use the posixpath module:

>>> from posixpath import basename, dirname
>>> basename(parse_object.path)
'path.html'
>>> dirname(parse_object.path)
'/random/folder'

After that, you can use posixpath.join to glue the parts back together.

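Edit: I completely forgot that Windows users would choke on the path separator in os.path. I read the posixpath module docs, and they specifically mention URL manipulation, so all is good.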
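As a rough sketch of how the pieces from this answer could be combined to produce the two strings the question asks for: the helper name split_url is made up for illustration, and urlunparse (from the same urllib.parse module) is used here to reassemble the URL instead of joining by hand. It assumes Python 3 and a plain URL without query string or fragment.

```python
from posixpath import dirname
from urllib.parse import urlparse, urlunparse


def split_url(url):
    """Return (base, directory) URLs, each with a trailing slash.

    split_url is a hypothetical helper name, not part of any library.
    """
    parts = urlparse(url)
    # Keep only the scheme and host for the server "base" name.
    base = urlunparse((parts.scheme, parts.netloc, '/', '', '', ''))
    # Drop the last path component, then restore the trailing slash.
    parent = dirname(parts.path)
    if not parent.endswith('/'):
        parent += '/'
    directory = urlunparse((parts.scheme, parts.netloc, parent, '', '', ''))
    return base, directory


base, directory = split_url('http://example.com/random/folder/path.html')
print(base)       # http://example.com/
print(directory)  # http://example.com/random/folder/
```

Any query string or fragment is deliberately dropped in this sketch, since neither of the two requested strings needs them.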

Answer 2

I have no experience with Python, but I found the urlparse module, which should do the job.

Answer 3

If this is the full extent of your URL parsing, Python's built-in rpartition will do the job:

>>> URL = "http://example.com/random/folder/path.html"
>>> Segments = URL.rpartition('/')
>>> Segments[0]
'http://example.com/random/folder'
>>> Segments[2]
'path.html'

From Pydoc, str.rpartition:

Splits the string at the last occurrence of sep, and returns a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return a 3-tuple containing two empty strings, followed by the string itself

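This means rpartition does the searching for you and splits the string at the last (rightmost) occurrence of the character you specify (in this case /). It returns a tuple containing: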

(everything to the left of char , the character itself , everything to the right of char)
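To make the tuple behaviour described above concrete, here is a small sketch. It unpacks the rpartition result to get the directory part with its trailing slash, and, as an assumption beyond what this answer shows, uses the sibling method str.partition to pick out the server part. The variable names are illustrative only.

```python
url = "http://example.com/random/folder/path.html"

# Split at the LAST slash: (before, separator, after).
head, sep, tail = url.rpartition('/')
directory = head + sep  # re-attach the slash that rpartition split on

# Split at the FIRST '://' and then at the first slash after the host.
scheme, _, rest = url.partition('://')
host = rest.partition('/')[0]
base = scheme + '://' + host + '/'

# When the separator is missing, the whole string lands in the last slot.
missing = "path.html".rpartition('/')

print(directory)  # http://example.com/random/folder/
print(base)       # http://example.com/
print(missing)    # ('', '', 'path.html')
```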

Answer 4

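In Python a lot of operations are done using lists. The urlparse module mentioned by Sebasian Dietz may well solve your specific problem, but if you're generally interested in Pythonic ways to, for example, find slashes in strings, try something like this: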

url = 'http://example.com/random/folder/path.html'
# Create a list of each bit between slashes
slashparts = url.split('/')
# Now join back the first three sections 'http:', '' and 'example.com'
basename = '/'.join(slashparts[:3]) + '/'
# All except the last one
dirname = '/'.join(slashparts[:-1]) + '/'
# print(...) with a single argument works in both Python 2 and Python 3
print('slashparts = %s' % slashparts)
print('basename = %s' % basename)
print('dirname = %s' % dirname)

The output of this program is:

slashparts = ['http:', '', 'example.com', 'random', 'folder', 'path.html']
basename = http://example.com/
dirname = http://example.com/random/folder/

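The interesting bits are split, join, the slice notation array[A:B] (including negative numbers for offsets from the end) and, as a bonus, the % operator on strings to give printf-style formatting.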
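A short sketch of those four pieces in isolation may help; the variable names here are made up for illustration and are not from the answer.

```python
parts = 'http://example.com/random/folder/path.html'.split('/')

first_three = parts[:3]    # everything before index 3
all_but_last = parts[:-1]  # a negative index counts from the end
last_two = parts[-2:]      # the final two items

# join is the inverse of split when the same separator is used.
rejoined = '/'.join(parts)

# The % operator fills printf-style placeholders from a tuple.
message = '%s has %d parts' % (parts[2], len(parts))

print(first_three)  # ['http:', '', 'example.com']
print(last_two)     # ['folder', 'path.html']
print(message)      # example.com has 6 parts
```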

Answer 5

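Thank you very much to the other answerers here, whose answers pointed me in the right direction!

It seems the posixpath module mentioned by sykora's answer is not available in my Python setup (python 2.7.3).

According to this article, it seems the "proper" way to do this would be to use...

urlparse.urlparse and urlparse.urlunparse, which can be used to split apart and reattach the URL
the functions of os.path, which can be used to manipulate the path
urllib.url2pathname and urllib.pathname2url (to make path name manipulation portable, so that it works on Windows and the like)

So for example (not including reattaching the base URL)...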

>>> import urlparse, urllib, os.path
>>> os.path.dirname(urllib.url2pathname(urlparse.urlparse("http://example.com/random/folder/path.html").path))
'/random/folder'

Answer 6

You can use the Python library furl:

To access the words after the first "/", use:

import furl  # third-party package, not part of the standard library

f = furl.furl("http://example.com/random/folder/path.html")
print(str(f.path))  # '/random/folder/path.html'
print(str(f.path).split("/")) # ['', 'random', 'folder', 'path.html']

How can I split a URL string up into separate parts in Python?