Question

I have the following Python code:

from urlparse import urlparse

def clean_url(url):
    new_url = urlparse(url)
    if new_url.netloc == '':
        return new_url.path.strip().decode()
    else:
        return new_url.netloc.strip().decode()

print clean_url("http://www.facebook.com/john.doe")
print clean_url("http://facebook.com/john.doe")
print clean_url("facebook.com/john.doe")
print clean_url("www.facebook.com/john.doe")
print clean_url("john.doe")

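In each example I take in a string and simply return it. That is not what I want. I am trying to take each example and always return "http://www.facebook.com/john.doe", even if the user only types www.* or just john.doe.

I am quite new to programming, so please be gentle.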

Answer 1

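I know this answer is a little late to the party, but if this is exactly what you are trying to do, I recommend a slightly different approach. Rather than reinventing the wheel for canonicalizing Facebook URLs, consider using the work Google has already done for its Social Graph API.

They have already implemented patterns for a number of similar websites, including Facebook. More information on that is available here: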

http://code.google.com/p/google-sgnodemapper/

Answer 2

import urlparse
p = urlparse.urlsplit("john.doe")


=> ('','','john.doe','','')

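The first element of the tuple (the scheme) should be "http" and the second element (the network location) should be "www.facebook.com"; you can leave the fourth and fifth elements alone. You can then reassemble the URL once you are done processing it, for example with urlparse.urlunsplit.

Just an FYI: to make sure 'john.doe' is a safe URL segment (this may not apply to Facebook, but it is a good rule to know), use urllib.quote(string) to properly escape spaces and similar characters.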
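As a minimal sketch of that fill-in-and-reassemble step, assuming Python 3 (where the urlparse module lives in urllib.parse) and using the 'john.doe' input from the question:

```python
from urllib.parse import urlsplit, urlunsplit

# Splitting a bare user name puts everything in the path component.
parts = urlsplit("john.doe")

# SplitResult is a named tuple, so _replace returns a modified copy.
# The scheme is given without "://" and the host without a trailing
# slash; urlunsplit adds the separators itself.
fixed = parts._replace(scheme="http",
                       netloc="www.facebook.com",
                       path="/" + parts.path)

rebuilt = urlunsplit(fixed)
print(rebuilt)  # http://www.facebook.com/john.doe
```

This only covers the case where the input is a bare user name; inputs that already contain a host need the user name extracted first.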
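A short sketch of the quoting step, assuming Python 3, where urllib.quote is available as urllib.parse.quote. The helper name safe_segment is hypothetical, and passing safe="" (so that "/" is escaped too) is a choice made for this example rather than something the answer prescribes:

```python
from urllib.parse import quote

def safe_segment(name):
    # Hypothetical helper: percent-encode a single path segment.
    # safe="" makes quote() escape "/" as well, so the name cannot
    # introduce extra path levels.
    return quote(name, safe="")

print(safe_segment("john.doe"))  # john.doe
print(safe_segment("john doe"))  # john%20doe
```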

Answer 3

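I am not quite sure I understand what you are asking, but you can try this code. I tested it and it works fine, but let me know if you run into trouble.

I hope it helps.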

#!/usr/bin/env python

import urlparse

def clean_url(url):
    # split the url into a 5-tuple:
    # (scheme, netloc, path, query, fragment)
    url_tuple = urlparse.urlsplit(url)

    # tuples are immutable, so copy the values into a list
    # so we can change the ones that we need
    url_list = list(url_tuple)

    # force the scheme and the host
    url_list[0] = 'http'
    url_list[1] = 'www.facebook.com'

    # get the user name from the original url
    # ** I understood the user is the only value
    # for sure in the url, right??
    # the user name is the last '/'-separated piece; this also
    # covers the case where the user name was the only value sent
    url_list[2] = url.split('/')[-1]

    # convert the list into a tuple and
    # join all the elements into a url again
    new_url = urlparse.urlunsplit(tuple(url_list))

    return new_url

if __name__ == '__main__':
    print clean_url("http://www.facebook.com/john.doe")
    print clean_url("http://facebook.com/john.doe")
    print clean_url("facebook.com/john.doe")
    print clean_url("www.facebook.com/john.doe")
    print clean_url("john.doe")
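
For readers on Python 3, where urlparse became urllib.parse and print is a function, the same idea might be sketched as follows. The name normalize_facebook_url is hypothetical, and stripping surrounding whitespace and a trailing slash is an extra assumption that the code above does not make. Like the original, it treats the last path piece as the user name, so an input such as "facebook.com" with no user name would not be handled correctly.

```python
from urllib.parse import urlunsplit

def normalize_facebook_url(url):
    # Hypothetical name; mirrors clean_url from the answer above.
    # Treat the last "/"-separated piece as the user name, ignoring
    # surrounding whitespace and a trailing slash (an assumption).
    user = url.strip().rstrip("/").split("/")[-1]
    # urlunsplit takes (scheme, netloc, path, query, fragment).
    return urlunsplit(("http", "www.facebook.com", "/" + user, "", ""))

if __name__ == "__main__":
    for raw in ("http://www.facebook.com/john.doe",
                "http://facebook.com/john.doe",
                "facebook.com/john.doe",
                "www.facebook.com/john.doe",
                "john.doe"):
        print(normalize_facebook_url(raw))
```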

Always return the proper URL no matter what the user inputs?
