Question

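I am new to Python and trying to figure this out, so sorry if this has been asked before. I couldn't find it and don't know what this is called.

The short version: I want to take a link like this: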

http://www.somedomainhere.com/embed-somekeyhere-650x370.html

and convert it to:

http://www.somedomainhere.com/somekeyhere

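The long version: I have been working on an add-on for xbmc that goes to a website, grabs a URL, then goes to that URL to find another URL. Basically a URL resolver.

The program searches the site and comes up with the somekeyhere-650x370.html link. But that page is in Java and is unusable to me. When I go to .com/somekeyhere, however, the code is usable. So I need to grab the first URL, change it to the usable page, and then scrape that page.

So far the code I have is: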

if 'somename' in name:
    try:
        n = re.compile('<iframe title="somename" type="text/html" frameborder="0" scrolling="no" width=".+?" height=".+?" src="(.+?)">" frameborder="0"',re.DOTALL).findall(net().http_GET(url).content)[0]
        # CONVERT URL to .com/somekeyhere SO BELOW na CAN READ IT.
        na = re.compile("'file=(.+?)&.+?'",re.DOTALL).findall(net().http_GET(na).content)[0]

Any suggestions on how to accomplish converting the URL?

Answer 1

I did not fully understand your problem, but here is a short answer.

Assumption: somekey is alphanumeric.

import re

a = 'http://www.domain.com/embed-somekey-650x370.html'
# Dots are escaped so they match a literal "." rather than any character
p = re.match(r'^http://www\.domain\.com/embed-(?P<key>[0-9A-Za-z]+)-650x370\.html$', a)
somekey = p.group('key')
requiredString = "http://www.domain.com/" + somekey  #comment1

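This is a very specific answer that only works for that one domain name. You should modify the regular expression as needed. I see that your code already uses regular expressions, so I assume you can build one that better fits your requirements.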
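As a rough sketch of what a looser pattern might look like: the helper name `embed_to_page` is hypothetical, and the code assumes the key is alphanumeric and the suffix always has the form `<width>x<height>.html`. Neither assumption is confirmed by the question.

```python
import re

# Assumed URL shape: <scheme>://<host>/embed-<key>-<width>x<height>.html
EMBED_RE = re.compile(
    r'^(?P<base>https?://[^/]+/)embed-(?P<key>[0-9A-Za-z]+)-\d+x\d+\.html$'
)

def embed_to_page(url):
    """Return the plain page URL for an embed URL, or None if it does not match."""
    m = EMBED_RE.match(url)
    if m is None:
        return None
    # Scheme and host are reused as captured, so no domain is hard-coded
    return m.group('base') + m.group('key')

print(embed_to_page('http://www.somedomainhere.com/embed-somekeyhere-650x370.html'))
# prints http://www.somedomainhere.com/somekeyhere
```

Returning None on a failed match lets the caller decide whether to skip the link or fall back to the original URL.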

Edit 1: Also have a look at urlparse: https://docs.python.org/2/library/urlparse.html?highlight=urlparse#module-urlparse

It provides an easy way to parse URLs.

Also, on the line marked "#comment1", you can save the domain name in a variable and reuse it there.
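A minimal sketch of how urlparse could be combined with a regular expression so that the scheme and domain are never hard-coded. The function name `rewrite_embed_url` and the path pattern are assumptions for illustration. The import falls back to the Python 2 `urlparse` module when `urllib.parse` is not available.

```python
import re

try:
    from urllib.parse import urlparse, urlunparse  # Python 3
except ImportError:
    from urlparse import urlparse, urlunparse      # Python 2

def rewrite_embed_url(url):
    """Rewrite /embed-<key>-<W>x<H>.html to /<key>, keeping scheme and host."""
    parts = urlparse(url)
    m = re.match(r'^/embed-([0-9A-Za-z]+)-\d+x\d+\.html$', parts.path)
    if m is None:
        return url  # not an embed link, leave it unchanged
    # The parse result is a namedtuple, so _replace swaps only the path
    return urlunparse(parts._replace(path='/' + m.group(1)))

print(rewrite_embed_url('http://www.somedomainhere.com/embed-somekeyhere-650x370.html'))
# prints http://www.somedomainhere.com/somekeyhere
```

Because only the path is matched, the same function works for any host the resolver encounters.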
