How to get a specific text slice from a string

Time: 2018-09-18 20:26:23

Tags: python

I'm trying to work with Instagram

So, say I have a link https://www.instagram.com/p/Bn4Lmo_j0Jc/

And I want to get only Bn4Lmo_j0Jc. I could just remove everything before this ID, plus the last /

But what if my link looks like this: https://www.instagram.com/p/Bn4Lmo_j0Jc/?taken-by=instagram or this: https://www.instagram.com/p/Bn1GpYyBFSl/?hl=en&taken-by=zaralarsson? Then there is no fixed number of characters to remove. What is the easiest way to solve this?

5 answers:

Answer 0 (score: 2)

How about this?

import urllib.parse

url = 'https://www.instagram.com/p/Bn4Lmo_j0Jc/'

parts = urllib.parse.urlparse(url)

parts.path
'/p/Bn4Lmo_j0Jc/'
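
The path above still carries the /p/ prefix and the trailing slash. A minimal sketch of one way to finish the job: strip the slashes, split on /, and take the segment after p. The helper name extract_post_id is hypothetical, and the code assumes post links have the form /p/<id>/.

```python
from urllib.parse import urlparse


def extract_post_id(url):
    """Return the path segment that follows 'p', or None if the URL has a different shape."""
    # urlparse keeps the query string in .query, so .path is just '/p/<id>/'
    segments = urlparse(url).path.strip('/').split('/')
    # Assumption: post links look like /p/<id>/
    if len(segments) >= 2 and segments[0] == 'p':
        return segments[1]
    return None


print(extract_post_id('https://www.instagram.com/p/Bn4Lmo_j0Jc/'))
print(extract_post_id('https://www.instagram.com/p/Bn1GpYyBFSl/?hl=en&taken-by=zaralarsson'))
```

Because the query string never reaches the path, this does not depend on how many characters follow the ID.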

Answer 1 (score: 0)

lst = link.split("/")
lst[-1] if not lst[-1].startswith("?") and lst[-1] else lst[-2]

where link is your link string.

(The result is the last element of lst if it is non-empty and does not start with ?; otherwise it is the second-to-last element.)
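
The same idea can be wrapped in a function. The sketch below drops the query string first, so it does not matter whether a / comes before the ?. The name last_path_segment is hypothetical.

```python
def last_path_segment(link):
    """Return the last non-empty '/'-separated piece of link, ignoring any query string."""
    # Everything from the first '?' onward is the query string; discard it
    without_query = link.partition('?')[0]
    # rstrip removes a trailing '/', so the ID becomes the final piece
    return without_query.rstrip('/').split('/')[-1]


print(last_path_segment('https://www.instagram.com/p/Bn4Lmo_j0Jc/?taken-by=instagram'))
```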

Answer 2 (score: 0)

Consistent format

Given that your URLs always start with https://www.instagram.com/p/, all you need is basic string methods.

base_url = 'https://www.instagram.com/p/'
main = 'https://www.instagram.com/p/Bn4Lmo_j0Jc/?taken-by=instagram'
# remove your base url
# split on separator '/'
# select the ID in index [0]
main.replace(base_url, '').split('/')[0]
'Bn4Lmo_j0Jc'

For loop

If you have a list of URLs to extract IDs from:

url_base = 'https://www.instagram.com/p/'
url_list = [url1, url2, url3]  # your URL strings
id_list = []

for url in url_list:
    id_list.append(url.replace(url_base, '').split('/')[0])

Answer 3 (score: 0)

from urllib import parse

def getId(url):
    # path is '/p/<id>/': skip the leading '/p/' and drop the trailing '/'
    return parse.urlparse(url).path[3:-1]

print(getId('https://www.instagram.com/p/Bn1GpYyBFSl/?hl=en&taken-by=zaralarsson'))
print(getId('https://www.instagram.com/p/Bn4Lmo_j0Jc/'))
print(getId('https://www.instagram.com/p/Bn4Lmo_j0Jc/?taken-by=instagram'))

Answer 4 (score: 0)

You can use a regular expression here. It also works if your URL contains additional /p/ segments after the ID field you care about.

import re
a = ['https://www.instagram.com/p/Bn1GpYyBFSl/?hl=en&taken-by=zaralarsson',
     'https://www.instagram.com/p/Bn4Lmo_j0Jc/',
     'https://www.instagram.com/p/Bn4Lmo_j0Jc/?taken-by=instagram/p/12321']
# raw string so that \w reaches the regex engine unchanged
[re.findall(r'/p/(\w+)', i)[0] for i in a]
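
Indexing with [0] raises IndexError when a URL has no /p/ segment at all. A slightly more defensive sketch uses re.search and returns None when nothing matches. The names POST_ID_RE and find_post_id are hypothetical, and the character class [\w-] is an assumption that IDs may contain hyphens as well as letters, digits and underscores.

```python
import re

# Assumption: an ID is a run of letters, digits, '_' or '-' right after '/p/'
POST_ID_RE = re.compile(r'/p/([\w-]+)')


def find_post_id(url):
    """Return the first ID that follows '/p/' in url, or None if there is none."""
    match = POST_ID_RE.search(url)
    return match.group(1) if match else None


print(find_post_id('https://www.instagram.com/p/Bn4Lmo_j0Jc/?taken-by=instagram/p/12321'))
```

re.search stops at the first match, so a later /p/ in the query string is ignored.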