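I'm trying to remove the first 7 characters from the items in an array. More specifically, I'm trying to strip "mailto:" so that only the email address is shown.
I thought using [:7] would take care of this, but Python ignores the request.
Any suggestions?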
def businessprofile(self, response):
    for business in response.css('header#main-header'):
        item = Item()
        item['business_name'] = business.css('div.sales-info h1::text').extract()
        item['website'] = business.css('a.secondary-btn.website-link::attr(href)').extract()
        # i want to remove the first 7 characters "mailto:", but not sure how ? i made an attempt
        item['email'] = business.css('a.email-business::attr(href)').extract()[7:]
        item['phonenumber'] = business.css('p.phone::text').extract_first()
        for x in item['business_name']:
            #new code here, call to self.seen_business_names
            if x not in self.seen_business_names:
                if item['business_name']:
                    if item['phonenumber']:
                        if item['email']:
                            yield item
                            self.seen_business_names.append(x)
This is where I need to remove the characters:
item['email'] = business.css('a.email-business::attr(href)').extract()[7:]
Answer 0 (score: 1)
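Evidently business.css('a.email-business::attr(href)').extract() returns a list, so [7:] slices the list rather than the strings inside it. You need to remove mailto: from each item in the list.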
s = business.css('a.email-business::attr(href)').extract()
item['email'] = [href[7:] for href in s]
# ['businessname@gmail.com']
Alternatively:
s = business.css('a.email-business::attr(href)').extract()
item['email'] = [href.replace('mailto:', '') for href in s]
# ['businessname@gmail.com']
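To see why the slice in the question looked as if it was ignored, here is a minimal sketch that uses a plain Python list in place of the extract() result. The sample address and the helper name strip_mailto are made up for illustration and are not part of Scrapy. Slicing the list drops list elements, not characters, so a one-element list sliced at 7 comes back empty.

```python
# Stand-in for what .extract() returns: a list of href strings (sample data).
hrefs = ["mailto:businessname@gmail.com"]

# Slicing the list itself skips the first 7 *elements*, leaving nothing.
sliced_list = hrefs[7:]


def strip_mailto(values, prefix="mailto:"):
    """Hypothetical helper: drop the prefix from each string that has it."""
    return [v[len(prefix):] if v.startswith(prefix) else v for v in values]


emails = strip_mailto(hrefs)
print(sliced_list)  # []
print(emails)       # ['businessname@gmail.com']
```

Checking startswith first leaves values without the prefix untouched, which plain [7:] slicing would not do.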
Answer 1 (score: 0)
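You need to use [7:] rather than [:7]. The syntax is [<start>:<end>]. If <start> is omitted, the slice begins at the start of the string; if <end> is omitted, it runs to the end of the string.
For example: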
val = "mailto:abc@abc.de"
mailto = val[:7]  # indices 0 through 6, i.e. the first 7 characters: 'mailto:'
email = val[7:]   # index 7 (the 8th character) to the end: 'abc@abc.de'
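The slice defaults can be checked directly. The following is a small sketch using the same sample string; the variable names head, tail and middle are only for illustration.

```python
val = "mailto:abc@abc.de"

head = val[:7]      # start omitted: index 0 up to, but not including, index 7
tail = val[7:]      # end omitted: index 7 through the last character
middle = val[7:10]  # both given: indices 7, 8 and 9

print(head)    # mailto:
print(tail)    # abc@abc.de
print(middle)  # abc

# The two halves split at the same index always rebuild the original string.
print(head + tail == val)  # True
```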
Answer 2 (score: 0)
Counting starts at 0:
a = "0123456789"
a[7:]
# '789'
So you may need:
a[8:]
# '89'
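Which offset is right depends on the length of the prefix being removed. A quick check, assuming the href always starts with exactly "mailto:" (the sample value is made up):

```python
prefix = "mailto:"
val = "mailto:abc@abc.de"

# Pair each character of the prefix with its zero-based index.
indexed = list(enumerate(prefix))
print(indexed)
# [(0, 'm'), (1, 'a'), (2, 'i'), (3, 'l'), (4, 't'), (5, 'o'), (6, ':')]

# The prefix occupies indices 0..6, so the address starts at index 7.
offset = len(prefix)
print(offset)        # 7
print(val[offset:])  # abc@abc.de
print(val[8:])       # bc@abc.de
```

Deriving the offset from len(prefix) avoids hard-coding a number that may be off by one.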