I have the following code.
import bs4
import requests

# burp0_headers and burp0_cookies are assumed to be defined elsewhere (not shown in the question)

def profile_details():  # function to fetch people
    payload = 'grab'
    global result_people
    result_people = []
    for page_num in range(0, 5):
        git_url = "https://github.com/search?p=" + str(page_num) + "&q=" + str(payload) + "&type=Users"
        rr = requests.get(git_url, headers=burp0_headers, cookies=burp0_cookies)
        page = bs4.BeautifulSoup(rr.text, "lxml")
        page_parse = page.select('.user-list-info p')
        # use a separate index here so the page counter above is not shadowed
        for i in range(len(page_parse)):
            test = page_parse[i].text
            if ('@ Grab' in test) or ('at Grab' in test) or ('@Grab' in test) or ('@grab' in test):
                # encode() turns the str into bytes, which is why the output shows b'...'
                result_people.append(page_parse[i].text.encode("utf-8"))

profile_details()
for i in result_people:
    print(i)
The output looks like this:
[b'\n Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining Docusaurus \xc2\xb7 Ex-@grab \xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n ', b'\n Coding at Amazon, previously @Grab\n', b'\n Software Engineer @grab \r\nPreviously @shopback \n ', b'\n Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining Docusaurus \xc2\xb7 Ex-@grab \xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n ', b'\n Coding at Amazon, previously @Grab\n', b'\n Software Engineer @grab \r\nPreviously @shopback \n ', b'\n UX Engineer @ Grab\n', b'\n Designer at @Grab. Design Systems. Emerging tech (AR).\n ', b'\n Mobile Developer (iOS) @Grab. Previously Flipkart.\n ', b'\n Data science and engineering at Grab\n', b'\n Software Engineer @ Grab.\n ', b"\n Finding top #talent for @Grab's #mobile #app development teams, software engineering, #iOS & #Android in #Singapore\n ", b'\n Frontend Software Engineer at Grab\n', b'\n Developer @Grab(GrabTaxi)\n ', b'\n Full Stack - Software Engineer @ Grab | AI Enthusiast\n ', b'\n Software Engineer at Grab\n', b'\n Software Engineer @Grab | Previous @udacity @disney | Open Source nut, right now juggling with iOS and Swift\n ', b'\n Ex-Engineering Lead @grab, Ex-DoE @90seconds\n ', b'\n Software Engineer/ Gopher. Worked @grab, @microsoft\n ']
I want to remove byte sequences such as \xf0\x9f\x8c\x9d from the list.
The output looks like a mess:
b'\n Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining Docusaurus \xc2\xb7 Ex-@grab \xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n '
b'\n Coding at Amazon, previously @Grab\n'
b'\n Software Engineer @grab \r\nPreviously @shopback \n '
b'\n Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining Docusaurus \xc2\xb7 Ex-@grab \xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n '
b'\n Coding at Amazon, previously @Grab\n'
b'\n Software Engineer @grab \r\nPreviously @shopback \n '
What is the simplest and most convenient way to achieve this?
Thanks in advance.
Answer 0 (score: 0)
Welcome to StackOverflow!
You can do this by removing all non-ASCII characters from each string:
for i in result_people:
    print(i.decode('utf8').encode('ascii', errors='ignore'))
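The loop above still prints bytes objects, so the b'...' prefix and escapes such as \n remain visible. As a minimal sketch of one way to go a step further, the hypothetical helper clean_bio below (not part of the original answer) decodes the bytes, drops non-ASCII characters, and also collapses the leftover whitespace. The sample value is copied from the output in the question.

```python
def clean_bio(raw: bytes) -> str:
    """Decode UTF-8 bytes, drop non-ASCII characters, and tidy whitespace."""
    text = raw.decode("utf-8")
    # errors="ignore" silently drops every character ASCII cannot represent
    ascii_text = text.encode("ascii", errors="ignore").decode("ascii")
    # split() with no arguments also removes \n and \r and collapses runs of spaces
    return " ".join(ascii_text.split())


sample = (
    b"\n Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining Docusaurus "
    b"\xc2\xb7 Ex-@grab \xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n "
)
print(clean_bio(sample))
```

Collapsing whitespace is an extra step that the question did not ask for; leave out the last line of the helper if the original spacing matters.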
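Answer 1 (score: 0)
I solved it by encoding to ASCII with 'ignore' as the errors argument, which drops every character that ASCII cannot represent, and then decoding the result back to a string as UTF-8.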
result_people.append(page_parse[i].text.encode('ascii', 'ignore').decode("utf-8"))
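To illustrate what this encode/decode round trip does to a str, here is a small self-contained sketch. The helper name ascii_only and the sample strings are made up for illustration; only the encode and decode calls come from the answer. Since ASCII is a subset of UTF-8, decoding with "ascii" or "utf-8" gives the same result here.

```python
def ascii_only(text: str) -> str:
    """Return text with every non-ASCII character removed."""
    return text.encode("ascii", "ignore").decode("ascii")


# Hypothetical sample bios, written with escapes for the emoji and the middle dot
bios = [
    "\n Front End @facebook \U0001F31D \u00b7 Maintaining Docusaurus\n",
    "\n Software Engineer @grab \r\nPreviously @shopback \n ",
]

result_people = [ascii_only(bio) for bio in bios]
for person in result_people:
    # Each item is a str, so print() shows no b'...' prefix and no \x escapes
    print(person)
```

Because the list now holds str values rather than bytes, a call such as person.strip() can be used afterwards to trim the surrounding newlines.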