How to remove non-ASCII characters from a UTF-8 encoded list

Time: 2018-11-13 01:54:29

Tags: python-3.x

I have the following code.

import bs4
import requests

# burp0_headers and burp0_cookies are not shown in the question.

def profile_details():  # fetch people whose bio mentions Grab
    payload = 'grab'
    global result_people
    result_people = []
    for page_num in range(0, 5):
        git_url = "https://github.com/search?p=" + str(page_num) + "&q=" + str(payload) + "&type=Users"
        rr = requests.get(git_url, headers=burp0_headers, cookies=burp0_cookies)
        page = bs4.BeautifulSoup(rr.text, "lxml")
        page_parse = page.select('.user-list-info p')
        for entry in page_parse:
            test = entry.text
            if ('@ Grab' in test) or ('at Grab' in test) or ('@Grab' in test) or ('@grab' in test):
                # .encode() turns the str into bytes, hence the b'...' output
                result_people.append(test.encode("utf-8"))

profile_details()
for i in result_people:
    print(i)

The output looks like this:

[b'\n          Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining Docusaurus \xc2\xb7 Ex-@grab \xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n        ', b'\n          Coding at Amazon, previously @Grab\n', b'\n          Software Engineer @grab \r\nPreviously @shopback \n        ', b'\n          Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining Docusaurus \xc2\xb7 Ex-@grab \xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n        ', b'\n          Coding at Amazon, previously @Grab\n', b'\n          Software Engineer @grab \r\nPreviously @shopback \n        ', b'\n          UX Engineer @ Grab\n', b'\n          Designer at @Grab. Design Systems. Emerging tech (AR).\n        ', b'\n          Mobile Developer (iOS) @Grab. Previously Flipkart.\n        ', b'\n          Data science and engineering at Grab\n', b'\n          Software Engineer @ Grab.\n        ', b"\n          Finding top #talent for @Grab's #mobile #app development teams, software engineering, #iOS & #Android in #Singapore\n        ", b'\n          Frontend Software Engineer at Grab\n', b'\n          Developer @Grab(GrabTaxi)\n        ', b'\n          Full Stack - Software Engineer @ Grab | AI Enthusiast\n        ', b'\n          Software Engineer at Grab\n', b'\n          Software Engineer @Grab | Previous @udacity @disney | Open Source nut, right now juggling with iOS and Swift\n        ', b'\n          Ex-Engineering Lead @grab, Ex-DoE @90seconds\n        ', b'\n          Software Engineer/ Gopher. Worked @grab, @microsoft\n        ']

I want to remove byte sequences such as \xf0\x9f\x8c\x9d from the list.

The output looks messy:

b'\n          Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining Docusaurus \xc2\xb7 Ex-@grab \xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n        '

b'\n          Coding at Amazon, previously @Grab\n'
b'\n          Software Engineer @grab \r\nPreviously @shopback \n        '
b'\n          Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining Docusaurus \xc2\xb7 Ex-@grab \xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n        '
b'\n          Coding at Amazon, previously @Grab\n'
b'\n          Software Engineer @grab \r\nPreviously @shopback \n        '

What is the simplest and most convenient way to achieve this?

Thanks in advance.

2 answers:

Answer 0 (score: 0)

Welcome to StackOverflow!

You can do this by removing all non-ASCII characters from each string:

for i in result_people:
    print(i.decode('utf8').encode('ascii', errors='ignore'))
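
To see what this round trip does, here is a minimal sketch applied to one of the byte strings from the question. The helper name to_ascii and the whitespace cleanup at the end are my own additions for illustration, not part of the answer above.

```python
# One of the byte strings from the question's output.
raw = b'\n          Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining Docusaurus \xc2\xb7 Ex-@grab \xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n        '

def to_ascii(raw_bytes):
    """Hypothetical helper: decode UTF-8 bytes and drop non-ASCII characters."""
    text = raw_bytes.decode("utf-8")
    # errors="ignore" silently drops every character ASCII cannot represent.
    ascii_only = text.encode("ascii", errors="ignore").decode("ascii")
    # Collapse the whitespace runs left behind by the removed characters.
    return " ".join(ascii_only.split())

cleaned = to_ascii(raw)
print(cleaned)  # Front End @facebook Maintaining Docusaurus Ex-@grab
```

Decoding the result back to str at the end avoids the b'...' prefix when printing.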

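Answer 1 (score: 0)

I solved it by encoding the text to ASCII with 'ignore' as the error handler, which drops every character that cannot be encoded, and then decoding the result back to a string as UTF-8.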

result_people.append(page_parse[i].text.encode('ascii', 'ignore').decode("utf-8"))
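
The same idea can be tried without the scraping code. This is only a sketch: the texts list is a hypothetical stand-in for the page_parse[i].text values, and strip_non_ascii is a name I made up for illustration.

```python
# Hypothetical stand-ins for the page_parse[i].text values (str, not bytes).
texts = [
    "\n          UX Engineer @ Grab\n",
    "\n          Software Engineer @grab \U0001F31D \r\nPreviously @shopback \n        ",
]

def strip_non_ascii(text):
    # str -> ASCII bytes (unencodable characters dropped) -> str again.
    # Decoding as "utf-8" would also work, since ASCII is a subset of UTF-8.
    return text.encode("ascii", "ignore").decode("ascii").strip()

result_people = [strip_non_ascii(t) for t in texts]
for person in result_people:
    print(person)
```

Because the values stay str from start to finish, the list prints without b'...' prefixes or \x escapes.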