Can't get rid of all the emojis

Time: 2018-05-05 18:37:08

Tags: python-3.x

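I need help removing emojis. I looked at some other Stack Overflow questions and this is what I came up with, but for some reason my code doesn't get rid of all the emojis: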

d= {'alexveachfashion': 'Fashion Style * Haute Couture * Wearable Tech * VR\n⌚\nSoundCloud is Live @alexveach\nNew YouTube Episodes ▶️', 'andrewvng': 'Family | Fitness | Friends | Gym | Food', 'runvi.official': 'Accurate measurement via SMART insoles & real-time AI coaching. Improve your technique & BOOST your performance with every run.\nSoon on Kickstarter!', 'triing': 'Augmented Jewellery™️ • Montreal. Canada.', 'gedeanekenshima': 'Prof na Etec Albert Einstein, Mestranda em Automação e Controle de Processos, Engenheira de Controle e Automação, Técnica em Automação Industrial.', 'jetyourdaddy': '', 'lavonne_sun': '☄️ ✨\n°●°。Visual Narrative\nA creative heart with a poetic soul.\n————————————\nPARSONS —— Design & Technology', 'taysearch': 'All the World’s Information At Your Fingertips. (Literally) Est. 1991  #PrincessofSearch Sample  the Search Engine Here ', 'hijewellery': 'Fine 3D printed jewellery for tech lovers #3dprintedjewelry #wearabletech #jewellery', 'yhanchristian': 'Estudante de Engenharia, Maker e viciado em café.', 'femka': 'Fashion Futurist + Fashion Tech Lab Founder @technoirlab + Fashion Designer / Parsons & CSM Grad / Obsessed with #fashiontech #future #cryptocurrency', 'sinhbisen': 'Creator, TRiiNG, augmented jewellery label ⭕️ Transhumanist ⭕️ Corporeal cartographer ⭕️', 'stellawearables': '#StellaWearables ✉️Info@StellaWearables.com                  Premium Wearable Technology That Monitors Personal Health & Environments ☀️', 'ivoomi_india': 'We are the manufacturers of the most innovative technologies and user-friendly gadgets with a global presence.', 'bgutenschwager': "When it comes to life, it's all about the experience.\nGoogle Mapper \n360 Photographer \nBrand Rep @QuickTutor", 'storiesofdesign': 'Putting stories at the heart of brands and businesses | Cornwall and London  |  #storiesofdesign', 'trume.jp': '草創期から国産ウオッチの製造に取り組み、挑戦を続けてきたエプソンが世界に放つ新ブランド「TRUME」(トゥルーム)。目指すのは、最先端技術でアナログウオッチを極めるブランド。', 'themarinesss': "I didn't choose the blog 
life, the blog life chose me | Aspiring Children's Book Author | www.slayathomemum.com", 'ayowearable': 'The world’s first light-based wearable that helps you sleep better, beat jet lag and have more energy! #goAYO Get yours at:', 'wearyourowntechs': 'Bringing you the latest trends, Current Products and Reviews of Wearable Technology. Discover how they can enhance your Life and Lifestyle', 'roxfordwatches': 'The Roxford  |  The most stylish and customizable fitness smartwatch. Tracks your steps/calories/dist/sleep. Comes with FOUR bands, and a travel case!', 'playertek': "Track your entire performance - every training session, every match. \nBecause the best players don't hide.", '_kate_hartman_': '', 'hmsmc10': 'Health & Wellness \nBoston, MA \nSuffolk MPA ‘17  \n.\nJust Strong Ambassador \u200d♀️', 'gadgetxtreme': 'Dedicated to reviewing gadgets, technologies, internet products and breaking tech news. Follow us to see daily vblogs on all the disruptive tech..', 'freedom.journey.leader': 'MN\nWife • Homeschooling Mom to 5  • D Y I lover  • Small town living in MN.  \nAshleybp5@gmail.com \n#homeschool #bossmom #builder #momlife', 'arts_food_life': 'Life through my phone.', 'medgizmo': 'Wearable #tech: #health #healthcare #wellness #gadgets #apps. Images/links provided as information resource only; doesn’t mean we endorse referenced', 'sawearables': 'The home of wearable tech in South Africa!\n--> #WearableTech #WearableTechnology #FitnessTech       Find your wearable @', 'shop.mercury': 'Changing the way you charge.⚡️\nGet exclusive product discounts, and help us reach our goal below!', 'invisawear': 'PRE-ORDERS NOW AVAILABLE! Get yours 25% OFF here: #girlboss #wearabletech'}

for key in d:
    print("---with emojis----")
    print(d[key])
    print("---emojis removed----")
    x=''.join(c for c in d[key] if c <= '\uFFFF')
    print(x)

Sample output

---with emojis----
MN
Wife • Homeschooling Mom to 5  • D Y I lover  • Small town living in MN.  
Ashleybp5@gmail.com 
#homeschool #bossmom #builder #momlife
---emojis removed----
MN
Wife • Homeschooling Mom to 5  • D Y I lover  • Small town living in MN.  
Ashleybp5@gmail.com 
#homeschool #bossmom #builder #momlife
---with emojis----
Changing the way you charge.⚡️
Get exclusive product discounts, and help us reach our goal below!
---emojis removed----
Changing the way you charge.⚡️
Get exclusive product discounts, and help us reach our goal below!

1 answer:

Answer 0 (score: 2)

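There is no technical definition of what an "emoji" is. Various glyphs can be used to render printable characters, symbols, control characters and so on. What looks like an "emoji" to you may be part of a normal script for someone else.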

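What you probably want to do is look at the Unicode category of each character and filter out various categories. This does not by itself solve the problem of defining what an emoji is, but it gives you much better control over what you are actually doing, without removing, for example, literally every character of the languages spoken by 2/3 of the planet.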
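To see what the categories look like in practice, here is a minimal sketch that inspects the tail of one bio from the question. The helper name `describe` is made up for this illustration; `unicodedata.category` and `unicodedata.name` are standard library calls.

```python
import unicodedata

# One bio from the question: HIGH VOLTAGE SIGN followed by an
# invisible variation selector (written as escapes so both are visible).
bio = "Changing the way you charge.\u26a1\ufe0f"

def describe(text):
    """Return (code point, category, name) for each character in text."""
    return [
        (f"U+{ord(c):04X}", unicodedata.category(c), unicodedata.name(c, "<unnamed>"))
        for c in text
    ]

# Look at the last three characters only.
for row in describe(bio[-3:]):
    print(*row)

# Every code point here sorts at or below '\uFFFF', so the
# question's `c <= '\uFFFF'` test keeps all of them.
print(all(c <= '\uFFFF' for c in bio))
```

The three rows should show categories `Po`, `So` and `Mn` respectively. That is why the question's filter leaves this emoji in place: it only drops characters above U+FFFF, and both U+26A1 and U+FE0F sit below that limit.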

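Instead of filtering out certain categories, you could filter out everything except lowercase and uppercase letters (and digits). Be aware, however, that ꙭ is not the "googly eyes emoji" but CYRILLIC SMALL LETTER DOUBLE MONOCULAR O, which is a perfectly normal lowercase letter to millions of people.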
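The allow-list idea could be sketched roughly as follows. The function name `keep_letters_and_digits` is hypothetical, and keeping whitespace is an extra assumption made here so that words stay separated.

```python
import unicodedata

def keep_letters_and_digits(text):
    """Keep letters (L*), numbers (N*) and whitespace; drop everything else."""
    return ''.join(
        c for c in text
        # category(c)[0] is the major class: 'L' letter, 'N' number, ...
        if c.isspace() or unicodedata.category(c)[0] in ('L', 'N')
    )

# The pipe, the symbol and its variation selector go; the letters stay,
# including the Cyrillic letter U+A66D mentioned above.
print(keep_letters_and_digits("Family | Fitness \u26a1\ufe0f caf\u00e9 \ua66d"))
```

Note that this is aggressive: it also removes all punctuation, and it drops combining marks (category `Mn`), so text that stores accents as separate code points would lose them.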

For example, to filter by category:

import unicodedata

s = "Wife • Homeschooling Mom to 5  • D Y I lover  • Small town living in MN. "

# Filter out only the category 'So' ("Symbol, other")
t = ''.join(c for c in s if unicodedata.category(c) not in ('So', ))
print(t)

...which results in

Wife • Homeschooling Mom to 5  • D Y I lover  • Small town living in MN.

The • is probably not an emoji, but technically it is a form of punctuation. So filter that out as well:

# Filter symbols and punctuation. You may want 'Cc' as well,
# to get rid of control characters. Beware that newlines are
# control characters too.
t = ''.join(c for c in s if unicodedata.category(c) not in ('So', 'Po'))
print(t)

and you get

Wife  Homeschooling Mom to 5   D Y I lover   Small town living in MN
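
Applied to the bios from the question, a filter on `So` alone still leaves invisible code points behind, such as the variation selector after ⚡ or the zero width joiner before ♀. A hedged sketch that also drops those two is shown below; the names `EMOJI_GLUE` and `strip_symbols` are invented for this example, and treating exactly these two code points as removable is an assumption about this data set, not a general rule.

```python
import unicodedata

# Assumption: in this data these invisible code points only act as emoji "glue".
# U+200D ZERO WIDTH JOINER, U+FE0F VARIATION SELECTOR-16
EMOJI_GLUE = {'\u200d', '\ufe0f'}

def strip_symbols(text):
    """Drop 'So' (Symbol, other) characters plus the glue code points above."""
    return ''.join(
        c for c in text
        if c not in EMOJI_GLUE and unicodedata.category(c) != 'So'
    )

bio = 'Changing the way you charge.\u26a1\ufe0f\nGet exclusive product discounts'
print(strip_symbols(bio))
```

Two caveats: `So` also covers characters such as ™, so those disappear as well, and the zero width joiner is meaningful in some scripts, so removing it everywhere may not be appropriate for other text.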