How do I remove a trailing ellipsis (trailing dots) from a pandas Series?
import numpy as np
import pandas as pd
pd.set_option('max_colwidth',1000)
s = pd.Series(["""Finally a transparant silicon case ^^ Thanks to my uncle :) #yay #Sony #Xperia #S #sonyexperias… http://instagram.com/p/YGEt5JC6JM/"""])
s.str.replace(r'(\w)\.+',r'\1',regex=True)
Finally a transparant silicon case ^^ Thanks to my uncle :) #yay #Sony #Xperia #S #sonyexperias… http://instagramcom/p/YGEt5JC6JM/
Wanted:
Finally a transparant silicon case ^^ Thanks to my uncle :) #yay #Sony #Xperia #S #sonyexperia http://instagramcom/p/YGEt5JC6JM/
Answer 0 (score: 3)
Those are not periods. It is a single ellipsis character, Unicode \u2026. See "How should I write three dots?"
s.str.replace(r'(\w)\u2026+',r'\1',regex=True)
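One way to confirm which character is really in the string is to list its non-ASCII code points with the standard-library unicodedata module. This is a minimal sketch: the shortened sample string and the names ELLIPSIS, non_ascii and cleaned are illustrative, not part of the original answer.

```python
import unicodedata

import pandas as pd

ELLIPSIS = "\u2026"  # the single-character horizontal ellipsis

# Shortened version of the sample text from the question (assumption for brevity).
s = pd.Series(["#S #sonyexperias" + ELLIPSIS + " http://instagram.com/p/YGEt5JC6JM/"])

# Collect every non-ASCII character with its code point and Unicode name.
non_ascii = [
    (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
    for ch in s.iloc[0]
    if ord(ch) > 127
]
print(non_ascii)

# Drop the ellipsis only when it directly follows a word character.
pattern = "(\\w)" + ELLIPSIS + "+"
cleaned = s.str.replace(pattern, r"\1", regex=True)
print(cleaned.iloc[0])
```

Because the pattern targets U+2026 rather than `.`, the period inside the URL is left alone.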
Answer 1 (score: 2)
Could you try the following, written against the sample shown?
pd.set_option('max_colwidth',1000)
s = pd.Series(["""Finally a transparant silicon case ^^ Thanks to my uncle :) #yay #Sony #Xperia #S #sonyexperias… http://instagram.com/p/YGEt5JC6JM/"""])
s.str.replace(r'…+',r'',regex=True)
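Since pandas 2.0 the default for `regex` in `Series.str.replace` is `False`, so it is safer to pass it explicitly. A small sketch comparing the two modes on a shortened sample string (the names literal and by_regex are illustrative):

```python
import pandas as pd

ELLIPSIS = "\u2026"

s = pd.Series(["#S #sonyexperias" + ELLIPSIS + " http://instagram.com/p/YGEt5JC6JM/"])

# Literal mode: the pattern is plain text, every U+2026 is removed.
literal = s.str.replace(ELLIPSIS, "", regex=False)

# Regex mode: "+" matches a run of one or more consecutive ellipses.
by_regex = s.str.replace(ELLIPSIS + "+", "", regex=True)

print(literal.iloc[0])
print(by_regex.iloc[0])
```

Both calls give the same result here. The literal form is enough when every ellipsis should go, whatever precedes it.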
Answer 2 (score: 0)
Following Barmar's suggestion:
s = pd.Series(["""Finally a transparant silicon case ^^ Thanks to my uncle :) #yay #Sony #Xperia #S #sonyexperias… http://instagram.com/p/YGEt5JC6JM/"""])
s.str.replace(r'(\w)…',r'\1',regex=True)
Gives:
Finally a transparant silicon case ^^ Thanks to my uncle :) #yay #Sony #Xperia #S #sonyexperias http://instagram.com/p/YGEt5JC6JM/
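If the data may also contain runs of ordinary periods ("..."), a possible variant handles both forms while leaving single dots, such as those in URLs, untouched. This is only a sketch under that assumption: the pattern, the name TRAILING_DOTS and the sample string are hypothetical and not taken from the answers above.

```python
import re

import pandas as pd

# A word character followed by one or more U+2026, or by two or more ASCII dots,
# and then whitespace or end of string. Single dots (URLs, sentence ends) are kept.
TRAILING_DOTS = re.compile(r"(\w)(?:\u2026+|\.{2,})(?=\s|$)")

s = pd.Series(["wait... #sonyexperias\u2026 http://instagram.com/p/YGEt5JC6JM/ end."])

# Keep the captured word character, drop the dots that follow it.
cleaned = s.str.replace(TRAILING_DOTS, r"\1", regex=True)
print(cleaned.iloc[0])
```

The lookahead `(?=\s|$)` is what restricts the match to dots at the end of a word, so `instagram.com` passes through unchanged.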