I asked this question before, but I got the data type wrong!
I have a Pandas DataFrame that looks like this:
print(data)
text
0 FollowFriday for being top engaged members...
1 Hey James! How odd :/ Please call our Contact...
2 we had a listen last night :) As You Bleed is...
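The DataFrame contains links, all of which start with "http". The function below already has a line that removes words starting with "@", along with a few other cleaning steps.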
def cleanData(data):
    # Keep only ASCII characters in each row of the text column
    data['text'] = data['text'].apply(lambda s: "".join(char for char in s if char.isascii()))
    # Remove every digit character
    data['text'] = data['text'].apply(lambda s: "".join(char for char in s if not char.isdigit()))
    # Remove any words which start with @, which are replies
    data['text'] = data['text'].str.replace(r'(@\w+.*?)', "", regex=True)
    # Remove any leftover punctuation (note: this rebinds data to the text Series)
    data = data['text'].str.replace(r'[^\w\s]', '', regex=True)
    # Return the cleaned data
    return data
Can anyone help me remove the words that start with "http"? I have tried editing what I have, but no luck so far.
Thanks in advance!
Answer 0 (score: 1)
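Use Series.str.replace with a pattern that matches "http" and everything after it up to the next whitespace:
data['text'] = data['text'].str.replace(r'http[^\s]*', "", regex=True)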
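To make the one-liner above easier to try out, here is a minimal sketch on a small made-up DataFrame. The sample rows and the example.com URLs are assumptions for illustration only, not the asker's real data. It uses `\S*`, which is equivalent to `[^\s]*`, and passes `regex=True` explicitly because recent pandas versions treat the pattern as a literal string unless told otherwise.

```python
import pandas as pd

# Hypothetical sample data; the URLs are invented for illustration.
data = pd.DataFrame({
    "text": [
        "FollowFriday http://example.com/a for being top engaged members",
        "Hey James! How odd :/ https://example.com/b Please call",
        "no links here",
    ]
})

# "http" followed by any run of non-whitespace characters is removed.
data["text"] = data["text"].str.replace(r"http\S*", "", regex=True)

# Removing a link leaves two adjacent spaces behind, so collapse
# whitespace runs and trim the ends.
data["text"] = data["text"].str.replace(r"\s+", " ", regex=True).str.strip()

cleaned = data["text"].tolist()
```

The whitespace cleanup is optional; it is only there so that the removed links do not leave gaps in the text.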
Answer 1 (score: 0)
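One option is to use Python's str.replace() method on each row:
import pandas as pd
df = pd.DataFrame(dict(text=[
    r'FollowFridayhttphttp http http for being top engaged members.',
    r'James!http How odd http:/ Please call ou',
    r'httpe had a listen last night :) As You Bleed is...',
]))
# Remove every literal occurrence of the substring 'http'
df['text'] = df['text'].apply(lambda x: x.replace('http', ''))
You can do something like this inside your function. Note that this removes only the literal substring "http", so the rest of each link stays in the text.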
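As a rough sketch of how link removal could fit into the asker's cleaning function, the version below strips links before the punctuation step, while each link is still a single whitespace-delimited token. The name `clean_data`, the sample rows, and the example.com URL are assumptions made for illustration; this is one possible arrangement, not the asker's actual code. The last line also shows what the literal `str.replace('http', '')` approach leaves behind.

```python
import pandas as pd


def clean_data(data: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of data with a cleaned 'text' column (hypothetical helper)."""
    out = data.copy()
    text = out["text"]
    # Drop links first, while they are still intact tokens.
    text = text.str.replace(r"http\S*", "", regex=True)
    # Drop @mentions.
    text = text.str.replace(r"@\w+", "", regex=True)
    # Keep only ASCII characters that are not digits.
    text = text.apply(
        lambda s: "".join(ch for ch in s if ch.isascii() and not ch.isdigit())
    )
    # Drop remaining punctuation, then collapse and trim whitespace.
    text = text.str.replace(r"[^\w\s]", "", regex=True)
    text = text.str.replace(r"\s+", " ", regex=True).str.strip()
    out["text"] = text
    return out


# Hypothetical sample rows; the URL is invented for illustration.
sample = pd.DataFrame(
    {"text": ["@james check http://example.com/x1 now!", "plain text 42"]}
)
result = clean_data(sample)["text"].tolist()

# For comparison: a literal replacement removes only the four characters "http".
literal = "see http://example.com".replace("http", "")
```

Working on a copy keeps the caller's DataFrame unchanged, and returning a DataFrame rather than a Series is a design choice made for this sketch.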