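Remove words starting with "http" from a pandas DataFrame

Time: 2021-03-10 17:10:08

Tags: python pandas

I asked this question before, but I got the data type wrong!

I have a pandas DataFrame that looks like this: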

print(data)
      text
0      FollowFriday    for being top engaged members...
1      Hey James! How odd :/ Please call our Contact...
2      we had a listen last night :) As You Bleed is...

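The text in this DataFrame contains links, all of which start with "http". I already have a line of code, inside the function below, that removes words starting with "@", along with some other cleaning steps.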

def cleanData(data):
    # Keep only ASCII characters (str.isascii requires Python 3.7+)
    data['text'] = data['text'].apply(lambda s: "".join(char for char in s if char.isascii()))
    # Remove all digit characters
    data['text'] = data['text'].apply(lambda s: "".join(char for char in s if not char.isdigit()))
    # Remove words that start with @, which are replies
    data['text'] = data['text'].str.replace(r'@\w+', "", regex=True)
    # Remove any leftover punctuation (note: from here on, data is the 'text' Series, not the DataFrame)
    data = data['text'].str.replace(r'[^\w\s]', '', regex=True)
    # Return the cleaned data
    return data

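Can anyone help me remove the words that start with "http"? I have tried editing what I already have, but no luck so far.

Thanks in advance!

2 Answers:

Answer 0 (score: 1)

Use Series.str.replace with a regular expression that matches "http" followed by any run of non-whitespace characters: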

data['text'] = data['text'].str.replace(r'http[^\s]*', "", regex=True)
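
As a rough illustration of that pattern, here is a minimal sketch on made-up sample strings (the sample data and the name `cleaned` are assumptions, not taken from the question). It passes `regex=True` explicitly, because since pandas 2.0 `Series.str.replace` treats the pattern as a literal string by default. The leading `\b` is an optional addition that limits the match to tokens that begin with "http", and the second replace tidies up the gaps left behind.

```python
import pandas as pd

# Made-up sample data for illustration only
data = pd.DataFrame({"text": [
    "check this out http://t.co/abc123 thanks",
    "https://example.com is down",
    "no links here",
]})

# \b      : word boundary, so only tokens that begin with "http" match
# http\S* : the literal "http" plus every following non-whitespace character
cleaned = data["text"].str.replace(r"\bhttp\S*", "", regex=True)

# Collapse the whitespace left where the links used to be
cleaned = cleaned.str.replace(r"\s+", " ", regex=True).str.strip()

print(cleaned.tolist())
# ['check this out thanks', 'is down', 'no links here']
```

Without the `\b`, the pattern would also cut a word from the point where "http" appears in the middle of it, which may or may not be what you want.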

Answer 1 (score: 0)

One option is to use Python's str.replace() method on each row:

import pandas as pd

df = pd.DataFrame(dict(text=[
    r'FollowFridayhttphttp  http http for being top engaged members.',
    r'James!http How odd http:/ Please call ou',
    r'httpe had a listen last night :) As You Bleed is...',
]))

# Delete every occurrence of the literal substring 'http' in each row
df['text'] = df['text'].apply(lambda x: x.replace('http', ''))

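You could do something like this inside your function. Note that x.replace('http', '') deletes only the literal substring "http" and leaves the rest of the word in place (for example, "http:/" becomes ":/"); to drop the whole word, use a regular expression as in the other answer.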
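
One way the link removal might be folded into a cleaning function like the one in the question is sketched below. This is not the asker's code: `clean_text` is a hypothetical name, the sample rows are made up, and the sketch swaps the literal replace for a regex so that whole tokens are removed. It also returns a copy of the DataFrame rather than a Series, which is a design choice of this sketch.

```python
import pandas as pd

def clean_text(data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical variant of cleanData with a link-removal step added."""
    out = data.copy()  # leave the caller's DataFrame untouched
    text = out["text"]
    # Drop whole tokens that start with "http"
    text = text.str.replace(r"\bhttp\S*", "", regex=True)
    # Drop @mentions
    text = text.str.replace(r"@\w+", "", regex=True)
    # Keep only ASCII, non-digit characters (str.isascii needs Python 3.7+)
    text = text.apply(lambda s: "".join(c for c in s if c.isascii() and not c.isdigit()))
    # Remove remaining punctuation, then tidy up whitespace
    text = text.str.replace(r"[^\w\s]", "", regex=True)
    out["text"] = text.str.replace(r"\s+", " ", regex=True).str.strip()
    return out

# Made-up sample rows for illustration only
df = pd.DataFrame({"text": ["@james see http://t.co/x1 now!", "As You Bleed 2 :)"]})
print(clean_text(df)["text"].tolist())
# ['see now', 'As You Bleed']
```

Removing the links first keeps each URL intact as a single token, so one pattern can match it before the digit and punctuation steps alter it.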
