Question

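I have loaded my data into a pandas DataFrame. As the screenshot shows, some rows contain URL links. I want to remove every URL and replace it with "" (nothing, just wipe it out). Row 4 has a URL, for example, and other rows do too. I want to go through every row of the status_message column, find any URLs and remove them. I've been looking at "How to remove any URL within a string in Python", but I'm not sure how to apply it to a DataFrame. So row 4 should end up reading "vote for labour register now".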

Answer 1

You can use str.replace with the case=False parameter:

import pandas as pd

df = pd.DataFrame({'status_message':['a s sd Www.labour.com',
                                     'httP://lab.net dud ff a',
                                     'a ss HTTPS://dd.com ur o']})
print(df)
             status_message
0     a s sd Www.labour.com
1   httP://lab.net dud ff a
2  a ss HTTPS://dd.com ur o

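# regex=True is required on pandas 2.0+, where str.replace defaults to literal matching
df['status_message'] = df['status_message'].str.replace(r'http\S+|www\.\S+', '', case=False, regex=True)
print(df)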
  status_message
0        a s sd 
1       dud ff a
2     a ss  ur o
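
The output above still carries leftover whitespace (a trailing space in row 0, a double space in row 2). A minimal sketch of one way to tidy that up, assuming collapsed whitespace is acceptable for your data; the names `URL_PATTERN` and `strip_urls` are made up for this example:

```python
import pandas as pd

# Hypothetical names for illustration; the pattern is the one used in this answer.
URL_PATTERN = r'http\S+|www\.\S+'

def strip_urls(series: pd.Series) -> pd.Series:
    """Remove URL-like tokens, then tidy the whitespace they leave behind."""
    cleaned = series.str.replace(URL_PATTERN, '', case=False, regex=True)
    # Collapse runs of whitespace into one space and trim both ends.
    return cleaned.str.replace(r'\s+', ' ', regex=True).str.strip()

df = pd.DataFrame({'status_message': ['a s sd Www.labour.com',
                                      'httP://lab.net dud ff a',
                                      'a ss HTTPS://dd.com ur o']})
df['status_message'] = strip_urls(df['status_message'])
print(df['status_message'].tolist())
# ['a s sd', 'dud ff a', 'a ss ur o']
```

Note that this also collapses any intentional double spaces or line breaks in the messages, which may or may not be what you want.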

Answer 2

You can do this with .replace() and a regular expression, i.e.

df = pd.DataFrame({'A':['Nice to meet you www.xy.com amazing','Wow https://www.goal.com','Amazing http://Goooooo.com']})
df['A'] = df['A'].replace(r'http\S+', '', regex=True).replace(r'www\S+', '', regex=True)

Output:

                           A
0  Nice to meet you  amazing
1                       Wow 
2                   Amazing
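
The two chained .replace() calls can also be written as a single pattern using alternation. The sketch below additionally assumes you want case-insensitive matching, which the answer's code does not do; the fourth row is invented for illustration:

```python
import pandas as pd

df = pd.DataFrame({'A': ['Nice to meet you www.xy.com amazing',
                         'Wow https://www.goal.com',
                         'Amazing http://Goooooo.com',
                         'See HTTP://Example.org now']})  # last row is made up

# (?i) makes the whole pattern case-insensitive; "|" matches either
# an http... token or a www... token.
df['A'] = df['A'].replace(r'(?i)http\S+|www\S+', '', regex=True)
print(df['A'].tolist())
```

As in the answer's own output, the spaces around each removed URL stay in place.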

Answer 3

I think you could do something simple like this:

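for index, row in data.iterrows():
    # Lowercase the message and split it into whitespace-separated words
    desc = row['status_message'].lower().split()
    # Print the message without any word that starts like a URL
    print(' '.join(word for word in desc if not word.startswith(('www.', 'http'))))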

This works as long as the URLs start with "www." or "http".
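
The loop above prints lowercased text rather than updating the column. A possible variant of the same word-filtering idea that writes the result back and keeps the original casing might look like this; the function name `drop_url_words` and the sample rows are assumptions for the example:

```python
import pandas as pd

def drop_url_words(text: str) -> str:
    """Keep only the whitespace-separated words that do not look like URLs."""
    words = text.split()
    # Lowercase only for the prefix check, so the kept words retain their casing.
    kept = [w for w in words if not w.lower().startswith(('www.', 'http'))]
    return ' '.join(kept)

# Made-up sample data standing in for the asker's DataFrame.
data = pd.DataFrame({'status_message': ['Vote for Labour Www.labour.com now',
                                        'httP://lab.net dud ff a']})
data['status_message'] = data['status_message'].apply(drop_url_words)
print(data['status_message'].tolist())
# ['Vote for Labour now', 'dud ff a']
```

Because the text is split and re-joined, runs of whitespace come out as single spaces. Like the loop above, it would also drop ordinary words that happen to start with "http".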

Answer 4

df.status_message = df.status_message.str.replace("www.", "", regex=False)
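
Replacing only the literal "www." prefix leaves the rest of the address in the text and does not touch http links. A small comparison on made-up sample data, assuming the regex pattern from Answer 1 as the alternative:

```python
import pandas as pd

# Invented sample messages for illustration.
s = pd.Series(['vote www.labour.com now', 'see http://lab.net'])

# Literal replacement: strips just the "www." prefix.
prefix_only = s.str.replace('www.', '', regex=False)

# Regex replacement: strips the whole URL-like token.
whole_url = s.str.replace(r'http\S+|www\.\S+', '', case=False, regex=True)

print(prefix_only.tolist())  # ['vote labour.com now', 'see http://lab.net']
print(whole_url.tolist())    # ['vote  now', 'see ']
```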

Remove URLs row by row from a large set of text in a Python pandas DataFrame
