I have the following DataFrame:
import pandas as pd
df = pd.DataFrame({'col':['text https://random.website1.com text', 'text https://random.website2.com']})
I want to remove all links from this column.
Any ideas?
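Answer 0 (score: 3)
Use a list comprehension that splits each string on whitespace, tests each token for being a URL, and then joins the remaining tokens with spaces: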
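from urllib.parse import urlparse

# https://stackoverflow.com/a/52455972
def is_url(url):
    # A token counts as a URL only if it has both a scheme and a network location
    try:
        result = urlparse(url)
        return all([result.scheme, result.netloc])
    except ValueError:
        return False

# Split on whitespace, drop URL tokens, and rejoin the rest with single spaces
df['new'] = [' '.join(y for y in x.split() if not is_url(y)) for x in df['col']]
print(df)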
                                     col        new
0  text https://random.website1.com text  text text
1       text https://random.website2.com       text
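The token test above accepts only strings that `urlparse` splits into both a scheme and a network location, so some link-like tokens survive. The sketch below probes that behaviour on a few made-up strings; `strip_urls` and the sample sentences are hypothetical names and data introduced here for illustration, not part of the original answer.

```python
from urllib.parse import urlparse


def is_url(token):
    """Return True when the token parses with both a scheme and a netloc."""
    try:
        result = urlparse(token)
        return all([result.scheme, result.netloc])
    except ValueError:
        return False


def strip_urls(text):
    """Drop every whitespace-separated token that is_url() accepts."""
    return ' '.join(tok for tok in text.split() if not is_url(tok))


# Hypothetical sample strings chosen to probe edge cases.
samples = [
    'see https://example.com now',    # scheme + host: removed
    'see www.example.com now',        # no scheme: kept
    'see (https://example.com) now',  # leading parenthesis hides the scheme: kept
]
cleaned = [strip_urls(s) for s in samples]
print(cleaned)
```

The same helper could be applied to the column with `df['col'].apply(strip_urls)`. If scheme-less links such as `www.example.com` also need to go, a regular expression is likely the simpler route.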
Answer 1 (score: 1)
Use a regular expression.
For example:
import pandas as pd
df = pd.DataFrame({'col':['text https://random.website1.com text', 'text https://random.website2.com']})
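# Ref: https://stackoverflow.com/questions/10475027/extracting-url-link-using-regular-expression-re-string-matching-python
# regex=True is required: since pandas 2.0, str.replace treats the pattern as a literal string by default
df["col_new"] = df["col"].str.replace(r'https?://[^\s<>"]+|www\.[^\s<>"]+', "", regex=True)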
print(df)
                                     col     col_new
0  text https://random.website1.com text  text  text
1       text https://random.website2.com       text 
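Replacing a URL with an empty string leaves the surrounding spaces behind: a double space mid-string and a trailing space at the end. A minimal sketch of one way to tidy that up, assuming whitespace in the column carries no meaning; the variable name `url_pattern` is introduced here for readability only.

```python
import pandas as pd

df = pd.DataFrame({'col': ['text https://random.website1.com text',
                           'text https://random.website2.com']})

# Same pattern as in the answer above.
url_pattern = r'https?://[^\s<>"]+|www\.[^\s<>"]+'

df['col_new'] = (
    df['col']
    .str.replace(url_pattern, '', regex=True)  # remove the URLs
    .str.replace(r'\s+', ' ', regex=True)      # collapse runs of whitespace
    .str.strip()                               # trim leading/trailing spaces
)
print(df['col_new'].tolist())
```

Chaining the three string operations keeps each step visible; the whitespace cleanup could be dropped if the original spacing has to be preserved.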