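I am trying to split URLs and put the pieces into a dataframe. I found the post "pythonic way to parse/split URLs in a pandas dataframe" and tried to apply it, but for some reason it gives me an error.
I am on Python 3.x, so I used the following: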
import pandas
import urllib
urls = ['https://www.google.com/something','https://mail.google.com/anohtersomething', 'https://www.amazon.com/yetanotherthing']
df['protocol'],df['domain'],df['path'],df['query'],df['fragment'] = zip(*df['url'].map(urllib.parse.urlsplit))
I get an error saying KeyError: 'urls', and I am not sure what it means.
It would be great if someone could help. Thanks.
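Answer 0 (score: 1)
The example you used assumes the links are already stored in a dataframe column, whereas yours are in a plain list. This version works directly on the list: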
import urllib.parse
import pandas as pd
df = pd.DataFrame()
urls = ['https://www.google.com/something','https://mail.google.com/anohtersomething', 'https://www.amazon.com/yetanotherthing']
df['protocol'],df['domain'],df['path'],df['query'],df['fragment'] = zip(*[urllib.parse.urlsplit(x) for x in urls])
Result:
  protocol           domain               path query fragment
0    https   www.google.com         /something
1    https  mail.google.com  /anohtersomething
2    https   www.amazon.com   /yetanotherthing
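If you would rather keep the URLs in the dataframe, as the linked post does, a minimal sketch could look like the one below. It assumes the KeyError came from looking up a column that had not been created yet, so it builds the 'url' column first; the column names are the ones from the question, not anything pandas requires.

```python
import urllib.parse

import pandas as pd

urls = [
    'https://www.google.com/something',
    'https://mail.google.com/anohtersomething',
    'https://www.amazon.com/yetanotherthing',
]

# Create the 'url' column up front so that df['url'] exists when it is read.
df = pd.DataFrame({'url': urls})

# urlsplit returns a 5-field named tuple: scheme, netloc, path, query, fragment.
# zip(*...) transposes the per-row tuples into one tuple per output column.
(df['protocol'], df['domain'], df['path'],
 df['query'], df['fragment']) = zip(*df['url'].map(urllib.parse.urlsplit))

print(df)
```

The key passed to df[...] has to match an existing column name exactly, so 'url' and 'urls' are treated as different columns.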