Parse/split URLs in a pandas DataFrame using urllib

Date: 2018-02-22 12:42:33

Tags: python pandas dataframe urllib

I'm trying to split URLs and put the pieces into a DataFrame. I found the post "pythonic way to parse/split URLs in a pandas dataframe" and tried to apply it, but for some reason it gives me an error.

I'm on Python 3.x, so I used the following:

import pandas
import urllib

urls = ['https://www.google.com/something','https://mail.google.com/anohtersomething', 'https://www.amazon.com/yetanotherthing']
df['protocol'],df['domain'],df['path'],df['query'],df['fragment'] = zip(*df['url'].map(urllib.parse.urlsplit))

I get an error saying KeyError: 'urls', and I'm not sure what it means.

If anyone could help, that would be great. Thanks.

1 Answer

Answer (score: 1)

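The example you used assumes the URLs are already stored in a DataFrame column. Yours are in a plain list, so parse the list directly. Here is a working solution: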

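# import the submodule explicitly; a bare `import urllib` does not guarantee urllib.parse is loaded
import urllib.parse
import pandas as pd

df = pd.DataFrame()
urls = ['https://www.google.com/something','https://mail.google.com/anohtersomething', 'https://www.amazon.com/yetanotherthing']
# urlsplit returns (scheme, netloc, path, query, fragment) for each URL;
# zip(*...) transposes those per-URL tuples into one tuple per field, one per new column
df['protocol'],df['domain'],df['path'],df['query'],df['fragment'] = zip(*[urllib.parse.urlsplit(x) for x in urls])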

Result

  protocol           domain               path query fragment
0    https   www.google.com         /something
1    https  mail.google.com  /anohtersomething
2    https   www.amazon.com   /yetanotherthing
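If you would rather follow the linked post as written, the URLs first have to be in a DataFrame column. That is what the KeyError is about: pandas raises KeyError when you look up a column label that does not exist in the DataFrame. Below is a minimal sketch, assuming the column is named 'url' as in the snippet from the question. It also shows a shorter alternative that relies on urlsplit returning a namedtuple, whose field names pandas picks up as column labels (note those labels are scheme/netloc rather than protocol/domain).

```python
import urllib.parse

import pandas as pd

urls = [
    'https://www.google.com/something',
    'https://mail.google.com/anohtersomething',
    'https://www.amazon.com/yetanotherthing',
]

# The linked approach reads from a column named 'url', so that column must exist first;
# looking up a missing column label such as df['url'] raises KeyError.
df = pd.DataFrame({'url': urls})

# Series.map applies urlsplit to every URL, giving a Series of SplitResult tuples;
# zip(*...) transposes them into five per-field tuples, one per new column.
df['protocol'], df['domain'], df['path'], df['query'], df['fragment'] = zip(
    *df['url'].map(urllib.parse.urlsplit)
)

# Alternative: build a DataFrame straight from the namedtuples; pandas uses the
# namedtuple field names (scheme, netloc, path, query, fragment) as column labels.
parts = pd.DataFrame([urllib.parse.urlsplit(u) for u in urls])

print(df)
print(parts)
```

The first form keeps the original 'url' column next to its parts, which is handy when the URLs arrive as part of a larger DataFrame; the second is shorter when you only have a list.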