Create a pandas DataFrame from a URL

Time: 2021-06-30 11:41:12

Tags: pandas

This must be easy, but I can't get this DataFrame into the right shape.

import pandas as pd

df = pd.read_json('https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia.org/all-access/user/Python_(programming_language)/daily/20210101/20210501')

The expected columns are:


project, article, granularity, timestamp, access, agent, user, views

2 answers:

Answer 0: (score: 1)

>>> df = pd.read_json('https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia.org/all-access/user/Python_(programming_language)/daily/20210101/20210501')
>>> pd.concat([df.drop(['items'], axis=1), df['items'].apply(pd.Series)], axis=1)
          project                        article granularity   timestamp      access agent  views
0    en.wikipedia  Python_(programming_language)       daily  2021010100  all-access  user   7238
1    en.wikipedia  Python_(programming_language)       daily  2021010200  all-access  user   8449
2    en.wikipedia  Python_(programming_language)       daily  2021010300  all-access  user   8669
3    en.wikipedia  Python_(programming_language)       daily  2021010400  all-access  user  10688
4    en.wikipedia  Python_(programming_language)       daily  2021010500  all-access  user  11383
..            ...                            ...         ...         ...         ...   ...    ...
116  en.wikipedia  Python_(programming_language)       daily  2021042700  all-access  user   6125
117  en.wikipedia  Python_(programming_language)       daily  2021042800  all-access  user   6184
118  en.wikipedia  Python_(programming_language)       daily  2021042900  all-access  user   5960
119  en.wikipedia  Python_(programming_language)       daily  2021043000  all-access  user   5489
120  en.wikipedia  Python_(programming_language)       daily  2021050100  all-access  user   4297

[121 rows x 7 columns]
>>>
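
To see why this works without calling the API, here is a minimal sketch on a hand-built payload. It assumes the response has the shape the output above implies: a top-level `items` list holding one record per day. The `payload` variable is hypothetical, and its two rows are copied from the output above.

```python
import pandas as pd

# Hypothetical stand-in for the API response (assumed shape: a top-level
# "items" list of records). The two rows are taken from the output above.
payload = {
    "items": [
        {"project": "en.wikipedia", "article": "Python_(programming_language)",
         "granularity": "daily", "timestamp": "2021010100",
         "access": "all-access", "agent": "user", "views": 7238},
        {"project": "en.wikipedia", "article": "Python_(programming_language)",
         "granularity": "daily", "timestamp": "2021010200",
         "access": "all-access", "agent": "user", "views": 8449},
    ]
}

# Building a DataFrame from this dict gives a single "items" column
# with one dict per row, which is the shape the question starts from.
df = pd.DataFrame(payload)

# Turning each dict into a Series makes its keys the column names.
expanded = df["items"].apply(pd.Series)

# Drop the original dict column and attach the expanded columns.
result = pd.concat([df.drop(columns=["items"]), expanded], axis=1)
print(result)
```

Note that `apply(pd.Series)` builds one Series per row, so it can be slow on large frames.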

Answer 1: (score: 1)

You can also use assign:


>>> df = pd.read_json('https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia.org/all-access/user/Python_(programming_language)/daily/20210101/20210501')
>>> df.drop(columns='items').assign(**df['items'].apply(pd.Series))
          project                        article granularity   timestamp      access agent  views
0    en.wikipedia  Python_(programming_language)       daily  2021010100  all-access  user   7238
1    en.wikipedia  Python_(programming_language)       daily  2021010200  all-access  user   8449
2    en.wikipedia  Python_(programming_language)       daily  2021010300  all-access  user   8669
3    en.wikipedia  Python_(programming_language)       daily  2021010400  all-access  user  10688
4    en.wikipedia  Python_(programming_language)       daily  2021010500  all-access  user  11383
..            ...                            ...         ...         ...         ...   ...    ...
116  en.wikipedia  Python_(programming_language)       daily  2021042700  all-access  user   6125
117  en.wikipedia  Python_(programming_language)       daily  2021042800  all-access  user   6184
118  en.wikipedia  Python_(programming_language)       daily  2021042900  all-access  user   5960
119  en.wikipedia  Python_(programming_language)       daily  2021043000  all-access  user   5489
120  en.wikipedia  Python_(programming_language)       daily  2021050100  all-access  user   4297

[121 rows x 7 columns]
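
A different technique from the two answers above is `pd.json_normalize`, which flattens a list of records directly. The sketch below assumes the JSON has already been loaded into a Python dict; the `payload` variable is a hypothetical sample with two rows copied from the output above, not a live API response.

```python
import pandas as pd

# Hypothetical sample mimicking the assumed response shape: {"items": [record, ...]}.
payload = {
    "items": [
        {"project": "en.wikipedia", "article": "Python_(programming_language)",
         "granularity": "daily", "timestamp": "2021010100",
         "access": "all-access", "agent": "user", "views": 7238},
        {"project": "en.wikipedia", "article": "Python_(programming_language)",
         "granularity": "daily", "timestamp": "2021010200",
         "access": "all-access", "agent": "user", "views": 8449},
    ]
}

# json_normalize turns a list of flat dicts into one row per dict,
# with the dict keys as columns.
flat = pd.json_normalize(payload["items"])
print(flat)
```

This skips the intermediate DataFrame with a dict-valued `items` column, at the cost of loading the JSON yourself instead of using `pd.read_json`.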