I have this link from Forbes: http://www.forbes.com/global2000/list/. I need to get the table of Global 2000 companies into a DataFrame for analysis. How can I do this?
Answer (score: 3)
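You can use pd.read_json directly, because the underlying table is generated from a JSON response.
Tip: open the Network tab of your browser's developer tools and look at the XHR requests to find the request URL.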
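To show what pd.read_json does with that kind of response, here is a minimal offline sketch. It assumes the endpoint returns a top-level JSON array with one object per company, which is consistent with the output below. The payload is two records copied from that df.head(2) output and trimmed to five fields; the real response carries 16 columns.

```python
from io import StringIO

import pandas as pd

# Two records copied from the df.head(2) output, trimmed to a few fields.
# Illustrative only: the live endpoint returns more fields per company.
payload = """[
  {"name": "3M", "country": "United States", "industry": "Conglomerates",
   "marketValue": 102175.0, "rank": 200},
  {"name": "3i Group", "country": "United Kingdom",
   "industry": "Investment Services", "marketValue": 6685.0, "rank": 1562}
]"""

# A JSON array of objects becomes one row per object and one column per key.
# Wrapping the literal string in StringIO avoids the FutureWarning that
# pandas 2.1+ emits when a raw JSON string is passed to read_json.
df = pd.read_json(StringIO(payload))

print(df.shape)  # (2, 5)
print(df[["name", "rank"]])
```

With the live data you would pass the request URL instead of the StringIO object, exactly as in the session below.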
In [38]: df = pd.read_json('http://www.forbes.com/ajax/list/data?year=2016&uri=global2000&type=organization')
In [40]: df.shape
Out[40]: (2001, 16)
In [41]: df.head(2)
Out[41]:
    assets            ceo         country    headquarters  imageUri  \
0  32718.0    Inge Thulin   United States       Minnesota        3m
1   7454.0  Simon Borrows  United Kingdom  United Kingdom  3i-group

              industry  marketValue      name  position  profits  rank  \
0        Conglomerates     102175.0        3M       200   4833.0   200
1  Investment Services       6685.0  3i Group      1562    925.0  1562

   revenue  squareImage      state  thumbnail       uri
0  30274.0          NaN  Minnesota        NaN        3m
1    485.0          NaN        NaN        NaN  3i-group
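Once the table is in df, ordinary pandas operations cover most analysis. The sketch below shows a few typical steps: ranking by marketValue, summing per country, and filtering rows. These particular analyses are illustrative choices, not part of the answer above. To keep it runnable offline, it rebuilds a small frame from the two rows printed above; with the real data, drop that construction and reuse the loaded df. The response does not state units for the numeric columns, so none are assumed.

```python
import pandas as pd

# Stand-in for the full frame: the two rows shown by df.head(2), limited to
# a few columns. With the real data, skip this and reuse the loaded df.
df = pd.DataFrame({
    "name": ["3M", "3i Group"],
    "country": ["United States", "United Kingdom"],
    "industry": ["Conglomerates", "Investment Services"],
    "profits": [4833.0, 925.0],
    "marketValue": [102175.0, 6685.0],
})

# Companies ordered by market value, largest first.
top = df.sort_values("marketValue", ascending=False)[["name", "marketValue"]]

# Total market value per country, largest total first.
by_country = (
    df.groupby("country")["marketValue"].sum().sort_values(ascending=False)
)

# Boolean filtering: keep only rows whose country is the United States.
us = df[df["country"] == "United States"]

print(top)
print(by_country)
print(us[["name", "industry"]])
```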