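I'm new to Python. I'm writing a script to pull some data from a website and plot it. However, my code fails with an error saying the data type is incorrect. Specifically, I have decimal values for 'value' and dates for 'year'. I tried to redefine them, but I think I put the definition in the wrong place. Any help would be greatly appreciated; the code is below.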
import numpy as np
import pandas as pd
import json
import matplotlib.pyplot as mp
from IPython.display import HTML
import getpass
import requests
def frame(url, height=400, width=100):
    # Build an inline frame so the page can be viewed inside the notebook
    display_string = '<iframe src={url} width={w} height={h}></iframe>'.format(
        url=url, w=width, h=height)
    return HTML(display_string)
frame('https://data.bls.gov/registrationEngine/')
registration_key = getpass.getpass('Enter Registration Key: ')
series = 'MPU4900012'
frame('https://api.bls.gov/publicAPI/v1/timeseries/data/')
def capture_series(series, start, end, key=registration_key):
    url = 'https://api.bls.gov/publicAPI/v2/timeseries/data/'
    url += '?registrationkey={key}'.format(key=key)
    data = json.dumps({
        "seriesid": [series],
        "startyear": str(start),
        "endyear": str(end)
    })
    headers = {
        "Content-type": "application/json"
    }
    result = requests.post(url, data=data, headers=headers)
    return json.loads(result.text)
json_data = capture_series(series, 1987, 2016)
json_data
df_data = pd.DataFrame(json_data['Results']['series'][0]['data'])
print(df_data)
df_sub = df_data[['value', 'year']].astype(float).astype(int)
df_sub.set_index('year', inplace=True)
df_sub.sort_index(inplace=True)
df_sub
x = df_sub.index
y = df_sub['value']
mp.plot(x,y)
mp.title('Major Sector Multifactor Productivity')
mp.xlabel('years')
mp.ylabel('values')
mp.show()
When I run the code, I first get this table, which is the data from the site.
  footnotes period periodName   value  year
0      [{}]    A01     Annual  86.244  1996
1      [{}]    A01     Annual  84.713  1995
2      [{}]    A01     Annual  85.141  1994
3      [{}]    A01     Annual  84.688  1993
4      [{}]    A01     Annual  85.037  1992
5      [{}]    A01     Annual  82.280  1991
6      [{}]    A01     Annual  82.625  1990
7      [{}]    A01     Annual  81.965  1989
8      [{}]    A01     Annual  81.587  1988
9      [{}]    A01     Annual  80.816  1987
The error log shows this (using Jupyter with Python 3, for reference):
ValueError Traceback (most recent call last)
<ipython-input-101-8ee6d83ca777> in <module>()
41 print(df_data)
42
---> 43 df_sub = df_data[['value', 'year']].astype(int)
44 df_sub.set_index('year', inplace=True)
45 df_sub.sort_index(inplace=True)
...
ValueError: invalid literal for int() with base 10: '86.244'
Answer 0 (score: 2)
OK, I played around with your example.
I think the 'value' column is of type str. This means you need to use .astype(float) first, as shown below:

>>> import pandas as pd
>>> data = {'value': {0: '84.713', 1: '85.141', 2: '84.688', 3: '85.037',
...                   4: '82.280', 5: '82.625', 6: '81.965', 7: '81.587', 8: '80.816'},
...         'year': {0: '1995', 1: '1994', 2: '1993', 3: '1992', 4: '1991',
...                  5: '1990', 6: '1989', 7: '1988', 8: '1987'}}
>>> df = pd.DataFrame(data)
>>> df
value year
0 84.713 1995
1 85.141 1994
2 84.688 1993
3 85.037 1992
4 82.280 1991
5 82.625 1990
6 81.965 1989
7 81.587 1988
8 80.816 1987
>>> df['value'].astype(int)  # <- replicating the error
Traceback (most recent call last):
ValueError: invalid literal for int() with base 10: '84.713'
>>> df['value'].astype(float).astype(int) # <= HERE
0 84
1 85
2 84
3 85
4 82
5 82
6 81
7 81
8 80
Name: value, dtype: int32
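
As a side note on why the first call fails: the message in the traceback is the one Python's built-in int() produces when it is given a string that is not an integer literal. The sketch below reproduces that with plain Python, without pandas. The helper name to_int is hypothetical and only there for illustration.

```python
def to_int(text):
    """Convert a numeric string such as '86.244' to an int, truncating toward zero."""
    # int() only parses integer literals, so go through float() first
    return int(float(text))


try:
    int('86.244')  # a string with a decimal point is rejected
except ValueError as exc:
    print('int() failed:', exc)

print(to_int('86.244'))  # float() parses the string, int() then drops the fraction
print(to_int('1996'))    # plain integer strings still work
```

Chaining .astype(float).astype(int) on a column does the same two steps for every element.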

So use:

df[['value', 'year']].astype(float).astype(int)
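
One thing to keep in mind: as the session above shows, casting to int truncates the decimals (84.713 becomes 84). If the plot should keep the decimal values, a possible variant is to convert each column separately, 'value' to float and 'year' to int. The sketch below is only an illustration under that assumption; the small sample frame is a stand-in built from the first rows of the data above, not the real API response.

```python
import pandas as pd

# Stand-in sample mirroring the string columns shown in the question
df = pd.DataFrame({
    'value': ['84.713', '85.141', '84.688'],
    'year': ['1995', '1994', '1993'],
})

# Give each column the type that suits it: decimals stay floats, years become ints
df_sub = df.astype({'value': float, 'year': int})

# Index by year and sort so the line plot runs left to right in time
df_sub = df_sub.set_index('year').sort_index()

print(df_sub)
```

With df_sub built this way, the plotting code from the question (x = df_sub.index, y = df_sub['value']) should work unchanged.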