Question

I am learning to pull data from the Bureau of Labor Statistics (BLS) using its API. Sample code:

import requests
import json
import prettytable
headers = {'Content-type': 'application/json'}
data = json.dumps({"seriesid": ['LAUMT421090000000005'],"startyear":"2011", "endyear":"2014"})
p = requests.post('https://api.bls.gov/publicAPI/v2/timeseries/data/', data=data, headers=headers)
json_data = json.loads(p.text)
for series in json_data['Results']['series']:
    x=prettytable.PrettyTable(["series id","year","period","value","footnotes"])
    seriesId = series['seriesID']
    for item in series['data']:
        year = item['year']
        period = item['period']
        value = item['value']
        footnotes=""
        for footnote in item['footnotes']:
            if footnote:
                footnotes = footnotes + footnote['text'] + ','
        if 'M01' <= period <= 'M12':
            x.add_row([seriesId,year,period,value,footnotes[0:-1]])
    output = open(seriesId + '.txt','w')
    output.write (x.get_string())
    output.close()

I only changed the series ID to get the data I need. The code writes its output to a text file named after the series ID.

The extracted data looks like this (part of the actual text of the output):

| LAUMT421090000000005 | 2014 | M12 | 405757 | Data will be revised on April 20, 2018. |
| LAUMT421090000000005 | 2014 | M11 | 406061 | Data will be revised on April 20, 2018. |
| LAUMT421090000000005 | 2014 | M10 | 405358 | Data will be revised on April 20, 2018. |
| LAUMT421090000000005 | 2014 | M09 | 402164 | Data will be revised on April 20, 2018. |
| LAUMT421090000000005 | 2014 | M08 | 400534 | Data will be revised on April 20, 2018. |

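As a newcomer to Python and APIs, changing parts of the sample code to get the data I need is about as far as I can go right now. For work reasons I need to present the data in Excel (I know people frown on that, but that is the reality). However, Excel does not recognize " |" as a delimiter.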

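The sample code uses the prettytable library to produce its output. Is there another way to extract the data so that the result is easier to work with, or easier to convert to comma-separated values?

Thanks.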
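If the prettytable .txt files have already been written, one stopgap is to strip the table borders and re-save the cells as CSV, which Excel opens directly. The sketch below assumes prettytable's default border style (`+---+` rule lines and `|`-separated cells) and that no cell value contains a `|`; `prettytable_text_to_rows` is a hypothetical helper, and the sample table is an illustrative excerpt, not verbatim prettytable output.

```python
import csv
import io


def prettytable_text_to_rows(text):
    """Turn a prettytable-style ASCII table into a list of rows.

    Assumes the default border style: '+---+' rule lines and '|'-separated
    data lines, and that no cell value itself contains '|'.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith('|'):
            continue  # skip '+----+' rule lines and blank lines
        # '| a | b |' -> ['', ' a ', ' b ', ''] -> ['a', 'b']
        cells = [cell.strip() for cell in line.split('|')[1:-1]]
        rows.append(cells)
    return rows


# Illustrative excerpt in the assumed layout (header row plus one data row).
sample = """\
+----------------------+------+--------+--------+-----------+
|      series id       | year | period | value  | footnotes |
+----------------------+------+--------+--------+-----------+
| LAUMT421090000000005 | 2014 |  M12   | 405757 |           |
+----------------------+------+--------+--------+-----------+
"""

rows = prettytable_text_to_rows(sample)
buffer = io.StringIO()  # stands in for a real .csv file
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

For a real file, read it with `open(seriesId + '.txt').read()` and write to `open(seriesId + '.csv', 'w', newline='')` instead of the `StringIO` buffer.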

Answer 1

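In the code below, I build a DataFrame and then write it to a .csv file with pandas.DataFrame.to_csv() (see the pandas documentation). I also wrap the work in a try/except so that, if the user enters an invalid Series ID (not the case in this example) or exceeds the BLS API's daily query limit (unregistered users may make at most 25 queries per day, per the BLS API FAQ), the script prints the API's status and message instead of crashing.

I also add a column holding the seriesID. This becomes more useful when your seriesid list contains several items, because it distinguishes the different series. Finally, I process the footnotes column a little so that it is more meaningful in the output DataFrame: only the footnote code is kept.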

import pandas as pd
import json
import requests

headers = {'Content-type': 'application/json'}
data = json.dumps({"seriesid": ['LAUMT421090000000005'], "startyear": "2011", "endyear": "2014"})
p = requests.post('https://api.bls.gov/publicAPI/v1/timeseries/data/', data=data, headers=headers)
json_data = json.loads(p.text)
try:
    frames = []  # one small DataFrame per observation, concatenated once at the end
    for series in json_data['Results']['series']:
        df_initial = pd.DataFrame(series)
        series_col = df_initial['seriesID'][0]
        for i in range(len(df_initial)):  # every observation, including the last one
            df_row = pd.DataFrame(df_initial['data'][i])
            df_row['seriesID'] = series_col
            # Keep only the footnote code letter, or '' when there is no footnote.
            if 'code' not in str(df_row['footnotes']):
                df_row['footnotes'] = ''
            else:
                df_row['footnotes'] = str(df_row['footnotes']).split("'code': '", 1)[1][:1]
            frames.append(df_row)
    # DataFrame.append was removed in pandas 2.0; pd.concat replaces it.
    df = pd.concat(frames, ignore_index=True)
    df.to_csv('blsdata.csv', index=False)
except (KeyError, IndexError, ValueError):
    # Reached when the response carries no usable data, e.g. status 'REQUEST_NOT_PROCESSED'.
    print('BLS API has given the following Response:', json_data['status'])
    print('Reason:', json_data['message'])
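Rather than relying on the try/except above to notice a bad response, the response's `status` can be checked explicitly before touching `Results`. This is a minimal sketch: `check_bls_response` is a hypothetical helper, and both the success string `'REQUEST_SUCCEEDED'` and the idea that an unknown series ID may come back with an empty `data` list are assumptions about the BLS response format, not something the answer's code shows.

```python
def check_bls_response(json_data):
    """Return the list of series, or raise RuntimeError if the response is unusable.

    Assumptions: 'REQUEST_SUCCEEDED' is the success status, problems are
    described in the 'message' list, and an unknown series ID may come back
    with an empty 'data' list rather than a failed status.
    """
    status = json_data.get('status')
    messages = json_data.get('message', [])
    if status != 'REQUEST_SUCCEEDED':
        # e.g. 'REQUEST_NOT_PROCESSED', which the answer expects when the request is refused
        raise RuntimeError(f'BLS API status {status}: {messages}')
    series_list = json_data.get('Results', {}).get('series', [])
    missing = [s.get('seriesID') for s in series_list if not s.get('data')]
    if not series_list or missing:
        raise RuntimeError(f'No data for series {missing}: {messages}')
    return series_list


# Made-up responses shaped like the ones the answer's code handles.
ok = {'status': 'REQUEST_SUCCEEDED', 'message': [],
      'Results': {'series': [{'seriesID': 'LAUMT421090000000005',
                              'data': [{'year': '2014', 'period': 'M12',
                                        'value': '405757', 'footnotes': [{}]}]}]}}
not_processed = {'status': 'REQUEST_NOT_PROCESSED', 'message': ['example message']}

print(len(check_bls_response(ok)))  # 1
try:
    check_bls_response(not_processed)
except RuntimeError as err:
    print('Caught:', err)
```

In the answer's code, `for series in check_bls_response(json_data):` could then take the place of `for series in json_data['Results']['series']:`, so that a bare `except` is no longer needed to detect API errors.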

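If you want an .xlsx file instead of a .csv, replace df.to_csv('blsdata.csv', index=False) with df.to_excel('blsdata.xlsx', index=False, engine='xlsxwriter'). That engine requires the XlsxWriter package to be installed.

Expected output .csv file:

Expected output .xlsx file if pandas.DataFrame.to_excel() is used instead of pandas.DataFrame.to_csv():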

Answer 2

import requests
import json
import csv
headers = {'Content-type': 'application/json'}
data = json.dumps({"seriesid": ['LAUMT421090000000005'],"startyear":"2011", "endyear":"2014"})
p = requests.post('https://api.bls.gov/publicAPI/v2/timeseries/data/', data=data, headers=headers)
json_data = json.loads(p.text)

After the lines above, follow the steps described in this Stack Overflow question: How can I convert JSON to CSV?

Hope this helps!
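A minimal sketch of that JSON-to-CSV step using only the standard library's `csv` module, assuming the response layout the question's code already relies on (`Results` → `series` → `data`, each item with `year`, `period`, `value` and `footnotes`). `bls_json_to_csv` is a hypothetical helper name, and the sample response is made up for illustration.

```python
import csv
import io


def bls_json_to_csv(json_data, out):
    """Write one CSV row per monthly observation in a BLS API response.

    `out` is any writable text file object; open real files with newline=''.
    """
    writer = csv.writer(out)
    writer.writerow(['series id', 'year', 'period', 'value', 'footnotes'])
    for series in json_data['Results']['series']:
        for item in series['data']:
            # Same monthly filter as the question's code.
            if not 'M01' <= item['period'] <= 'M12':
                continue
            # Join footnote texts; entries without 'text' (e.g. {}) are skipped.
            notes = ','.join(f['text'] for f in item['footnotes'] if f.get('text'))
            # csv.writer quotes any field containing a comma, so columns stay intact.
            writer.writerow([series['seriesID'], item['year'], item['period'],
                             item['value'], notes])


# Made-up response in the shape used by the question's code.
sample = {'Results': {'series': [{'seriesID': 'LAUMT421090000000005', 'data': [
    {'year': '2014', 'period': 'M12', 'value': '405757',
     'footnotes': [{'text': 'Data will be revised on April 20, 2018.'}]},
    {'year': '2014', 'period': 'M11', 'value': '406061', 'footnotes': [{}]},
]}]}}

buffer = io.StringIO()  # stands in for a real .csv file
bls_json_to_csv(sample, buffer)
print(buffer.getvalue())
```

With the real response, `with open('blsdata.csv', 'w', newline='') as f: bls_json_to_csv(json_data, f)` writes a file that Excel can open directly, since commas inside footnote text are quoted rather than treated as column breaks.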

Create a comma-separated data table from a specific API (BLS) without using prettytable, as in the sample code?
