I wrote some code to extract information from a website. The output is JSON, and I want to export it as CSV, so I tried converting it to a pandas DataFrame and then exporting that to CSV with pandas. I can print the result, but I still can't convert it to a pandas DataFrame. Do you know what's wrong with my code?
# -*- coding: utf-8 -*-
# To create http request/session
import requests
import re, urllib
import pandas as pd
from BeautifulSoup import BeautifulSoup
url = "https://www.indeed.com/jobs?q=construction%20manager&l=Houston&start=10"
# create session
s = requests.session()
html = s.get(url).text
# extract job IDs
job_ids = ','.join(re.findall(r"jobKeysWithInfo\['(.+?)'\]", html))
ajax_url = 'https://www.indeed.com/rpc/jobdescs?jks=' + urllib.quote(job_ids)
# do Ajax request and convert the response to json
ajax_content = s.get(ajax_url).json()
print(ajax_content)
#Convert to pandas dataframe
df = pd.read_json(ajax_content)
#Export to CSV
df.to_csv("c:\\users\\Name\\desktop\\newcsv.csv")
The error message is:

Traceback (most recent call last):
  File "C:\Users\Mehrdad\Desktop\Indeed 06.py", line 21, in <module>
    df = pd.read_json(ajax_content)
  File "c:\python27\lib\site-packages\pandas\io\json\json.py", line 408, in read_json
    path_or_buf, encoding=encoding, compression=compression,
  File "c:\python27\lib\site-packages\pandas\io\common.py", line 218, in get_filepath_or_buffer
    raise ValueError(msg.format(_type=type(filepath_or_buffer)))
ValueError: Invalid file path or buffer object type:
Answer (score: 1)
The problem is that read_json() has nothing it can parse when you call it, because ajax_content is already a parsed, nested JSON dictionary rather than a JSON string or file:
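To see the failure in isolation: read_json() wants a file path, buffer, or JSON string, not an already-parsed dict. Here is a minimal sketch; the job IDs and snippets are invented, and serializing the dict back to text is just one possible workaround:

```python
import io
import json
import pandas as pd

# Stand-in for the Ajax response: job ID -> HTML description.
# (IDs and snippets are invented for illustration.)
ajax_content = {
    "0079ccae458b4dcf": "<p><b>Company Environment</b></p>",
    "0c1ab61fe31a5c62": "<p><b>Commercial Construction PM</b></p>",
}

# read_json() expects a path, buffer, or JSON string; a parsed dict fails:
try:
    pd.read_json(ajax_content)
except (ValueError, TypeError) as err:
    print("read_json rejected the dict:", err)

# Serializing the dict back to JSON text first makes it parseable:
series = pd.read_json(io.StringIO(json.dumps(ajax_content)), typ="series")
print(sorted(series.index))
```

This keeps the job IDs as the Series index, with the HTML descriptions as values.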
import requests
import re, urllib
import pandas as pd
from pandas.io.json import json_normalize
url = "https://www.indeed.com/jobs?q=construction%20manager&l=Houston&start=10"
s = requests.session()
html = s.get(url).text
job_ids = ','.join(re.findall(r"jobKeysWithInfo\['(.+?)'\]", html))
ajax_url = 'https://www.indeed.com/rpc/jobdescs?jks=' + urllib.quote(job_ids)
ajax_content = s.get(ajax_url).json()
df = json_normalize(ajax_content).transpose()
df.to_csv('your_output_file.csv')
Note that I call json_normalize() to collapse the nested columns in the JSON. I also call transpose() so that the rows are labelled with the job IDs rather than the columns. That gives you a DataFrame that looks like this:
0079ccae458b4dcf <p><b>Company Environment: </b></p><p>Planet F...
0c1ab61fe31a5c62 <p><b>Commercial Construction Project Manager<...
0feac44386ddcf99 <div><div>Trendmaker Homes is currently seekin...
...
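The shape change from those two calls can be sketched on a tiny made-up payload. Note the import location is an assumption about your pandas version: in pandas 1.0 and later, json_normalize lives at pd.json_normalize, while the answer above imports it from pandas.io.json as older versions required:

```python
import pandas as pd

# Tiny stand-in for the Ajax payload (IDs and text are invented).
ajax_content = {
    "0079ccae458b4dcf": "<p><b>Company Environment</b></p>",
    "0c1ab61fe31a5c62": "<p><b>Commercial Construction PM</b></p>",
}

# json_normalize() flattens the dict into a single row:
# one column per job ID.
wide = pd.json_normalize(ajax_content)
print(wide.shape)   # (1, 2)

# transpose() flips it so each job ID becomes a row label.
tall = wide.transpose()
print(tall.shape)   # (2, 1)
print(tall.index.tolist())
```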
That said, it isn't clear what your expected output is: what do you want the DataFrame/CSV file to look like? If you actually just want a single row/Series with the job IDs as the column labels, simply remove the call to transpose()
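If that single-row layout is what you want, the tail of the script would look roughly like this (again with an invented payload; calling to_csv() without a path returns the CSV as a string, which makes the result easy to inspect):

```python
import pandas as pd

# Invented payload standing in for the real Ajax response.
ajax_content = {
    "0079ccae458b4dcf": "<p>Company Environment</p>",
    "0c1ab61fe31a5c62": "<p>Commercial Construction PM</p>",
}

# Without transpose(): one row, job IDs as the column labels.
df = pd.json_normalize(ajax_content)

# to_csv() returns the CSV text when no path is given;
# pass a filename instead to write it to disk.
csv_text = df.to_csv(index=False)
print(csv_text.splitlines()[0])  # header row listing the job IDs
```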