我制作了一个python抓取工具,从特定网站获取json。
我尝试构建文件,以便我可以提取数据以便保存到数据库中。
提取脚本的功能:
Public Sub getDetailsByAppName()
Dim objProcessSet As WbemScripting.SWbemObjectSet
Dim objProcess As WbemScripting.SWbemObject
Dim objServices As WbemScripting.SWbemServices
Dim objLocator As WbemScripting.SWbemLocator
'set up wmi for local computer querying
Set objLocator = New WbemScripting.SWbemLocator
Set objServices = objLocator.ConnectServer(".") 'local
'Get all the gory details for a name of a running application
Set objProcessSet = objServices.ExecQuery("SELECT * FROM Win32_Process WHERE name = ""firefox.exe""", , 48)
RecordCount = 1
'Loop through each process returned
For Each objProcess In objProcessSet
'Loop through each property/field
For Each Field In objProcess.Properties_
Debug.Print RecordCount, Field.Name, Field.Value
Next
RecordCount = RecordCount + 1
Next
Set objProcessSet = Nothing
Set objServices = Nothing
Set objLocator = Nothing
End Sub
功能后的结果:
s = page_ad.findAll('script')[27].text.replace('\'', '"')
s = re.search(r'\{.+\}', s, re.DOTALL).group() # get json data
s = re.sub(r'//.+\n', '', s) # replace comment
s = re.sub(r'\s+', '', s) # strip whitspace
s = re.sub(r',}', '}', s) # get rid of last , in the dict
但是当我尝试转换为json时,它会向我显示此错误。
Json转换:
{varsource="".toLowerCase();if(mobileSources.indexOf(source)!=-1){returntrue;}returnfalse;}functiongetSource(){varmsiteSources=["mobile","msite"];varuserAgent=navigator.userAgent.toLowerCase();varsource="".toLowerCase();if(mobileSources.indexOf(source)!=-1){if(msiteSources.indexOf(source)!=-1){source="msite";varresultMatch=userAgent.match(/\olx-source\/(\w+);/);if(resultMatch){source=resultMatch[1];}}}else{source="web";}returnsource;}dataLayer=function(){varinitialDatalayer={"config":{"lurkerURL":"},"site":{"isMobile":isMobile(),"source":getSource()},"page":{"pageType":"ad_detail","detail":{"parent_category_id":"2000","category_id":"2020","state_id":"2","region_id":"31","ad_id":"382568903","list_id":"314710679","city_id":"9238","zipcode":"32606174","price":"19900"},"adDetail":{"adID":"382568903","listID":"314710679","sellerName":"MichelleAlcântara","adDate":"2017-03-1113:10:55","mainCategory":"Veículosebarcos","mainCategoryID":"2000","subCategory":"Carros","subCategoryID":"2020","state":"MG","ddd":"31","region":"BeloHorizonteeregião","price":"19900"}},"session":{"user":{"userID":null,"loginType":null}},"pageType":"Ad_detail","abtestingEnable":"1","listingCategory":"2020","adId":"382568903","state":"2","region":"31","category":"2020","pictures":"5","listId":"314710679","loggedUser":"0","referrer":""};if(self.adParams){for(keyinadParams){varpage=initialDatalayer.page;page.detail[key]=adParams[key];if(page.adDetail){page.adDetail[key]=adParams[key];}}}return[initialDatalayer];}
消息错误:
dataLayer = json.loads(s)
答案 0 :(得分:0)
JSON是一个序列化的数据结构,而不仅仅是简单的javascript代码。
这是一个有效的JSON:
{"key" : value, "key2" : "value2_string"}
可以“转换”为python dict。
您尝试loads
的字符串只是一个javascript代码。
您可以在此处获取有关JSON的更多信息:http://json.org/