I'm using Python 3 and MongoDB 2.6 and trying to insert some data into my collection. Here is the sample code:
from urllib.parse import urlparse
from bs4 import BeautifulSoup
import requests
from pymongo import MongoClient
urlList = ['http://....']  # bunch of URLs
jsArray = []
cssArray = []
client = MongoClient('127.0.0.1', 28017)
db = client.tagFinderProject  # Getting the DB
collection = db.tegFinder  # Getting the Collection
for url in urlList:
    parsed_uri = urlparse(url)
    domain = '{uri.scheme}://{uri.netloc}/'.format(uri=parsed_uri)
    r = requests.get(url)
    data = r.text
    soup = BeautifulSoup(data)
    for lines in soup.find_all('script'):
        if lines.get('src') is not None and '.js' in lines.get('src') and 'http' in lines.get('src'):
            jsArray.append(lines.get('src'))
        elif str(lines.get('src')).startswith('//'):
            jsArray.append('http:' + lines.get('src'))
        elif lines.get('src') is not None and '.js' in lines.get('src') and 'http' not in lines.get('src'):
            jsArray.append(domain + lines.get('src'))
    for lines in soup.find_all('link'):
        if lines.get('href') is not None and (lines.get('href')).endswith('.css') and 'http' in lines.get('href'):
            cssArray.append(lines.get('href'))
        elif lines.get('href') is not None and (lines.get('href')).endswith('.css') and 'http' not in lines.get('href'):
            cssArray.append(domain + lines.get('href'))
uniqueJS = list(set(jsArray))
uniqueCSS = list(set(cssArray))
for js in uniqueJS:
    collection.insert('JS: ', js)
for css in uniqueCSS:
    collection.insert('CSS: ', css)
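As a side note unrelated to the MongoDB error: the if/elif chains above resolve absolute, protocol-relative and relative src/href values by hand, and domain + '/static/x.js' ends up with a double slash. The standard library's urllib.parse.urljoin covers all of those cases. A minimal sketch, assuming the raw attribute values are collected first; resolve_asset_urls is a hypothetical helper, and it checks the suffix of the URL path instead of the '.js' in src test used above:

```python
from urllib.parse import urljoin, urlparse


def resolve_asset_urls(page_url, refs, suffix):
    """Hypothetical helper: turn raw src/href values into unique absolute URLs.

    page_url -- the URL the HTML was downloaded from
    refs     -- raw attribute values, e.g. [tag.get('src') for tag in soup.find_all('script')]
    suffix   -- '.js' or '.css'
    """
    resolved = set()
    for ref in refs:
        if ref is None:  # <script> without src, <link> without href
            continue
        # urljoin handles absolute ('http://...'), protocol-relative ('//cdn...'),
        # root-relative ('/static/...') and path-relative ('app.js') references.
        absolute = urljoin(page_url, ref)
        # Check only the path, so a query string like '?v=2' does not hide the suffix.
        if urlparse(absolute).path.endswith(suffix):
            resolved.add(absolute)  # the set removes duplicates
    return sorted(resolved)


page = 'http://example.com/blog/post.html'
raw = ['//cdn.example.com/lib.js', '/static/app.js', 'app.js?v=2',
       'https://other.org/x.js', None, '/static/app.js', 'style.css']
js_urls = resolve_asset_urls(page, raw, '.js')
css_urls = resolve_asset_urls(page, raw, '.css')
```

One behavior difference: urljoin resolves a path-relative value such as app.js against the page's directory (as a browser does), while domain + src always resolves it against the site root, and a protocol-relative //cdn... URL gets the page's scheme rather than a hard-coded http:.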
Before running it, I start my MongoDB server, which reports:
2015-05-13T11:25:03.942-0500 [initandlisten] options: { net: { http: { RESTInterfaceEnabled: true, enabled: true } }, storage: { dbPath: "D:\Projects\mongoDB" } }
2015-05-13T11:25:03.944-0500 [initandlisten] journal dir=D:\Projects\mongoDB\journal
2015-05-13T11:25:03.944-0500 [initandlisten] recover : no journal files present, no recovery needed
2015-05-13T11:25:04.045-0500 [initandlisten] waiting for connections on port 27017
2015-05-13T11:25:04.045-0500 [websvr] admin web console waiting for connections on port 28017
Then I run the Python code above and get:
  File ".../TagFinder/tagFinder.py", line 91, in <module>
    collection.insert('JS: ', js)
  File "C:\Python34\lib\site-packages\pymongo\collection.py", line 1924, in insert
    with self._socket_for_writes() as sock_info:
  File "C:\Python34\lib\contextlib.py", line 59, in __enter__
    return next(self.gen)
  File "C:\Python34\lib\site-packages\pymongo\mongo_client.py", line 663, in _get_socket
    server = self._get_topology().select_server(selector)
  File "C:\Python34\lib\site-packages\pymongo\topology.py", line 121, in select_server
    address))
  File "C:\Python34\lib\site-packages\pymongo\topology.py", line 97, in select_servers
    self._error_message(selector))
pymongo.errors.ServerSelectionTimeoutError: connection closed
I can't figure out why this happens. I can insert data from the command prompt, and I can see it at 127.0.0.1/tagFinderProject/tagFinder/
Can anyone point me in the right direction?
Edit 1:
If I change client = MongoClient('127.0.0.1', 28017)
to client = MongoClient('mongodb://127.0.0.1:27017/')
I get:
TypeError: 'str' object does not support item assignment
pointing at the line: collection.insert('JS: ', js)
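The TypeError is consistent with insert treating its first positional argument as the document: 'JS: ' becomes the thing to insert (in PyMongo 3.x the second positional parameter of Collection.insert is, to my knowledge, manipulate, so js never becomes part of any document), and before sending it the driver tries to add an _id key, which a str cannot hold. The sketch below does not use PyMongo; add_id_like_a_driver is a hypothetical stand-in that only mimics that one step (PyMongo generates a bson ObjectId, not a uuid):

```python
import uuid


def add_id_like_a_driver(doc):
    """Hypothetical stand-in for a driver's pre-insert step:
    give the document an '_id' if it has none. Not PyMongo's real code."""
    if '_id' not in doc:
        doc['_id'] = uuid.uuid4().hex  # PyMongo would use bson.ObjectId() here
    return doc


# A dict (a "document") can take a new key:
doc = add_id_like_a_driver({'JS: ': 'http://example.com/app.js'})

# A plain string cannot: for a str, "'_id' not in 'JS: '" is a substring test
# (True here), and the item assignment that follows raises TypeError.
error_message = None
try:
    add_id_like_a_driver('JS: ')
except TypeError as exc:
    error_message = str(exc)
```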
Answer:
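Found the problem; I feel stupid, it came down to my typos.
1) The collection name is tegFinder,
but I was trying to view 127.0.0.1/tagFinderProject/tagFinder/
2) PyMongo can't insert a plain string as a document, only a dict,
meaning it needs key: value pairs. So I changed it to: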
dictJS = {'JS: ': js}
collection.insert(dictJS)
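Building on that fix, both URL lists can be turned into documents in one place. A minimal sketch; build_tag_documents and the type/url field names are hypothetical (the answer's {'JS: ': js} shape works just as well). The actual insert is left as a comment because it needs a running server; note that Collection.insert is deprecated in PyMongo 3.x in favor of insert_one and insert_many.

```python
def build_tag_documents(unique_js, unique_css):
    """Hypothetical helper: one dict (MongoDB document) per URL."""
    docs = [{'type': 'js', 'url': url} for url in unique_js]
    docs += [{'type': 'css', 'url': url} for url in unique_css]
    return docs


docs = build_tag_documents(['http://example.com/app.js'],
                           ['http://example.com/site.css'])

# With PyMongo 3.x and a server reachable on port 27017, something like:
#     if docs:  # insert_many rejects an empty list
#         collection.insert_many(docs)
```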
3) Not part of the solution, but I left the connection arguments empty:
client = MongoClient()  # instead of MongoClient('127.0.0.1', 28017); the default is localhost:27017
db = client.tagFinderProject # Getting the DB
collection = db.tegFinder