I have already downloaded the data with wget:
!wget http://nlp.stanford.edu/data/glove.6B.zip
- ‘glove.6B.zip’ saved [862182613/862182613]
It is saved as a zip, and I want to use the glove.6B.300d.txt file inside that archive. What I want to achieve is:
embeddings_index = {}
with io.open('glove.6B.300d.txt', encoding='utf8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = coefs
Of course, I get this error:
IOError                                   Traceback (most recent call last)
<ipython-input-47-d07cafc85c1c> in <module>()
      1 embeddings_index = {}
----> 2 with io.open('glove.6B.300d.txt', encoding='utf8') as f:
      3     for line in f:
      4         values = line.split()
      5         word = values[0]
IOError: [Errno 2] No such file or directory: 'glove.6B.300d.txt'
How can I unzip the archive in Google Colab and use the file in the code above?
Answer 0: (score: 2)
Another way you could do it is as follows.
!wget http://nlp.stanford.edu/data/glove.6B.zip
After the download finishes, the zip file is saved in the /content directory of Google Colab.
!unzip glove*.zip
!ls
!pwd
import numpy as np

print('Indexing word vectors.')
embeddings_index = {}
# The archive also contains glove.6B.300d.txt; change the file name to use 300-dimensional vectors.
with open('glove.6B.100d.txt', encoding='utf-8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = coefs
print('Found %s word vectors.' % len(embeddings_index))
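If it helps, the indexing step above could be wrapped in a small function. This is only a sketch: the name load_glove and the expected_dim check are my own additions, not part of GloVe or Colab. The check is there so that opening the wrong dimension file (100d instead of 300d, say) fails loudly instead of silently.

```python
import numpy as np


def load_glove(path, expected_dim=None):
    """Parse a GloVe text file into a dict mapping word -> float32 vector.

    load_glove and expected_dim are hypothetical names used for illustration.
    """
    embeddings_index = {}
    with open(path, encoding='utf-8') as f:
        for line_no, line in enumerate(f, start=1):
            values = line.split()
            if not values:
                continue  # skip blank lines
            word = values[0]
            coefs = np.asarray(values[1:], dtype='float32')
            # Optional sanity check on the vector length
            if expected_dim is not None and coefs.shape[0] != expected_dim:
                raise ValueError(
                    'line %d: expected %d values, got %d'
                    % (line_no, expected_dim, coefs.shape[0])
                )
            embeddings_index[word] = coefs
    return embeddings_index


# Example call, assuming the archive was unzipped into the working directory:
# embeddings_index = load_glove('glove.6B.300d.txt', expected_dim=300)
```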
!pip install --upgrade pip
!pip install -U -q pydrive
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse
from google.colab import auth
auth.authenticate_user()
# Generate creds for the Drive FUSE library.
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}
!mkdir -p drive
!google-drive-ocamlfuse drive
import pickle
with open('drive/path/to/your/file/location', 'wb') as out:
    pickle.dump({'embedding_index': embeddings_index}, out)
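To reuse the saved index in a later session, you would read it back with pickle.load. A minimal sketch of the round trip follows; save_index and load_index are names I made up, and the 'embedding_index' key simply mirrors the dump call above.

```python
import pickle


def save_index(embeddings_index, path):
    """Write the index to `path` under the 'embedding_index' key."""
    with open(path, 'wb') as out:
        pickle.dump({'embedding_index': embeddings_index}, out)


def load_index(path):
    """Read back an index written by save_index."""
    with open(path, 'rb') as src:
        return pickle.load(src)['embedding_index']


# Example call, with the same placeholder path as above:
# embeddings_index = load_index('drive/path/to/your/file/location')
```

Only unpickle files you created yourself, since pickle.load can execute arbitrary code from an untrusted file.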
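If you have already downloaded the zip file to your local system, just extract it, upload the file with the dimension you need to Google Drive, fuse (mount) the Drive as above, give the appropriate path, and then use it, build an index from it, and so on.
Another way, if the file is already on your local system, is to upload it through code in Colab: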
from google.colab import files
files.upload()
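Select the file and follow on from step 3 to start using it.
This is how you can use GloVe word embeddings in Google Colaboratory. Hope it helps.
Answer 1: (score: 1)
It's simple: check out this older post on SO.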
import zipfile
zip_ref = zipfile.ZipFile(path_to_zip_file, 'r')
zip_ref.extractall(directory_to_extract_to)
zip_ref.close()
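If you would rather not extract the whole archive, a possible alternative (not what the older post does) is to read the one member you need straight out of the zip with ZipFile.open. A rough sketch, where load_glove_from_zip is a hypothetical helper name:

```python
import io
import zipfile

import numpy as np


def load_glove_from_zip(zip_path, member):
    """Build the word -> vector dict from one text file inside a zip archive.

    load_glove_from_zip is a hypothetical helper written for this example.
    """
    embeddings_index = {}
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        # ZipFile.open returns a binary stream; wrap it to decode UTF-8 text
        with zip_ref.open(member) as raw:
            for line in io.TextIOWrapper(raw, encoding='utf8'):
                values = line.split()
                if not values:
                    continue  # skip blank lines
                embeddings_index[values[0]] = np.asarray(values[1:], dtype='float32')
    return embeddings_index


# Example call, assuming glove.6B.zip is in the current directory:
# embeddings_index = load_glove_from_zip('glove.6B.zip', 'glove.6B.300d.txt')
```

This saves disk space because nothing is written out, but the file is decompressed again on every run, so extracting once is probably better if you reload the vectors often.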
Answer 2: (score: 0)
If you have Google Drive, you can:
Mount your Google Drive so that it can be used from the Colab notebook
from google.colab import drive
drive.mount('/content/gdrive')
Download glove.6B.zip and extract it to a location of your choice in your Google Drive, for example
"My Drive/Place/Of/Your/Choice/glove.6B.300d.txt"
Open the file directly from the Colab notebook
with io.open('/content/gdrive/My Drive/Place/Of/Your/Choice/glove.6B.300d.txt', encoding='utf8') as f: