How to … in Google Colaboratory

Posted: 2018-04-27 10:16:31

Tags: python google-colaboratory word-embedding

I have downloaded the data with wget:

!wget http://nlp.stanford.edu/data/glove.6B.zip
 - ‘glove.6B.zip’ saved [862182613/862182613]

It is saved as a zip file, and I want to use the glove.6B.300d.txt file inside it. What I want to do is:

embeddings_index = {}
with io.open('glove.6B.300d.txt', encoding='utf8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:],dtype='float32')
        embeddings_index[word] = coefs

Of course, I get this error:

IOError                                   Traceback (most recent call last)
<ipython-input-47-d07cafc85c1c> in <module>()
      1 embeddings_index = {}
----> 2 with io.open('glove.6B.300d.txt', encoding='utf8') as f:
      3     for line in f:
      4         values = line.split()
      5         word = values[0]

IOError: [Errno 2] No such file or directory: 'glove.6B.300d.txt'

How can I unzip the archive in Google Colab and use that file in the code above?
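One way to use the file without extracting it first is to read it straight out of the archive with Python's standard `zipfile` module. This is a minimal sketch, assuming the archive is `glove.6B.zip` and the member is named `glove.6B.300d.txt`; the helper name `load_glove_from_zip` is hypothetical.

```python
import io
import zipfile

import numpy as np


def load_glove_from_zip(zip_path, member):
    """Build a {word: float32 vector} dict by streaming `member` out of `zip_path`."""
    embeddings_index = {}
    with zipfile.ZipFile(zip_path) as zf:
        # ZipFile.open yields a binary stream; TextIOWrapper decodes it as UTF-8 lines
        with zf.open(member) as raw, io.TextIOWrapper(raw, encoding='utf8') as f:
            for line in f:
                values = line.split()
                if not values:
                    continue  # skip blank lines, if any
                embeddings_index[values[0]] = np.asarray(values[1:], dtype='float32')
    return embeddings_index
```

Usage: `embeddings_index = load_glove_from_zip('glove.6B.zip', 'glove.6B.300d.txt')`. Reading from the archive avoids keeping an extracted copy on disk, at the cost of decompressing the member every time you load it.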

3 answers:

Answer 0 (score: 2)

Here is another way to do it.

1. Download the zip file

!wget http://nlp.stanford.edu/data/glove.6B.zip

This saves the zip file in the /content directory of Google Colab.

2. Unzip it

!unzip glove*.zip
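`!unzip glove*.zip` extracts every dimension file in the archive (50d, 100d, 200d and 300d). If you only need one, `unzip` also accepts member names, e.g. `!unzip glove.6B.zip glove.6B.300d.txt`; the same can be done from Python with `ZipFile.extract`. A minimal sketch; `extract_one` is a hypothetical helper name.

```python
import zipfile


def extract_one(zip_path, member, dest_dir='.'):
    """Extract a single member of zip_path into dest_dir and return its path."""
    with zipfile.ZipFile(zip_path) as zf:
        # extract() creates dest_dir if needed and returns the path it wrote
        return zf.extract(member, path=dest_dir)
```

Usage: `extract_one('glove.6B.zip', 'glove.6B.300d.txt')` writes only the 300-dimensional file to the working directory.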

3. Get the exact path of the extracted embedding files with:

!ls
!pwd
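The same check can be done from Python, which is handy when you want to pass an absolute path on to `open`. A minimal sketch, assuming the files were extracted into the working directory (normally `/content` in Colab) and kept their `glove*.txt` names; `find_glove_files` is a hypothetical helper.

```python
import glob
import os


def find_glove_files(directory='.'):
    """Return absolute paths of glove*.txt files in `directory`, sorted by name."""
    pattern = os.path.join(os.path.abspath(directory), 'glove*.txt')
    return sorted(glob.glob(pattern))
```

Usage: `print(find_glove_files())` lists the extracted files with their full paths.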

4. Index the vectors

import numpy as np

print('Indexing word vectors.')

embeddings_index = {}
# Pick the file with the dimension you need (50d, 100d, 200d or 300d)
with open('glove.6B.100d.txt', encoding='utf-8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = coefs

print('Found %s word vectors.' % len(embeddings_index))
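If you load more than one of these files, it can help to wrap the loop in a function that also reports the vector dimension and fails loudly when a line has the wrong number of values (which can catch a file that was only partly extracted or uploaded). A sketch under those assumptions; `index_vectors` is a hypothetical name.

```python
import numpy as np


def index_vectors(path, encoding='utf-8'):
    """Read a GloVe-style text file into {word: float32 vector}.

    Returns (embeddings_index, dim). Raises ValueError if a line's vector
    length differs from the first line's.
    """
    embeddings_index = {}
    dim = None
    with open(path, encoding=encoding) as f:
        for lineno, line in enumerate(f, start=1):
            values = line.split()
            if not values:
                continue  # ignore blank lines
            coefs = np.asarray(values[1:], dtype='float32')
            if dim is None:
                dim = coefs.shape[0]  # infer the dimension from the first line
            elif coefs.shape[0] != dim:
                raise ValueError('line %d has %d values, expected %d'
                                 % (lineno, coefs.shape[0], dim))
            embeddings_index[values[0]] = coefs
    return embeddings_index, dim
```

Usage: `embeddings_index, dim = index_vectors('glove.6B.100d.txt')` should give `dim == 100` for that file.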

5. Mount Google Drive with FUSE

!pip install --upgrade pip
!pip install -U -q pydrive
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null

!apt-get -y install -qq google-drive-ocamlfuse fuse

from google.colab import auth
auth.authenticate_user()
# Generate creds for the Drive FUSE library.
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}

!mkdir -p drive
!google-drive-ocamlfuse drive

6. Save the indexed vectors to Google Drive for reuse

import pickle

# Note the variable name: the index built in step 4 is embeddings_index
with open('drive/path/to/your/file/location', 'wb') as f:
    pickle.dump({'embedding_index': embeddings_index}, f)
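In a later session, after mounting the drive again, the dictionary can be read back with `pickle.load`. A minimal round-trip sketch; the helper names are hypothetical and the path is whatever you passed when saving.

```python
import pickle


def save_index(embeddings_index, path):
    # Wrapped under the key 'embedding_index', matching the dump call in step 6
    with open(path, 'wb') as f:
        pickle.dump({'embedding_index': embeddings_index}, f)


def load_index(path):
    with open(path, 'rb') as f:
        return pickle.load(f)['embedding_index']
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.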

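If you have already downloaded the zip file on your local system, just extract it, upload the file with the dimension you need to Google Drive, mount the drive with FUSE (step 5), give the appropriate path, and then use it (build the index, etc.).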

Alternatively, if you already have it downloaded on your local system, upload it with the following code in Colab:

from google.colab import files
files.upload()

Select the file, then continue from step 3 to start using it.
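`files.upload()` returns a dict mapping each uploaded file name to its contents as bytes. If what you uploaded is the zip rather than an extracted `.txt` file, it can be unpacked from memory. A sketch under that assumption; `unpack_uploaded` is a hypothetical helper and `dest_dir` defaults to the working directory.

```python
import io
import os
import zipfile


def unpack_uploaded(uploaded, dest_dir='.'):
    """Given {filename: bytes}, extract the .txt members of any .zip archive
    and write other files as-is; return the sorted paths written."""
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    for name, data in uploaded.items():
        if name.lower().endswith('.zip'):
            # Open the archive from the in-memory bytes
            with zipfile.ZipFile(io.BytesIO(data)) as zf:
                for member in zf.namelist():
                    if member.endswith('.txt'):
                        paths.append(zf.extract(member, path=dest_dir))
        else:
            target = os.path.join(dest_dir, name)
            with open(target, 'wb') as f:
                f.write(data)
            paths.append(target)
    return sorted(paths)
```

Usage: `paths = unpack_uploaded(files.upload())`, then open one of the returned paths as in step 4.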

That is how you can use GloVe word embeddings in Google Colaboratory. Hope it helps.

Answer 1 (score: 1)

It's simple; check out this older post on SO:

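import zipfile

path_to_zip_file = 'glove.6B.zip'   # the archive downloaded with wget
directory_to_extract_to = '.'       # Colab's working directory, /content

with zipfile.ZipFile(path_to_zip_file, 'r') as zip_ref:
    zip_ref.extractall(directory_to_extract_to)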
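The standard library also has a one-call alternative, `shutil.unpack_archive`, which infers the archive format from the file extension. A sketch; `unpack` is a hypothetical helper and the default directory name `glove` is just an example.

```python
import os
import shutil


def unpack(path_to_zip_file, directory_to_extract_to='glove'):
    """Extract the whole archive and return the sorted names of what it contained."""
    # The format is inferred from the extension (.zip); pass format='zip' to force it
    shutil.unpack_archive(path_to_zip_file, directory_to_extract_to)
    return sorted(os.listdir(directory_to_extract_to))
```

Usage: `unpack('glove.6B.zip')` extracts into `./glove` and returns the file names found there.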

Answer 2 (score: 0)

If you have Google Drive, you can:

  1. Mount your Google Drive so that you can use it in your Colab notebook

    from google.colab import drive
    drive.mount('/content/gdrive')
    
  2. Download glove.6B.zip and extract it to a location of your choice in your Google Drive, for example

    "My Drive/Place/Of/Your/Choice/glove.6B.300d.txt"
    
  3. Open the file directly from your Colab notebook

    with io.open('/content/gdrive/My Drive/Place/Of/Your/Choice/glove.6B.300d.txt', encoding='utf8') as f:
        ...  # parse the lines as in the question
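Paths under the mount point begin with the `My Drive` folder, as in step 2. Checking the path before opening gives a clearer error than the bare `IOError` from the question. A sketch; `drive_path` is hypothetical, and the default root assumes `drive.mount('/content/gdrive')` as in step 1 (newer Colab versions show the folder as `MyDrive`, so check with `!ls /content/gdrive` and adjust the root if needed).

```python
import os


def drive_path(relative_path, root='/content/gdrive/My Drive'):
    """Join relative_path onto the Drive root; raise a descriptive
    FileNotFoundError if nothing is there (e.g. Drive is not mounted)."""
    full = os.path.join(root, relative_path)
    if not os.path.isfile(full):
        raise FileNotFoundError('%s not found; is Drive mounted at %s?' % (full, root))
    return full
```

Usage: `open(drive_path('Place/Of/Your/Choice/glove.6B.300d.txt'), encoding='utf8')`.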