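How to fine-tune BERT on SQuAD 2.0

Time: 2020-05-19 02:05:33

Tags: python nlp bert-language-model squad

I am really new to BERT, and I want to fine-tune the BERT base model on Google Colab. So far I have set up the GPU, downloaded the data, and tried to call python run_squad.py.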

!git clone https://github.com/google-research/bert.git

!wget https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip
# Unzip the file
!unzip uncased_L-12_H-768_A-12.zip

import tensorflow as tf

# Get the GPU device name.
device_name = tf.test.gpu_device_name()

# The device name should look like the following:
if device_name == '/device:GPU:0':
    print('Found GPU at: {}'.format(device_name))
else:
    raise SystemError('GPU device not found')

import torch

# If there's a GPU available...
if torch.cuda.is_available():    

    # Tell PyTorch to use the GPU.    
    device = torch.device("cuda")

    print('There are %d GPU(s) available.' % torch.cuda.device_count())

    print('We will use the GPU:', torch.cuda.get_device_name(0))

# If not...
else:
    print('No GPU available, using the CPU instead.')
    device = torch.device("cpu")

!pip install transformers
!pip install wget

import wget
import os

print('Downloading dataset...')

# The URL for the SQuAD 2.0 training set (a JSON file).
url = 'https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json'

# Download the file (if we haven't already)
if not os.path.exists('./train-v2.0.json'):
    wget.download(url, './train-v2.0.json')

# The URL for the SQuAD 2.0 dev set (a JSON file).
url = 'https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json'

# Download the file (if we haven't already)
if not os.path.exists('./dev-v2.0.json'):
    wget.download(url, './dev-v2.0.json')

print('Done')

# Unzip the BERT repository archive (if we haven't already)
if not os.path.exists('./bert-master/'):
    !unzip bert-master.zip

!pip install tensorflow-gpu==1.15.0

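The code above essentially makes sure the GPU is set up, installs the required dependencies, and downloads the SQuAD 2.0 data. The next step is calling run_squad.py, and this is where I get lost. These are my file locations: (screenshot: file location on Google Colab).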
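Since the screenshot is not reproduced here, a small pre-flight check can stand in for it. This is a minimal sketch, not part of the original notebook: the helper name `missing_inputs` is hypothetical, and the directories are assumed to be the ones used in this question (`/content/uncased_L-12_H-768_A-12` and `/content`). Note that `bert_model.ckpt` is a checkpoint prefix rather than a single file, so the sketch looks for `bert_model.ckpt.index`.

```python
from pathlib import Path

# Files from the uncased_L-12_H-768_A-12 archive that run_squad.py is pointed at.
# bert_model.ckpt is only a prefix, so check for its .index file instead.
BERT_FILES = ("vocab.txt", "bert_config.json", "bert_model.ckpt.index")
SQUAD_FILES = ("train-v2.0.json", "dev-v2.0.json")


def missing_inputs(bert_dir, squad_dir):
    """Return the expected input files that do not exist on disk."""
    expected = [Path(bert_dir) / name for name in BERT_FILES]
    expected += [Path(squad_dir) / name for name in SQUAD_FILES]
    return [str(path) for path in expected if not path.is_file()]


# Assumed Colab locations, matching the paths used in this question.
for path in missing_inputs("/content/uncased_L-12_H-768_A-12", "/content"):
    print("missing:", path)
```

If nothing is printed, every expected file is where the flags say it should be, and a "not found" error is more likely to come from how the paths are passed than from the files themselves.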

!export BERT_BASE_DIR=/content/uncased_L-12_H-768_A-12
!export SQUAD_DIR=/content

!python run_squad.py \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --do_train=True \
  --train_file=$SQUAD_DIR/train-v2.0.json \
  --do_predict=True \
  --predict_file=$SQUAD_DIR/dev-v2.0.json \
  --train_batch_size=12 \
  --learning_rate=3e-5 \
  --num_train_epochs=2.0 \
  --max_seq_length=384 \
  --doc_stride=128 \
  --output_dir=/tmp/squad_base/

I ran the cell and got the error below. I thought I had configured the paths correctly, so why is bert_config.json still reported as missing?

WARNING:tensorflow:From /content/bert-master/optimization.py:87: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

WARNING:tensorflow:From bert-master/run_squad.py:1283: The name tf.app.run is deprecated. Please use tf.compat.v1.app.run instead.

WARNING:tensorflow:From bert-master/run_squad.py:1127: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

W0519 08:02:09.023542 140027033761664 module_wrapper.py:139] From bert-master/run_squad.py:1127: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

WARNING:tensorflow:From bert-master/run_squad.py:1127: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.

W0519 08:02:09.023784 140027033761664 module_wrapper.py:139] From bert-master/run_squad.py:1127: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.

WARNING:tensorflow:From /content/bert-master/modeling.py:93: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

W0519 08:02:09.024012 140027033761664 module_wrapper.py:139] From /content/bert-master/modeling.py:93: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

Traceback (most recent call last):
  File "bert-master/run_squad.py", line 1283, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "bert-master/run_squad.py", line 1129, in main
    bert_config = modeling.BertConfig.from_json_file(FLAGS.bert_config_file)
  File "/content/bert-master/modeling.py", line 94, in from_json_file
    text = reader.read()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/lib/io/file_io.py", line 122, in read
    self._preread_check()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/lib/io/file_io.py", line 84, in _preread_check
    compat.as_bytes(self.__name), 1024 * 512)
tensorflow.python.framework.errors_impl.NotFoundError: /bert_config.json; No such file or directory
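
A possible explanation, offered as a hedged note rather than a confirmed diagnosis: the path in the error is `/bert_config.json`, which is what `$BERT_BASE_DIR/bert_config.json` becomes when `BERT_BASE_DIR` is empty. In a notebook, each `!` line runs in its own shell process, so a variable set with `!export` is gone by the time the next `!` line runs. The sketch below reproduces that expansion in a fresh `sh` process; the helper name `expand_in_fresh_shell` is hypothetical, and it assumes a POSIX `sh` is available, as it is on Colab.

```python
import os
import subprocess

TEMPLATE = "$BERT_BASE_DIR/bert_config.json"


def expand_in_fresh_shell(template, env):
    """Let a new `sh` process expand the shell variables in `template`."""
    completed = subprocess.run(
        ["sh", "-c", 'printf "%s" "' + template + '"'],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return completed.stdout


# Environment without BERT_BASE_DIR: the situation after a lone `!export` line.
base_env = {k: v for k, v in os.environ.items() if k != "BERT_BASE_DIR"}
without_var = expand_in_fresh_shell(TEMPLATE, base_env)

# Environment where the variable really is set for the child process.
with_var = expand_in_fresh_shell(
    TEMPLATE, dict(base_env, BERT_BASE_DIR="/content/uncased_L-12_H-768_A-12")
)

print(repr(without_var))  # '/bert_config.json'
print(repr(with_var))     # '/content/uncased_L-12_H-768_A-12/bert_config.json'
```

If this is indeed the cause, setting the variables in the notebook process itself, for example `os.environ["BERT_BASE_DIR"] = "/content/uncased_L-12_H-768_A-12"` in a Python cell or the `%env` magic, should make them visible to later `!python` lines, because those child shells inherit the kernel's environment. Writing the absolute paths directly in the flags avoids the question altogether.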

0 answers:

No answers