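I read an article explaining how the sliding window works, but I cannot find any information on how it is actually implemented.
From what I understand, a sliding window can be used to process text when the input is too long.
Please correct me if I am wrong. Say I have the text "In June 2017 Kaggle announced that it passed 1 million registered users".
Given some stride and max_len, the input can be split into chunks of overlapping words (not considering padding):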
In June 2017 Kaggle announced that # chunk 1
announced that it passed 1 million # chunk 2
1 million registered users # chunk 3
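The split above can be reproduced with a short sketch. It assumes whitespace-separated words, max_len=6 and stride=4, and that stride means how many words the window start advances at each step (some libraries use the same word for the size of the overlap instead). The function name sliding_window_chunks is made up for illustration.

```python
def sliding_window_chunks(words, max_len, stride):
    """Split a list of words into overlapping chunks of at most max_len words.

    Assumption: `stride` is the step between consecutive window starts.
    """
    if max_len <= 0 or stride <= 0:
        raise ValueError("max_len and stride must be positive")
    chunks = []
    start = 0
    while start < len(words):
        chunks.append(words[start:start + max_len])
        # Stop as soon as the current window reaches the end of the text.
        if start + max_len >= len(words):
            break
        start += stride
    return chunks


text = "In June 2017 Kaggle announced that it passed 1 million registered users"
chunks = [" ".join(c) for c in sliding_window_chunks(text.split(), max_len=6, stride=4)]
for i, chunk in enumerate(chunks, start=1):
    print(f"{chunk} # chunk {i}")
```

With these values, consecutive chunks overlap by max_len - stride = 2 words.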
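If my questions were "when did Kaggle make the announcement" and "how many registered users", I could use chunk 1 and chunk 3 and not use chunk 2 at all in the model. I am not quite sure whether I should still use chunk 2 to train the model.
So the inputs would be: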
[CLS]when did Kaggle make the announcement[SEP]In June 2017 Kaggle announced that[SEP]
and
[CLS]how many registered users[SEP]1 million registered users[SEP]
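Then, if I have a question with no answer, do I feed it into the model with every chunk, as below, and mark the start and end indices as -1? For example, "can pigs fly?":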
[CLS]can pigs fly[SEP]In June 2017 Kaggle announced that[SEP]
[CLS]can pigs fly[SEP]announced that it passed 1 million[SEP]
[CLS]can pigs fly[SEP]1 million registered users[SEP]
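As suggested in the comments, I tried running squad_convert_example_to_features (source code) to investigate the problem above, but it does not seem to work, and there is no documentation for it. It seems that run_squad.py uses squad_convert_examples_to_features, with an "s" in "examples". This is what I ran:
from transformers.data.processors.squad import SquadResult, SquadV1Processor, SquadV2Processor, squad_convert_example_to_features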
from transformers import AutoTokenizer, AutoConfig, squad_convert_examples_to_features
FILE_DIR = "."
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
processor = SquadV2Processor()
examples = processor.get_train_examples(FILE_DIR)
features = squad_convert_example_to_features(
example=examples[0],
max_seq_length=384,
doc_stride=128,
max_query_length=64,
is_training=True,
)
This produces the following error:
100%|██████████| 1/1 [00:00<00:00, 159.95it/s]
Traceback (most recent call last):
File "<input>", line 25, in <module>
sub_tokens = tokenizer.tokenize(token)
NameError: name 'tokenizer' is not defined
The error says that no tokenizer is defined, yet squad_convert_example_to_features does not let us pass one in. It does work if I define a tokenizer while stopped inside the function in debug mode. So how exactly do I use the squad_convert_example_to_features function?
Answer 0 (score: 0)
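I think the problem lies in the example you chose. Both squad_convert_examples_to_features and squad_convert_example_to_features use the sliding window approach, because squad_convert_examples_to_features is just a parallelization wrapper around squad_convert_example_to_features. But let's look at the single-example function. First, you need to call squad_convert_example_to_features_init to make the tokenizer global (squad_convert_examples_to_features does this for you automatically):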
from transformers import AutoTokenizer
from transformers.data.processors.squad import (
    SquadV2Processor,
    squad_convert_example_to_features,
    squad_convert_example_to_features_init,
)

FILE_DIR = "."

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# Register the tokenizer as the global that the single-example converter reads.
squad_convert_example_to_features_init(tokenizer)

processor = SquadV2Processor()
examples = processor.get_train_examples(FILE_DIR)

features = squad_convert_example_to_features(
    example=examples[0],
    max_seq_length=384,
    doc_stride=128,
    max_query_length=64,
    is_training=True,
)
print(len(features))
Output:
1
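Why the init call matters can be illustrated with a stripped-down version of the same pattern. This is only a sketch: convert_init and convert are hypothetical stand-ins, not the library's code, and a plain callable plays the role of the tokenizer.

```python
def convert_init(tokenizer_for_convert):
    """Store the tokenizer in a module-level global (hypothetical stand-in)."""
    global tokenizer
    tokenizer = tokenizer_for_convert


def convert(text):
    """Tokenize text with whatever tokenizer convert_init registered.

    The global name is looked up at call time, so calling this before
    convert_init raises NameError, as in the traceback in the question.
    """
    return tokenizer(text)
```

Calling convert_init(str.split) and then convert("can pigs fly") works; calling convert first fails because the global name has not been set yet. Keeping the tokenizer in a global means a worker function can take a single example as its only argument, which is convenient when the work is spread over several processes.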
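You might conclude that this function does not use the sliding window approach, but that would be wrong: your example simply does not need to be split.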
print(len(examples[0].question_text.split()) + len(examples[0].doc_tokens))
Output:
115
This is smaller than the max_seq_length of 384 that you set. Now let's try another example:
print(len(examples[129603].question_text.split()) + len(examples[129603].doc_tokens))
features = squad_convert_example_to_features(
    example=examples[129603],
    max_seq_length=384,
    doc_stride=128,
    max_query_length=64,
    is_training=True,
)
print(len(features))
Output:
454
3
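Note that the 454 above counts whitespace-separated words, while the window budget is measured in subword tokens, so the word count is only a rough indicator. The arithmetic behind the number of splits can be sketched as follows. This is not the library's code. It assumes each window must also hold the question plus three special tokens ([CLS] and two [SEP]) and that window starts advance by doc_stride tokens. The helper name window_spans and the figure of 560 document tokens are hypothetical, chosen only for illustration.

```python
def window_spans(num_doc_tokens, num_query_tokens, max_seq_length=384,
                 doc_stride=128, num_special_tokens=3):
    """Return (start, length) pairs of document-token windows.

    Assumptions: each window shares max_seq_length with the query and the
    special tokens, and the window start moves forward by doc_stride tokens.
    """
    max_tokens_for_doc = max_seq_length - num_query_tokens - num_special_tokens
    if max_tokens_for_doc <= 0 or doc_stride <= 0:
        raise ValueError("no room left for document tokens")
    spans = []
    start = 0
    while start < num_doc_tokens:
        length = min(num_doc_tokens - start, max_tokens_for_doc)
        spans.append((start, length))
        # The last window is the one that reaches the end of the document.
        if start + length == num_doc_tokens:
            break
        start += min(length, doc_stride)
    return spans


# Hypothetical figures: 560 document tokens and a 10-token question.
spans = window_spans(num_doc_tokens=560, num_query_tokens=10)
print(len(spans), spans)
```

Under these assumptions a document of that size yields three windows, and a document that fits within the budget yields exactly one.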
You can now compare the splits with the original sample:
print('[CLS]' + examples[129603].question_text + '[SEP]' + ' '.join(examples[129603].doc_tokens) + '[SEP]')
for idx, f in enumerate(features):
    print('Split {}'.format(idx))
    print(' '.join(f.tokens))
Output:
[CLS]How often is hunting occurring in Delaware each year?[SEP]There is a very active tradition of hunting of small to medium-sized wild game in Trinidad and Tobago. Hunting is carried out with firearms, and aided by the use of hounds, with the illegal use of trap guns, trap cages and snare nets. With approximately 12,000 sport hunters applying for hunting licences in recent years (in a very small country of about the size of the state of Delaware at about 5128 square kilometers and 1.3 million inhabitants), there is some concern that the practice might not be sustainable. In addition there are at present no bag limits and the open season is comparatively very long (5 months - October to February inclusive). As such hunting pressure from legal hunters is very high. Added to that, there is a thriving and very lucrative black market for poached wild game (sold and enthusiastically purchased as expensive luxury delicacies) and the numbers of commercial poachers in operation is unknown but presumed to be fairly high. As a result, the populations of the five major mammalian game species (red-rumped agouti, lowland paca, nine-banded armadillo, collared peccary, and red brocket deer) are thought to be quite low (although scientifically conducted population studies are only just recently being conducted as of 2013). It appears that the red brocket deer population has been extirpated on Tobago as a result of over-hunting. Various herons, ducks, doves, the green iguana, the gold tegu, the spectacled caiman and the common opossum are also commonly hunted and poached. There is also some poaching of 'fully protected species', including red howler monkeys and capuchin monkeys, southern tamanduas, Brazilian porcupines, yellow-footed tortoises, Trinidad piping guans and even one of the national birds, the scarlet ibis. Legal hunters pay very small fees to obtain hunting licences and undergo no official basic conservation biology or hunting-ethics training. 
There is presumed to be relatively very little subsistence hunting in the country (with most hunting for either sport or commercial profit). The local wildlife management authority is under-staffed and under-funded, and as such very little in the way of enforcement is done to uphold existing wildlife management laws, with hunting occurring both in and out of season, and even in wildlife sanctuaries. There is some indication that the government is beginning to take the issue of wildlife management more seriously, with well drafted legislation being brought before Parliament in 2015. It remains to be seen if the drafted legislation will be fully adopted and financially supported by the current and future governments, and if the general populace will move towards a greater awareness of the importance of wildlife conservation and change the culture of wanton consumption to one of sustainable management.[SEP]
Split 0
[CLS] how often is hunting occurring in delaware each year ? [SEP] there is a very active tradition of hunting of small to medium - sized wild game in trinidad and tobago . hunting is carried out with firearms , and aided by the use of hounds , with the illegal use of trap guns , trap cages and s ##nare nets . with approximately 12 , 000 sport hunters applying for hunting licence ##s in recent years ( in a very small country of about the size of the state of delaware at about 512 ##8 square kilometers and 1 . 3 million inhabitants ) , there is some concern that the practice might not be sustainable . in addition there are at present no bag limits and the open season is comparatively very long ( 5 months - october to february inclusive ) . as such hunting pressure from legal hunters is very high . added to that , there is a thriving and very lucrative black market for po ##ache ##d wild game ( sold and enthusiastically purchased as expensive luxury del ##ica ##cies ) and the numbers of commercial po ##ache ##rs in operation is unknown but presumed to be fairly high . as a result , the populations of the five major mammalian game species ( red - rum ##ped ago ##uti , lowland pac ##a , nine - banded arm ##adi ##llo , collar ##ed pe ##cca ##ry , and red brock ##et deer ) are thought to be quite low ( although scientific ##ally conducted population studies are only just recently being conducted as of 2013 ) . it appears that the red brock ##et deer population has been ex ##ti ##rp ##ated on tobago as a result of over - hunting . various heron ##s , ducks , dove ##s , the green i ##gua ##na , the gold te ##gu , the spectacle ##d cai ##man and the common op ##oss ##um are also commonly hunted and po ##ache ##d . there is also some po ##achi ##ng of ' fully protected species ' , including red howl ##er monkeys and cap ##uchi ##n monkeys , southern tam ##and ##ua ##s , brazilian por ##cup ##ines , yellow - footed tor ##to ##ises , [SEP]
Split 1
[CLS] how often is hunting occurring in delaware each year ? [SEP] october to february inclusive ) . as such hunting pressure from legal hunters is very high . added to that , there is a thriving and very lucrative black market for po ##ache ##d wild game ( sold and enthusiastically purchased as expensive luxury del ##ica ##cies ) and the numbers of commercial po ##ache ##rs in operation is unknown but presumed to be fairly high . as a result , the populations of the five major mammalian game species ( red - rum ##ped ago ##uti , lowland pac ##a , nine - banded arm ##adi ##llo , collar ##ed pe ##cca ##ry , and red brock ##et deer ) are thought to be quite low ( although scientific ##ally conducted population studies are only just recently being conducted as of 2013 ) . it appears that the red brock ##et deer population has been ex ##ti ##rp ##ated on tobago as a result of over - hunting . various heron ##s , ducks , dove ##s , the green i ##gua ##na , the gold te ##gu , the spectacle ##d cai ##man and the common op ##oss ##um are also commonly hunted and po ##ache ##d . there is also some po ##achi ##ng of ' fully protected species ' , including red howl ##er monkeys and cap ##uchi ##n monkeys , southern tam ##and ##ua ##s , brazilian por ##cup ##ines , yellow - footed tor ##to ##ises , trinidad pip ##ing gu ##ans and even one of the national birds , the scarlet ib ##is . legal hunters pay very small fees to obtain hunting licence ##s and undergo no official basic conservation biology or hunting - ethics training . there is presumed to be relatively very little subsistence hunting in the country ( with most hunting for either sport or commercial profit ) . the local wildlife management authority is under - staffed and under - funded , and as such very little in the way of enforcement is done to uphold existing wildlife management laws , with hunting occurring both in and out of season , and even in wildlife san ##ct ##uaries . 
there is some indication that the government is beginning to [SEP]
Split 2
[CLS] how often is hunting occurring in delaware each year ? [SEP] being conducted as of 2013 ) . it appears that the red brock ##et deer population has been ex ##ti ##rp ##ated on tobago as a result of over - hunting . various heron ##s , ducks , dove ##s , the green i ##gua ##na , the gold te ##gu , the spectacle ##d cai ##man and the common op ##oss ##um are also commonly hunted and po ##ache ##d . there is also some po ##achi ##ng of ' fully protected species ' , including red howl ##er monkeys and cap ##uchi ##n monkeys , southern tam ##and ##ua ##s , brazilian por ##cup ##ines , yellow - footed tor ##to ##ises , trinidad pip ##ing gu ##ans and even one of the national birds , the scarlet ib ##is . legal hunters pay very small fees to obtain hunting licence ##s and undergo no official basic conservation biology or hunting - ethics training . there is presumed to be relatively very little subsistence hunting in the country ( with most hunting for either sport or commercial profit ) . the local wildlife management authority is under - staffed and under - funded , and as such very little in the way of enforcement is done to uphold existing wildlife management laws , with hunting occurring both in and out of season , and even in wildlife san ##ct ##uaries . there is some indication that the government is beginning to take the issue of wildlife management more seriously , with well drafted legislation being brought before parliament in 2015 . it remains to be seen if the drafted legislation will be fully adopted and financially supported by the current and future governments , and if the general populace will move towards a greater awareness of the importance of wildlife conservation and change the culture of want ##on consumption to one of sustainable management . [SEP]
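You wrote: "If my questions were 'when did Kaggle make the announcement' and 'how many registered users', I can use chunk 1 and chunk 3 and not use chunk 2 at all in the model. Not quite sure if I should still use chunk 2 to train the model."
Yes, you should also use chunk 2 to train the model, because when you later predict on the same sequence you want the model to return 0:0 as the answer span for chunk 2 (that way you can easily pick out the chunk that contains the answer).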
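A minimal word-level sketch of this labelling, assuming inclusive answer indices into the document and a [CLS] question [SEP] chunk [SEP] layout. label_chunks is a hypothetical helper, and a real pipeline would work on subword tokens rather than words.

```python
def label_chunks(question_len, windows, answer_start, answer_end):
    """Return one (start, end) label per chunk.

    windows: list of (doc_start, length) pairs in document word indices.
    answer_start / answer_end: inclusive word indices of the answer.
    A chunk that does not fully contain the answer gets (0, 0), i.e. both
    positions point at [CLS].
    """
    offset = question_len + 2  # [CLS] + question words + [SEP]
    labels = []
    for doc_start, length in windows:
        doc_end = doc_start + length - 1
        if answer_start >= doc_start and answer_end <= doc_end:
            labels.append((answer_start - doc_start + offset,
                           answer_end - doc_start + offset))
        else:
            labels.append((0, 0))
    return labels


# "when did Kaggle make the announcement" has 6 words; the answer "June 2017"
# occupies document words 1..2; the windows are the three chunks from the question.
windows = [(0, 6), (4, 6), (8, 4)]
labels = label_chunks(question_len=6, windows=windows, answer_start=1, answer_end=2)
print(labels)
```

Only the first chunk gets a real span here. The other two are still kept as training inputs, each labelled (0, 0).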