我已经在AWS控制台上成功设置了粘合爬虫。
现在,我有了一个Cloudformation模板来模拟整个过程,除了无法将Exclusions:字段添加到模板中。背景:从AWS Glue API中,Exclusions:
字段表示全局模式,以排除与数据存储(在我的示例中为S3数据存储)中的特定模式匹配的文件或文件夹。
尽管脚本中所有其他值都与爬网程序配置一起填充,例如,S3Target,爬网程序名称,IAM角色和分组行为以及所有这些粘胶设置,但我花了很大的力气仍无法在glob爬虫控制台上填充glob模式/ fields从CFN模板成功填充,除了Exclusions字段(在Glue Console上也称为排除模式)以外的所有字段。我的CFN模板通过了验证,我运行了搜寻器,希望尽管隐藏的排除glob仍然会以某种方式产生影响,但是不幸的是我似乎无法填充“排除”字段?
Here's the S3Target Exclusion AWS Glue API guide
Here's an AWS sample YAML CFN for a Glue Crawler
Here's a helpful YAML string array guide
YAML
CFNCrawlerSecDeraNUM:
Type: AWS::Glue::Crawler
Properties:
Name: !Ref CFNCrawlerName
Role: !GetAtt CFNRoleSecDERA.Arn
#Classifiers: none, use the default classifier
Description: AWS Glue crawler to crawl SecDERA data
#Schedule: none, use default run-on-demand
DatabaseName: !Ref CFNDatabaseName
Targets:
S3Targets:
- Exclusions:
- "*/readme.htm"
- "*/sub.txt"
- "*/pre.txt"
- "*/tag.txt"
- Path: "s3://sec-input"
TablePrefix: !Ref CFNTablePrefixName
SchemaChangePolicy:
UpdateBehavior: "UPDATE_IN_DATABASE"
DeleteBehavior: "LOG"
# Added single schema grouping Glue API option
Configuration: "{\"Version\":1.0,\"CrawlerOutput\":{\"Partitions\":{\"AddOrUpdateBehavior\":\"InheritFromTable\"},\"Tables\":{\"AddOrUpdateBehavior\":\"MergeNewColumns\"}},\"Grouping\":{\"TableGroupingPolicy\":\"CombineCompatibleSchemas\"}}"
JSON
"CFNCrawlerSecDeraNUM": {
"Type": "AWS::Glue::Crawler",
"Properties": {
"Name": {
"Ref": "CFNCrawlerName"
},
"Role": {
"Fn::GetAtt": [
"CFNRoleSecDERA",
"Arn"
]
},
"Description": "AWS Glue crawler to crawl SecDERA data",
"DatabaseName": {
"Ref": "CFNDatabaseName"
},
"Targets": {
"S3Targets": [
{
"Exclusions": [
"*/readme.htm",
"*/sub.txt",
"*/pre.txt",
"*/tag.txt"
]
},
{
"Path": "s3://sec-input"
}
]
},
"TablePrefix": {
"Ref": "CFNTablePrefixName"
},
"SchemaChangePolicy": {
"UpdateBehavior": "UPDATE_IN_DATABASE",
"DeleteBehavior": "LOG"
},
"Configuration": "{\"Version\":1.0,\"CrawlerOutput\":{\"Partitions\":{\"AddOrUpdateBehavior\":\"InheritFromTable\"},\"Tables\":{\"AddOrUpdateBehavior\":\"MergeNewColumns\"}},\"Grouping\":{\"TableGroupingPolicy\":\"CombineCompatibleSchemas\"}}"
}
}
答案 0 :(得分:1)
您要将# -*- coding: utf-8 -*-
# Copyright (c) Muhammet Emin TURGUT 2020
# For license see LICENSE
from tkinter import *
from tkinter import ttk
class ScrollableNotebook(ttk.Frame):
def __init__(self,parent,*args,**kwargs):
ttk.Frame.__init__(self, parent, *args)
self.xLocation = 0
self.notebookContent = ttk.Notebook(self,**kwargs)
self.notebookContent.pack(fill="both", expand=True)
self.notebookTab = ttk.Notebook(self,**kwargs)
self.notebookTab.bind("<<NotebookTabChanged>>",self._tabChanger)
slideFrame = ttk.Frame(self)
slideFrame.place(relx=1.0, x=0, y=1, anchor=NE)
leftArrow = ttk.Label(slideFrame, text="\u25c0")
leftArrow.bind("<1>",self._leftSlide)
leftArrow.pack(side=LEFT)
rightArrow = ttk.Label(slideFrame, text=" \u25b6")
rightArrow.bind("<1>",self._rightSlide)
rightArrow.pack(side=RIGHT)
self.notebookContent.bind( "<Configure>", self._resetSlide)
def _tabChanger(self,event):
self.notebookContent.select(self.notebookTab.index("current"))
def _rightSlide(self,event):
if self.notebookTab.winfo_width()>self.notebookContent.winfo_width()-30:
if (self.notebookContent.winfo_width()-(self.notebookTab.winfo_width()+self.notebookTab.winfo_x()))<=35:
self.xLocation-=20
self.notebookTab.place(x=self.xLocation,y=0)
def _leftSlide(self,event):
if not self.notebookTab.winfo_x()== 0:
self.xLocation+=20
self.notebookTab.place(x=self.xLocation,y=0)
def _resetSlide(self,event):
self.notebookTab.place(x=0,y=0)
self.xLocation = 0
def add(self,frame,**kwargs):
if len(self.notebookTab.winfo_children())!=0:
self.notebookContent.add(frame, text="",state="hidden")
else:
self.notebookContent.add(frame, text="")
self.notebookTab.add(ttk.Frame(self.notebookTab),**kwargs)
def forget(self,tab_id):
self.notebookContent.forget(tab_id)
self.notebookTab.forget(tab_id)
def hide(self,tab_id):
self.notebookContent.hide(tab_id)
self.notebookTab.hide(tab_id)
def identify(self,x, y):
return self.notebookTab.identify(x,y)
def index(self,tab_id):
return self.notebookTab.index(tab_id)
def insert(self,pos,frame, **kwargs):
self.notebookContent.insert(pos,frame, **kwargs)
self.notebookTab.insert(pos,frame,**kwargs)
def select(self,tab_id):
self.notebookContent.select(tab_id)
self.notebookTab.select(tab_id)
def tab(self,tab_id, option=None, **kwargs):
return self.notebookTab.tab(tab_id, option=None, **kwargs)
def tabs(self):
return self.notebookContent.tabs()
def enable_traversal(self):
self.notebookContent.enable_traversal()
self.notebookTab.enable_traversal()
作为新的Exclusions
对象传递到S3Target
列表。
尝试更改此内容:
S3Targets
对此:
Targets:
S3Targets:
- Exclusions:
- "*/readme.htm"
- "*/sub.txt"
- "*/pre.txt"
- "*/tag.txt"
- Path: "s3://sec-input"