
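I am trying to create an AWS Glue ETL job that loads data from Parquet files stored in S3 into a Redshift table. The Parquet files were written with pandas, using the 'simple' file schema option, into multiple folders in S3. The layout looks like this: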

s3://bucket/parquet_table/01/file_1.parquet

s3://bucket/parquet_table/01/file_2.parquet

s3://bucket/parquet_table/01/file_3.parquet

s3://bucket/parquet_table/02/file_1.parquet

s3://bucket/parquet_table/02/file_2.parquet

s3://bucket/parquet_table/02/file_3.parquet

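I can use an AWS Glue Crawler to create a table in the AWS Glue Data Catalog, and I can query that table from Athena, but when I try to create an ETL job that copies the same table to Redshift, it does not work.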

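If I crawl a single file, or multiple files in one folder, it works. As soon as multiple folders are involved, I get the "Unable to infer schema for Parquet" error.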
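For illustration, here is a minimal sketch of reading the multi-folder layout directly from S3 instead of through the catalog table. It assumes the files sit under the `s3://bucket/parquet_table/` prefix shown above. The helper `build_s3_parquet_options` is a hypothetical name, and whether `recurse` actually resolves the error is an assumption, not something confirmed here.

```python
def build_s3_parquet_options(root, recurse=True):
    """Build S3 connection options covering `root` and, optionally, its subfolders.

    Hypothetical helper for illustration only.
    """
    # Normalise to exactly one trailing slash so the path names a prefix.
    path = root.rstrip("/") + "/"
    return {"paths": [path], "recurse": recurse}


options = build_s3_parquet_options("s3://bucket/parquet_table")

# Inside a Glue job (needs the awsglue runtime, so it is left commented out):
# frame = glue_context.create_dynamic_frame.from_options(
#     connection_type="s3",
#     connection_options=options,
#     format="parquet",
# )
```

With `recurse` set, the reader is asked to descend into the `01/` and `02/` subfolders rather than stopping at the top-level prefix.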


A similar problem occurs if I use the 'hive' schema instead of 'simple'. Then we have multiple folders, and also empty Parquet files that throw an error.
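A rough sketch of how the empty files could be spotted before the job runs, assuming they are zero-byte or truncated objects. It relies on the Parquet format's 4-byte `PAR1` magic at both the start and end of a file. The function name `looks_like_parquet` is hypothetical, and the check is necessary but not sufficient: it does not validate the footer metadata.

```python
PARQUET_MAGIC = b"PAR1"
# Leading magic (4 bytes) + footer length (4 bytes) + trailing magic (4 bytes).
MIN_PARQUET_SIZE = 12


def looks_like_parquet(data: bytes) -> bool:
    """Return True if `data` is large enough and carries the Parquet magic bytes.

    Hypothetical helper: a cheap sanity check, not a full validation.
    """
    if len(data) < MIN_PARQUET_SIZE:
        return False
    return data[:4] == PARQUET_MAGIC and data[-4:] == PARQUET_MAGIC


# An empty object fails the check, while a buffer with both magic markers passes.
empty_ok = looks_like_parquet(b"")
framed_ok = looks_like_parquet(b"PAR1" + b"\x00" * 8 + b"PAR1")
```

For objects in S3, only the first and last few bytes would need to be fetched; the sketch takes the whole buffer to stay self-contained.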


Are there any recommendations on how to read Parquet files, and how to structure them in S3, when using AWS Glue (ETL and Data Catalog)?

AWS Glue ETL job fails with AnalysisException: u'Unable to infer schema for Parquet. It must be specified manually.'

2 answers: