When I try to import a csv into my Redshift database, I get this error:
Missing newline: Unexpected character 0x75 found at location 4194303
The csv itself seems to be fine. The stl table tells me the error is on line 70269 of the csv, which contains this string:
10:00:10,2014-07-28,Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0),Not Listed,Not Listed,Not Listed,Not Listed,multiRetrieve,Not Listed,OS-Preview-logItemUsage,Not Listed,Not Listed,Not Listed,Not Listed,Not Listed,Not Listed,Not Listed,Not Listed,Not Listed,Not Listed,Not Listed,Not Listed,"[{""PubEndDate""=>""2013/12/31"", ""ItmId""=>""1353296053"", ""SourceType""=>""Scholarly Journals"", ""ReasonCode""=>""Free"", ""MyResearchUser""=>""246763"", ""ProjectCode""=>"""", ""PublicationCode""=>"""", ""PubStartDate""=>""2013/01/01"", ""ItmFrmt""=>""AbstractPreview"", ""Subrole""=>""AbstractPreview"", ""PaymentType""=>""Transactional"", ""UsageInfo""=>""P-1008275-154977-CUSTOMER-10000137-2950635"", ""Role""=>""AbstractPreview"", ""RetailPrice""=>0, ""EffectivePrice""=>0, ""ParentItemId""=>""53628""}]","[""optype:Online"", ""location:null"", ""target:null""]",192.234.111.8,DIALOG,20140728131712007:882391,1119643,Not Listed,Not Listed,Not Listed,Not Listed,Not Listed,Not Listed,Not Listed,Not Listed,Not Listed,Not Listed,Not Listed,Not Listed,Not Listed,Not Listed,Not Listed,Not Listed,Not Listed,Not Listed,Not Listed,Not Listed,Not Listed,Not Listed,Not Listed,Not Listed,Not Listed,Not Listed,Not Listed,Not Listed,Not Listed,Not Listed,Not Listed,"2014-07-28 10:00:10-0400,421 {""Items"":[{""PubEndDate"":""2013/12/31"",""ItmId"":""1353296053"",""SourceType"":""Scholarly Journals"",""ReasonCode"":""Free"",""MyResearchUser"":""246763"",""ProjectCode"":"""",""PublicationCode"":"""",""PubStartDate"":""2013/01/01"",""ItmFrmt"":""AbstractPreview"",""Subrole"":""AbstractPreview"",""PaymentType"":""Transactional"",""UsageInfo"":""P-1008275-154977-CUSTOMER-10000137-2950635"",""Role"":""AbstractPreview"",""RetailPrice"":0,""EffectivePrice"":0,""ParentItemId"":""53628""}],""Operation"":[""optype:Online"",""location:null"",""target:null""],""UserAgent"":""Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; 
Trident/5.0)"",""UserInfo"":{""IP"":""192.234.111.8"",""AppId"":""DIALOG"",""SessId"":""20140728131712007:882391"",""UsageGroupId"":""1119643""},""UsageType"":""multiRetrieve"",""BreadCrumb"":""OS-Preview-logItemUsage""}
Any ideas why it won't load?
Edit: This apparently has something to do with the number '4194303'. Many of my Redshift uploads are failing, and here is a short sample of my stl_load_errors:
Missing newline: Unexpected character 0x3a found at location 4194303
Missing newline: Unexpected character 0x63 found at location 4194303
Missing newline: Unexpected character 0x6c found at location 4194303
Missing newline: Unexpected character 0x22 found at location 4194303
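All of the columns in the table where the error occurs are of type "text", and there are about 30 of them. The csv itself contains thousands of records (it is a fairly large file).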
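For what it's worth, the hex codes in these messages decode to ordinary printable characters, so the byte reported at location 4194303 does not look unusual in itself. A quick sketch to check this (the list of codes is simply copied from the error messages in this question):

```python
# Hex codes copied from the "Unexpected character" messages in this question.
codes = [0x75, 0x3A, 0x63, 0x6C, 0x22]

# Decode each code point to its character to see what the loader stopped on.
chars = [chr(code) for code in codes]
print(chars)
```

Since these are plain letters and punctuation, the constant location seems more significant than the character itself.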
Workaround (not a solution)
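I found that the number 4194303 comes from a 4MB limit imposed by the TRUNCATECOLUMNS option of the Redshift COPY command. With that option disabled, I get "String length exceeds DDL length" errors instead (which is why I was using TRUNCATECOLUMNS in the first place).
So the problem is that many of my records exceed 4MB, and Redshift does not support such records if any attribute needs to be truncated.
However, by using the MAXERROR 1000 option of the COPY command, I can skip the 4MB+ records and end up with a table containing only the rows I want, all of which are smaller than 4MB.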
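To see how many records are affected before relying on MAXERROR, a rough check along these lines could help. It is only a sketch: the helper name `oversized_lines` and the file name in the usage comment are made up for illustration, and it assumes records are terminated by LF, so a quoted field containing a line break is counted as two lines. Note that 4194303 is 4 * 1024 * 1024 - 1, i.e. the last zero-based byte offset inside a 4 MB buffer.

```python
# 4 MB expressed in bytes; 4194303 is the last zero-based offset within it.
MAX_ROW_BYTES = 4 * 1024 * 1024


def oversized_lines(path, limit=MAX_ROW_BYTES):
    """Yield (line_number, byte_length) for LF-terminated lines longer than limit.

    The file is opened in binary mode, so lines are split on b"\\n" only and
    the reported length includes the trailing newline byte.
    """
    with open(path, "rb") as handle:
        for number, line in enumerate(handle, start=1):
            if len(line) > limit:
                yield number, len(line)


# Hypothetical usage (the file name is an assumption):
# for number, size in oversized_lines("export.csv"):
#     print(f"line {number}: {size} bytes")
```

If the file contains no LF bytes at all (for example, CR-only line endings), the whole file shows up as a single oversized line.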
Answer 0 (score: 0)
You can try adding the following options to the COPY command:
ACCEPTINVCHARS ESCAPE
Sometimes CSV files created on a Mac or on Windows contain special characters.
Answer 1 (score: 0)
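The problem is with the EOL (end-of-line) character. I ran into the same issue today, and the cause was that my csv had Mac EOLs (probably CR). I changed them to Unix EOLs (LF) and the COPY completed.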
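If no editor or command-line tool is handy, the conversion can be sketched in a few lines of Python. This is an illustration rather than the exact method I used: the function names are made up, the whole file is read into memory, and any CR inside a quoted field is converted as well.

```python
def normalize_eol(data: bytes) -> bytes:
    """Convert CRLF and bare CR line endings to LF."""
    # Replace CRLF first so that it does not turn into two LF bytes.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def convert_file(src: str, dst: str) -> None:
    """Write a copy of src to dst with Unix (LF) line endings."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fout.write(normalize_eol(fin.read()))
```

Working on bytes avoids any guess about the file's text encoding.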