Question

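I am trying to load data into BigQuery from a CSV file stored in GCS. The CSV file is UTF-8 encoded and contains 7 columns. I have specified these columns in the schema (all STRING and nullable), and I have checked the contents of the CSV file, which look fine.

When I try to load the data, I get the following error:

Too many errors encountered. (error code: invalid) gs://gvk_test_bucket/sku_category.csv: CSV table references column position 1, but line starting at position:1750384 contains only 1 columns. (error code: invalid)

The odd thing is that the file contains only 680228 rows.

When I enable the allow jagged lines option, the table is created, but only the first column is populated, and it holds the entire comma-separated string.

Can anyone help me?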

Example row

119470,Fashion,Fashion Own,Menswear,Menswear Brands Other,Formal Shirts,Long Sleeve Shirts

Answer 1

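You cannot have empty rows in your file unless they contain delimiters; otherwise BigQuery (and pretty much every other ingestion engine) will treat such a row as having just one column.

For example, this will fail at the empty row with the error you describe: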

119470,Fashion,Fashion Own,Menswear,Menswear Brands Other,Formal Shirts,Long Sleeve Shirts

119471,Fashion,Fashion Own,Womenswear,Womenswear Brands Other,Formal Shirts,Long Sleeve Shirts

This will succeed:

119470,Fashion,Fashion Own,Menswear,Menswear Brands Other,Formal Shirts,Long Sleeve Shirts
,,,,,,
119471,Fashion,Fashion Own,Womenswear,Womenswear Brands Other,Formal Shirts,Long Sleeve Shirts
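To locate such rows before uploading, something like the following could help. It is only a sketch using Python's standard csv module; the function name `find_bad_rows` and the `expected_columns` parameter are made up for illustration, and Python's parser may not treat every edge case the same way BigQuery does.

```python
import csv
import io


def find_bad_rows(text, expected_columns=7):
    """Return (line_number, field_count) for every row whose field count is unexpected.

    Hypothetical helper: an empty line parses as zero fields, and a line that is
    wrapped entirely in quotes parses as a single field.
    """
    bad = []
    reader = csv.reader(io.StringIO(text, newline=""))
    for row in reader:
        if len(row) != expected_columns:
            # reader.line_num is the number of physical lines read so far
            bad.append((reader.line_num, len(row)))
    return bad


sample = (
    "119470,Fashion,Fashion Own,Menswear,Menswear Brands Other,Formal Shirts,Long Sleeve Shirts\n"
    "\n"
    "119471,Fashion,Fashion Own,Womenswear,Womenswear Brands Other,Formal Shirts,Long Sleeve Shirts\n"
)
print(find_bad_rows(sample))
```

For a real file you would pass the file's contents (or adapt the function to take an open file object) instead of the inline sample.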

Answer 2

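For me, the problem was caused by newline and carriage-return characters in the data, so try replacing those special characters. I used the code below to replace them, and it fixed the load.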

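# Replace carriage returns and newlines in string cells only; other values pass through unchanged
df = df.applymap(lambda x: x.replace("\r", " ").replace("\n", " ") if isinstance(x, str) else x)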

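I used a lambda function because I did not know which columns were strings in my case. If you know which columns are affected, replace the characters in those columns only.

Try replacing the characters; it may work for you too.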
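To see why an embedded newline matters, here is a small standard-library sketch (no pandas). The helper names `sanitize_rows` and `to_csv_text` and the sample row are invented for illustration. It shows that one record with a newline inside a field spans two physical lines in the CSV text, and that it fits on one line after the replacement. Whether BigQuery then rejects the raw version presumably depends on the load options in use, such as whether quoted newlines are allowed.

```python
import csv
import io


def sanitize_rows(rows):
    """Replace CR and LF inside string fields with a space; leave other values as-is."""
    return [
        [
            value.replace("\r", " ").replace("\n", " ") if isinstance(value, str) else value
            for value in row
        ]
        for row in rows
    ]


def to_csv_text(rows):
    """Serialize rows to CSV text, ending each record with a single newline."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


rows = [[119470, "Fashion", "Formal\nShirts"]]  # made-up row with an embedded newline
raw = to_csv_text(rows)
clean = to_csv_text(sanitize_rows(rows))

print(raw.count("\n"))    # physical lines used by the single raw record
print(clean.count("\n"))  # physical lines used after sanitizing
```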

Answer 3

You either have an empty line

119470,Fashion,Fashion Own,Menswear,Menswear Brands Other,Formal Shirts

119472,Fashion,Fashion Own,Menswear,Menswear Brands Other,Formal Shirts

or a line wrapped in quotes

119470,Fashion,Fashion Own,Menswear,Menswear Brands Other,Formal Shirts
"119471,Fashion,Fashion Own,Menswear,Menswear Brands Other,Formal Shirts"
119472,Fashion,Fashion Own,Menswear,Menswear Brands Other,Formal Shirts

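I think there is a bug in the BigQuery response: the line number in the error is actually the number of characters before the error. That would explain why the reported position can be larger than the number of rows in the file.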
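If that reading is right, the reported position can be turned back into a line number by counting the newlines that precede it. A minimal sketch, assuming the position is a zero-based byte offset from the start of the file (this is the interpretation above, not something confirmed here); `line_number_at_offset` is a hypothetical helper name.

```python
def line_number_at_offset(data: bytes, offset: int) -> int:
    """Return the 1-based number of the line that contains the byte at `offset`.

    Assumption: `offset` is a zero-based byte offset into `data`.
    """
    if not 0 <= offset <= len(data):
        raise ValueError("offset is outside the data")
    # Every newline before the offset ends one earlier line
    return data.count(b"\n", 0, offset) + 1


data = b"a,b\n\nc,d\n"  # made-up file contents; the empty line starts at offset 4
print(line_number_at_offset(data, 4))
```

For the real file you would read a local copy in binary mode, for example with `open(path, "rb").read()`, and pass the position taken from the error message.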

Answer 4

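In my case, I ran into this problem because there was an extra empty line after the last row of data. Try removing the extra line and it should work.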
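Removing empty lines can be scripted. The sketch below assumes that no field contains an embedded line break, since it splits the text line by line and would break such fields apart; `drop_blank_lines` is a made-up name.

```python
def drop_blank_lines(text: str) -> str:
    """Return `text` without empty or whitespace-only lines.

    Assumption: no CSV field contains an embedded line break.
    """
    kept = [line for line in text.splitlines() if line.strip()]
    # Terminate the final row with exactly one newline, and nothing after it
    return "\n".join(kept) + "\n" if kept else ""


print(repr(drop_blank_lines("119470,Fashion\n\n119471,Fashion\n\n")))
```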

BigQuery error when loading a CSV file from Google Cloud Storage
