Python Pandas shows all text columns as NaN

Time: 2014-06-10 19:23:48

Tags: python pandas

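I want to read a series of tab-delimited files in a Python script. For some reason, when I import a file, all of my text columns come back as NaN.

Example of the input file: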

Blah Blah
Blah Blah
Blah Blah
Blah Blah
Blah Blah
Blah Blah
Blah Blah
Period: Oct 28 2013 - Apr 27 2014
Note:
Brand Variant                               Industry                                    Major Category                              Market                                      Media Type                                  Parent Company                              Product Category                            Report Period (multiple)                    PCC Sub Group                               Subsidiary                                  Units   $$$ (000)
3 LADIES HAND-DIPPED CANDIES CANDY  CONFECT., SNACKS & SOFT DRINKS  CONFECTIONERY & SNACKS  Columbus Combo  Local Newspaper     COTTAGE FOOD PRODUCTION OPERATION   CANDY   11/18/13 - 11/24/13     F211 CANDY & GUM    COTTAGE FOOD PRODUCTION OPERATION   1   0.286   
3 MUSKETEERS CANDY BAR  CONFECT., SNACKS & SOFT DRINKS  CONFECTIONERY & SNACKS  Atlanta Combo   Spot Radio  MARS INC    CANDY BAR   11/04/13 - 11/10/13     F211 CANDY & GUM    MARS SNACKFOOD US LLC   22  1.403   

Here is my Python (3.3) snippet:

df = read_csv(csvFile, delimiter='\t', header=[9])
print(df)

It outputs the following:

Brand Variant                             \
3 LADIES HAND-DIPPED CANDIES CANDY                                       NaN   
3 MUSKETEERS CANDY BAR                                                   NaN   

                                    Industry                                  \
3 LADIES HAND-DIPPED CANDIES CANDY                                       NaN   
3 MUSKETEERS CANDY BAR                                                   NaN   

                                    Major Category                            \
3 LADIES HAND-DIPPED CANDIES CANDY                                       NaN   
3 MUSKETEERS CANDY BAR                                                   NaN   

                                    Market                                    \
3 LADIES HAND-DIPPED CANDIES CANDY                                       NaN   
3 MUSKETEERS CANDY BAR                                                   NaN   

                                    Media Type                                \
3 LADIES HAND-DIPPED CANDIES CANDY                                       NaN   
3 MUSKETEERS CANDY BAR                                                   NaN   

                                    Parent Company                            \
3 LADIES HAND-DIPPED CANDIES CANDY                                       NaN   
3 MUSKETEERS CANDY BAR                                                   NaN   

                                    Product Category                          \
3 LADIES HAND-DIPPED CANDIES CANDY                                       NaN   
3 MUSKETEERS CANDY BAR                                                   NaN   

                                    Report Period (multiple)                  \
3 LADIES HAND-DIPPED CANDIES CANDY                                       NaN   
3 MUSKETEERS CANDY BAR                                                   NaN   

                                    PCC Sub Group                             \
3 LADIES HAND-DIPPED CANDIES CANDY                                       NaN   
3 MUSKETEERS CANDY BAR                                                   NaN   

                                    Subsidiary                                \
3 LADIES HAND-DIPPED CANDIES CANDY                                       NaN   
3 MUSKETEERS CANDY BAR                                                   NaN   

                                    Units $$$ (000)  
3 LADIES HAND-DIPPED CANDIES CANDY    NaN       NaN  
3 MUSKETEERS CANDY BAR                NaN       NaN  

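I noticed that my first column appears to be set as the index of the dataframe, but index_col = False only produces a ValueError, because it expects a column number. I also tried setting dtype to str, with no luck. Finally, on another file that is comma-delimited, I was able to get rows back that included the text data. I'm at a loss as to what to do...

One more thing I noticed is that the separator between fields looks more like a tab plus spaces than a plain tab.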

1 Answer:

Answer 0: (score: 1)

If you want to ignore the first few "Blah Blah" lines, use skiprows= instead of header=. Try this:

df = pd.read_csv(csvFile, sep='\t', skiprows=9, index_col=False)
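To make the skiprows= idea concrete, here is a small self-contained sketch. The in-memory text and its three-line preamble are made up for illustration (the real file has nine preamble lines, hence skiprows=9 above), and the columns are a shortened stand-in for the real ones.

```python
import io

import pandas as pd

# Hypothetical miniature of the input: 3 preamble lines, then a header row,
# then tab-separated data rows (no trailing tabs in this sketch).
raw = (
    "Blah Blah\n"
    "Period: Oct 28 2013 - Apr 27 2014\n"
    "Note:\n"
    "Brand Variant\tMarket\tUnits\n"
    "3 LADIES HAND-DIPPED CANDIES CANDY\tColumbus Combo\t1\n"
    "3 MUSKETEERS CANDY BAR\tAtlanta Combo\t22\n"
)

# skiprows=3 drops the preamble; the next line is then used as the header.
df = pd.read_csv(io.StringIO(raw), sep="\t", skiprows=3)

print(df.columns.tolist())
print(df)
```

For the real file, the number passed to skiprows should equal the count of lines that come before the header row.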

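The reason that "the first column appears to be set as the index of the dataframe" is, I guess, that your file has trailing delimiters. If that is the case, index_col=False should help. See "Handling of trailing delimiters in read_csv".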
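As a rough illustration of the trailing-delimiter guess, the sketch below reads the same made-up text twice. The sample data is hypothetical: each data row ends with an extra tab, so it has one more field than the header. Whether the real file looks like this is an assumption.

```python
import io

import pandas as pd

# Hypothetical sample: the header has 3 fields, but every data row ends
# with a trailing tab and therefore has 4 fields.
raw = (
    "Blah Blah\n"
    "Period: Oct 28 2013 - Apr 27 2014\n"
    "Note:\n"
    "Brand\tMarket\tUnits\n"
    "3 MUSKETEERS CANDY BAR\tAtlanta Combo\t22\t\n"
    "3 LADIES CANDY\tColumbus Combo\t1\t\n"
)

# Default behaviour: the extra field makes pandas treat the first column
# as the index, so every value shifts one column to the left.
shifted = pd.read_csv(io.StringIO(raw), sep="\t", skiprows=3)

# index_col=False tells pandas not to use the first column as the index;
# the empty trailing field is dropped instead.
fixed = pd.read_csv(io.StringIO(raw), sep="\t", skiprows=3, index_col=False)

print(shifted)
print(fixed)
```

In the shifted frame the brand names end up in the index and the last column is all NaN, which resembles the symptom in the question. In the fixed frame each value sits under its own header.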

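Since I don't have your input file, and your copy-pasted text has apparently destroyed the tabs (they are all spaces in the text), I can't test this. But please let us know how it goes.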
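Since the pasted text lost its tabs, one way to check what is really in the file is to look at the raw lines with repr(), where a tab shows up as \t. The helper below is only a sketch: inspect_lines is a name made up here, and the sample text is invented so that the block runs on its own.

```python
import io
from itertools import islice


def inspect_lines(handle, start=0, count=3):
    """Summarise raw lines: repr text, number of tabs, and whether a tab ends the line."""
    report = []
    for line in islice(handle, start, start + count):
        body = line.rstrip("\r\n")
        report.append((repr(body), body.count("\t"), body.endswith("\t")))
    return report


# Hypothetical sample: one preamble line, a header, and a data row that has
# a space before the first tab and a trailing tab at the end.
sample = io.StringIO("Blah Blah\nBrand\tUnits\n3 MUSKETEERS CANDY BAR \t22\t\n")

report = inspect_lines(sample, start=1, count=2)
for text, tabs, trailing in report:
    print(tabs, trailing, text)
```

For the real file you could pass an open file object instead, for example open(csvFile, newline=""), with start set to the header's line position. A data row showing more tabs than the header row, or ending in a tab, would support the trailing-delimiter explanation. Spaces next to a \t would confirm the "tab plus spaces" observation.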