Question

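I want to read a CSV file into a pandas DataFrame, but the file contains several rows with the wrong number of delimiters. I know I can skip these rows by setting error_bad_lines=False. But I would like to read them like this: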

Correct data: some text,label. In this case, 1st column = some text, 2nd column = label
Incorrect data: some text, another text, again some text,label. In this case, I want 1st column = some text, another text, again some text, 2nd column = label

Is it possible to handle the incorrect data this way with pandas?

Answer 1

You can simply combine the split columns, for example:

df['some text, another text, again some text'] = (df['some text'] + df['another text'] + df['again some text'])

print(df)

This changes it from this:

  some text  another text  again some text  label;;;;;  \
0     ;;;;;           NaN              NaN         NaN   
1     ;;;;;           NaN              NaN         NaN   
2     ;;;;;           NaN              NaN         NaN   
3     ;;;;;           NaN              NaN         NaN   
4     ;;;;;           NaN              NaN         NaN   
5     ;;;;;           NaN              NaN         NaN   
6     ;;;;;           NaN              NaN         NaN

to this:

  some text, another text, again some text  
0                                      NaN  
1                                      NaN  
2                                      NaN  
3                                      NaN  
4                                      NaN  
5                                      NaN  
6                                      NaN

I did not fill in the rows, so some stray ; and NaN values appear, but it is separated!
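
For reference, here is a minimal end-to-end sketch of the column-combining idea. It makes several assumptions that are not in the original post: the file has no header row, no row has more than four fields, and the label is always the last non-empty field. The sample data and the column names c0..c3, text, and label are hypothetical.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for the real file.
raw = io.StringIO(
    "some text,label\n"
    "some text, another text, again some text,label\n"
)

# Assumption: four is the largest field count in the file. Supplying
# that many column names lets shorter rows be padded with NaN.
df = pd.read_csv(raw, header=None, names=["c0", "c1", "c2", "c3"])

def merge_row(row):
    # Drop the NaN padding, treat the last remaining field as the label,
    # and join everything before it back into a single text value.
    parts = row.dropna().tolist()
    return pd.Series({"text": ",".join(parts[:-1]), "label": parts[-1]})

merged = df.apply(merge_row, axis=1)
print(merged)
```

Joining with "," restores the commas that the parser consumed, so the text column keeps its original wording.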

Does anyone have a better approach for this? Thanks
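
One possible alternative, offered only as a sketch: skip the CSV parser and split each line once from the right, so everything before the last comma stays together as the text. This assumes the label never contains a comma, that every line has at least one comma, and that the file has no quoting or header row. The sample data and the column names text and label are hypothetical.

```python
import io
import pandas as pd

# Hypothetical sample data; with a real file, open it and iterate over
# its lines in the same way.
raw = io.StringIO(
    "some text,label\n"
    "some text, another text, again some text,label\n"
)

# rsplit(",", 1) splits only at the last comma, giving [text, label]
# no matter how many commas the text itself contains.
rows = [line.rstrip("\n").rsplit(",", 1) for line in raw if line.strip()]

df = pd.DataFrame(rows, columns=["text", "label"])
print(df)
```

This avoids guessing a maximum number of columns, at the cost of giving up the CSV parser's handling of quoted fields.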
