Question

My text file has 3 columns, for example:

1  2  3
2  4  6
3  6  9

I want it arranged like this:

wave  shape  freq
  1     2      3
  2     4      6
  3     6      9

I used the following script:

import glob
import pandas as pd

# Read every file matching the pattern as tab-separated values
import_file = glob.glob('data.txt')
for files in import_file:
    intial_data = pd.read_csv(files, header=None, delimiter="\t").values
    table = pd.DataFrame(intial_data, columns=['wave', 'shape', 'freq'])
    print(table)

It gave me

Error: Shape of passed values is (4, 150), indices imply (3, 150)

I then changed the line that builds the table, adding a fourth column label, x:

table = pd.DataFrame(intial_data, columns=['wave', 'shape', 'freq', 'x'])

That works and gives me this result:

       wave     shape    freq     x
 0    1.0000   2.0000   3.0000   NaN
 1    2.0000   4.0000   6.0000   NaN
 2    3.0000   6.0000   9.0000   NaN

I can't figure out where the NaN column comes from, and I need to remove it.

Any suggestions?

Answer 1

Don't take the advice to drop the NaN column. That addresses the symptom (an XY Problem) rather than the cause.

Use this instead:

intial_data = pd.read_csv('data.txt', header=None, delim_whitespace=True)

or

intial_data = pd.read_csv('data.txt', header=None, sep=r'\s+')
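A minimal, self-contained sketch of this fix, assuming each line of data.txt ends with a tab. It uses io.StringIO as a stand-in for the file (with the real file, pass 'data.txt' instead), and the names argument, which labels the columns at read time, is an addition not present in the answer. It uses sep=r"\s+" rather than delim_whitespace=True, since pandas 2.2 deprecated delim_whitespace in favor of that separator.

```python
import io

import pandas as pd

# Stand-in for data.txt (assumption): tab-separated values with a
# trailing tab at the end of every line.
raw = "1\t2\t3\t\n2\t4\t6\t\n3\t6\t9\t\n"

# sep=r"\s+" treats any run of whitespace (tabs or spaces) as a single
# separator, so the trailing tab does not produce an empty fourth field.
table = pd.read_csv(
    io.StringIO(raw),
    header=None,
    sep=r"\s+",
    names=["wave", "shape", "freq"],  # label the three columns directly
)
print(table)
```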

You get this error because your data.txt file has an extra tab at the end of each line. pandas interprets it as an extra column of NaNs.

So even though you see

1  2  3
2  4  6
3  6  9

what you most likely have is

1\t2\t3\t
2\t4\t6\t
3\t6\t9\t

The trailing \t on each line adds the extra column.
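A sketch that reproduces the cause, again assuming the trailing-tab layout and using io.StringIO in place of data.txt. repr() makes the invisible tab visible (on the real file, print(repr(open('data.txt').readline())) does the same check), and reading with delimiter="\t" yields four columns, the last one entirely NaN.

```python
import io

import pandas as pd

# Assumed file contents: every line ends with a tab before the newline.
raw = "1\t2\t3\t\n2\t4\t6\t\n3\t6\t9\t\n"

# repr() shows the otherwise invisible trailing tab before the newline.
first_line = raw.splitlines(keepends=True)[0]
print(repr(first_line))  # '1\t2\t3\t\n'

# Splitting strictly on tabs gives an empty fourth field on every line,
# which pandas reads as a fourth column full of NaN.
df = pd.read_csv(io.StringIO(raw), header=None, delimiter="\t")
print(df.shape)  # (3, 4)
print(df[3].isna().all())  # True
```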

Answer 2

To drop columns that contain NaN values, use df.dropna(axis='columns'). You can find more information in the dropna documentation.
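One point worth checking before using this: with the default how='any', dropna(axis='columns') removes every column that has at least one NaN, not only the all-NaN one. The sketch below uses a hypothetical frame in which freq is missing a single value, to show the difference from how='all'.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "x" is entirely NaN (like the trailing-tab column),
# while "freq" is missing just one value.
df = pd.DataFrame({
    "wave": [1.0, 2.0, 3.0],
    "shape": [2.0, 4.0, 6.0],
    "freq": [3.0, np.nan, 9.0],
    "x": [np.nan, np.nan, np.nan],
})

# Default how="any": drops every column with at least one NaN,
# so "freq" is lost along with "x".
any_dropped = df.dropna(axis="columns")
print(list(any_dropped.columns))  # ['wave', 'shape']

# how="all": drops only the columns in which every value is NaN.
all_dropped = df.dropna(axis="columns", how="all")
print(list(all_dropped.columns))  # ['wave', 'shape', 'freq']
```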

Answer 3

df.dropna(axis=1, how='all')

Output:

   wave  shape  freq
0   1.0    2.0   3.0
1   2.0    4.0   6.0
2   3.0    6.0   9.0

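Alternatively, you can use the usecols parameter of pd.read_csv to read only the first three columns from the file. That way you get the table variable directly, without first reading intial_data.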
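A sketch of the usecols approach, assuming the same tab-separated, trailing-tab layout and using io.StringIO in place of data.txt. Passing names together with usecols (one name per kept column) is an assumption about how the answer's missing code labeled the columns.

```python
import io

import pandas as pd

# Stand-in for data.txt (assumption): tab-separated, trailing tab on each line.
raw = "1\t2\t3\t\n2\t4\t6\t\n3\t6\t9\t\n"

# usecols keeps only file columns 0-2, so the empty fourth field is never
# loaded; names supplies one label per kept column.
table = pd.read_csv(
    io.StringIO(raw),
    header=None,
    delimiter="\t",
    usecols=[0, 1, 2],
    names=["wave", "shape", "freq"],
)
print(table)
```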

How do I remove the NaN column?
