Question

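I want to remove the lines in a text file that contain strings, as well as the empty lines. The file looks like the sample below. As you can see, the header is repeated throughout the file, and the number of data lines differs from block to block. I need to import the data into numpy as an array. At first the decimal separator was a comma; that, at least, I was able to change.

I tried this, but it doesn't work at all: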

from types import StringType

z = open('D:\Desktop\cycle 1-20 20-50 kPa (dot).dat', 'r')
for line in z.readlines():
    for x in z:
        if type(z.readline(x)) is StringType:
            print line


z.close()

Sample data:

bla bla

cyclical stuff                      Time:   81.095947   Sec 2012-08-02 17:05:42
stored :    1   cycle           stores for :    62  seg-cycle
Points :    4223
Servo_Hyd count Temps   Servo_Air pressure  Servo_Hyd load Servo_Hyd LVDT1  Servo_Hyd LVDT2 Servo_Hyd LVDT3
name1    name1    name1 name1   name1   name1   name1
1   60.102783   0.020013755 89.109558   0.3552089   0.4015148   -0.33822596
1   60.107666   0.020006953 89.025749   0.35519764  0.4015218   -0.33821729
1   60.112549   0.02000189  88.886292   0.3551946   0.4015184   -0.33822691
1   60.117432   0.020007374 89.559196   0.35519707  0.40151948  -0.33823174
1   60.122314   0.019991774 89.741402   0.35519552  0.40151322  -0.33822927
1   60.127197   0.020003742 89.748924   0.35520011  0.40150556  -0.33822462

bla bla

cyclical stuff                      Time:   81.095947   Sec 2012-08-02 17:05:42
stored :    1   cycle           stores for :    62  seg-cycle
Points :    4223
Servo_Hyd count Temps   Servo_Air pressure  Servo_Hyd load Servo_Hyd LVDT1  Servo_Hyd LVDT2 Servo_Hyd LVDT3
name1    name1    name1 name1   name1   name1   name1
1   60.102783   0.020013755 89.109558   0.3552089   0.4015148   -0.33822596
1   60.107666   0.020006953 89.025749   0.35519764  0.4015218   -0.33821729
1   60.112549   0.02000189  88.886292   0.3551946   0.4015184   -0.33822691
1   60.117432   0.020007374 89.559196   0.35519707  0.40151948  -0.33823174
1   60.122314   0.019991774 89.741402   0.35519552  0.40151322  -0.33822927
1   60.127197   0.020003742 89.748924   0.35520011  0.40150556  -0.33822462

Answer 1

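Python first reads everything in a file as strings unless you convert the values explicitly, so your approach does not work.

Your best bet is probably to use a regular expression to filter out the lines that contain non-data characters.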

import re

with open("datafile") as f:
    for line in f:
        # Skip any line containing a character other than a digit, '-', '.' or whitespace
        if re.search(r"[^-0-9.\s]", line):
            continue
        # Skip empty lines
        if len(line.strip()) == 0:
            continue
        # Keep the rest (drop the trailing newline so print does not double it)
        print(line.rstrip("\n"))
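
One caveat with the character-class filter: it also rejects values written in scientific notation such as 2.0e-2, because of the letter e. A minimal sketch of an alternative, assuming such values may occur, is to try float() on every field instead of using a regex. The names is_data_line, read_rows and sample are made up for illustration:

```python
def is_data_line(line):
    """Return True if the line is non-empty and every field parses as a float."""
    fields = line.split()
    if not fields:  # blank line
        return False
    try:
        for field in fields:
            float(field)
    except ValueError:  # at least one field is not a number
        return False
    return True


def read_rows(lines):
    """Convert the data lines to lists of floats, skipping everything else."""
    return [[float(f) for f in line.split()] for line in lines if is_data_line(line)]


# Hypothetical sample, shortened from the data in the question
sample = [
    "bla bla\n",
    "\n",
    "Points :    4223\n",
    "name1    name1    name1\n",
    "1   60.102783   0.020013755\n",
    "1   60.107666   2.0e-2\n",
]
rows = read_rows(sample)
```

If every row has the same number of columns, np.array(rows) should then give a 2-D float array.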

Answer 2

Why not use numpy.loadtxt? It has a very nice interface for cases like this.
See the numpy.loadtxt documentation.

import numpy as np
yourArry = np.loadtxt('yourfilename.txt', skiprows=7)
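
Note that skiprows only skips lines at the top of the file, so the headers repeated further down would most likely still make loadtxt raise an error. A minimal sketch of one way around that, assuming a data line contains only digits, '-', '.' and whitespace: loadtxt also accepts a generator of lines, so the file can be filtered on the fly. The names data_lines and sample are made up for illustration, and the sample is shortened from the question's data:

```python
import io
import re

import numpy as np

# Any character that cannot be part of a plain decimal number or whitespace
NON_DATA = re.compile(r"[^-0-9.\s]")


def data_lines(handle):
    """Yield only the non-empty lines that look like rows of numbers."""
    for line in handle:
        if line.strip() and not NON_DATA.search(line):
            yield line


# Hypothetical stand-in for the real file, with a repeated header
sample = io.StringIO(
    "bla bla\n"
    "\n"
    "cyclical stuff   Time:   81.095947   Sec 2012-08-02 17:05:42\n"
    "Points :    4223\n"
    "name1    name1    name1\n"
    "1   60.102783   0.020013755\n"
    "1   60.107666   0.020006953\n"
    "\n"
    "bla bla\n"
    "\n"
    "Points :    4223\n"
    "1   60.112549   0.02000189\n"
)

arr = np.loadtxt(data_lines(sample))
```

With a real file, the same call would be np.loadtxt(data_lines(f)) inside a with open(...) as f: block.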

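Also, since you have a header (something that should appear only once, at the top of a file), you could split the file into several files, one per block. You can do this with Python or with the UNIX command csplit. Here is how to do it with csplit, and what you get: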

oz123@:~/tmp> csplit -k data.txt   '/^bla/' '{*}'
0
787
786
oz123@:~/tmp> ls xx
xx00  xx01  xx02
oz123@:~/tmp> ls xx00
xx00
oz123@:~/tmp> cat xx00
oz123@:~/tmp> cat xx01
bla bla

cyclical stuff                      Time:   81.095947   Sec 2012-08-02 17:05:42
stored :    1   cycle           stores for :    62  seg-cycle
Points :    4223
Servo_Hyd count Temps   Servo_Air pressure  Servo_Hyd load Servo_Hyd LVDT1  Servo_Hyd LVDT2 Servo_Hyd LVDT3
name1    name1    name1 name1   name1   name1   name1
1   60.102783   0.020013755 89.109558   0.3552089   0.4015148   -0.33822596
1   60.107666   0.020006953 89.025749   0.35519764  0.4015218   -0.33821729
1   60.112549   0.02000189  88.886292   0.3551946   0.4015184   -0.33822691
1   60.117432   0.020007374 89.559196   0.35519707  0.40151948  -0.33823174
1   60.122314   0.019991774 89.741402   0.35519552  0.40151322  -0.33822927
1   60.127197   0.020003742 89.748924   0.35520011  0.40150556  -0.33822462

oz123@:~/tmp> cat xx02
bla bla

cyclical stuff                      Time:   81.095947   Sec 2012-08-02 17:05:42
stored :    1   cycle           stores for :    62  seg-cycle
Points :    4223
Servo_Hyd count Temps   Servo_Air pressure  Servo_Hyd load Servo_Hyd LVDT1  Servo_Hyd LVDT2 Servo_Hyd LVDT3
name1    name1    name1 name1   name1   name1   name1
1   60.102783   0.020013755 89.109558   0.3552089   0.4015148   -0.33822596
1   60.107666   0.020006953 89.025749   0.35519764  0.4015218   -0.33821729
1   60.112549   0.02000189  88.886292   0.3551946   0.4015184   -0.33822691
1   60.117432   0.020007374 89.559196   0.35519707  0.40151948  -0.33823174
1   60.122314   0.019991774 89.741402   0.35519552  0.40151322  -0.33822927
1   60.127197   0.020003742 89.748924   0.35520011  0.40150556  -0.33822462
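
The same split can be sketched in Python. This is only an illustration, assuming as in the csplit call above that every block starts with a line beginning with "bla"; split_blocks and sample are made-up names:

```python
def split_blocks(lines, marker="bla"):
    """Group lines into blocks, starting a new block at each marker line.

    Lines that appear before the first marker are dropped (csplit would
    put them into xx00 instead).
    """
    blocks = []
    for line in lines:
        if line.startswith(marker):
            blocks.append([])  # a marker line opens a new block
        if blocks:
            blocks[-1].append(line)
    return blocks


# Hypothetical sample, shortened from the data in the question
sample = [
    "bla bla\n",
    "\n",
    "Points :    4223\n",
    "1   60.102783   0.020013755\n",
    "\n",
    "bla bla\n",
    "\n",
    "Points :    4223\n",
    "1   60.107666   0.020006953\n",
]
blocks = split_blocks(sample)
```

Each block can then be written to its own file, or its header lines can be sliced off before the remaining lines are converted to numbers.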

Remove strings from text file, keep floats
