Separating strings from numeric data in a .txt file with Python

Time: 2018-07-24 16:17:50

Tags: python cdf

I have a .txt file that looks like this:

08/19/93 UW ARCHIVE           100.0  1962 W IEEE 14 Bus Test Case
BUS DATA FOLLOWS                            14 ITEMS
   1 Bus 1     HV  1  1  3 1.060    0.0      0.0      0.0    232.4   -16.9     0.0  1.060     0.0     0.0   0.0    0.0        0
   2 Bus 2     HV  1  1  2 1.045  -4.98     21.7     12.7     40.0    42.4     0.0  1.045    50.0   -40.0   0.0    0.0        0
   3 Bus 3     HV  1  1  2 1.010 -12.72     94.2     19.0      0.0    23.4     0.0  1.010    40.0     0.0   0.0    0.0        0
   4 Bus 4     HV  1  1  0 1.019 -10.33     47.8     -3.9      0.0     0.0     0.0  0.0       0.0     0.0   0.0    0.0        0
   5 Bus 5     HV  1  1  0 1.020  -8.78      7.6      1.6      0.0     0.0     0.0  0.0       0.0     0.0   0.0    0.0        0
   6 Bus 6     LV  1  1  2 1.070 -14.22     11.2      7.5      0.0    12.2     0.0  1.070    24.0    -6.0   0.0    0.0        0
   7 Bus 7     ZV  1  1  0 1.062 -13.37      0.0      0.0      0.0     0.0     0.0  0.0       0.0     0.0   0.0    0.0        0
   8 Bus 8     TV  1  1  2 1.090 -13.36      0.0      0.0      0.0    17.4     0.0  1.090    24.0    -6.0   0.0    0.0        0
   9 Bus 9     LV  1  1  0 1.056 -14.94     29.5     16.6      0.0     0.0     0.0  0.0       0.0     0.0   0.0    0.19       0
  10 Bus 10    LV  1  1  0 1.051 -15.10      9.0      5.8      0.0     0.0     0.0  0.0       0.0     0.0   0.0    0.0        0
  11 Bus 11    LV  1  1  0 1.057 -14.79      3.5      1.8      0.0     0.0     0.0  0.0       0.0     0.0   0.0    0.0        0
  12 Bus 12    LV  1  1  0 1.055 -15.07      6.1      1.6      0.0     0.0     0.0  0.0       0.0     0.0   0.0    0.0        0
  13 Bus 13    LV  1  1  0 1.050 -15.16     13.5      5.8      0.0     0.0     0.0  0.0       0.0     0.0   0.0    0.0        0
  14 Bus 14    LV  1  1  0 1.036 -16.04     14.9      5.0      0.0     0.0     0.0  0.0       0.0     0.0   0.0 

I need to remove the text from this file and keep only the numeric data, in matrix form. I am new to Python, so any help is greatly appreciated. Thank you.

2 answers:

Answer 0 (score: 0)

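I suggest reading the data into a pandas DataFrame and then either dropping the columns that contain text or creating a second frame without the text columns.

Try: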

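import pandas as pd
# Columns are separated by runs of spaces, so split on whitespace and skip the two header lines.
data = pd.read_csv('output_list.txt', sep=r'\s+', header=None, skiprows=2)
data.columns = ["a", "b", "c", "etc."]  # placeholder: supply exactly one name per column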
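
The two options mentioned in this answer (dropping the text columns, or building a second frame without them) could be sketched roughly as follows. The `sample` string is a shortened, two-row stand-in for the file, and the variable names are made up for illustration; with the real file you would pass its path to `read_csv` and skip the two header lines.

```python
import io

import pandas as pd

# Shortened stand-in for the bus rows (assumption: same layout as the file above).
sample = """\
   1 Bus 1     HV  1  1  3 1.060    0.0
   2 Bus 2     HV  1  1  2 1.045  -4.98
"""

frame = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None)

# Option 1: drop the text columns by position (1 holds "Bus", 3 holds "HV").
numeric_by_drop = frame.drop(columns=[1, 3])

# Option 2: build a second frame that keeps only the numeric columns.
numeric_by_dtype = frame.select_dtypes(include="number")

print(numeric_by_dtype)
```

Option 2 does not require knowing in advance which columns hold text, which may be convenient if the layout changes.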

Answer 1 (score: 0)

Since this is simple to do in pandas when the data is well-formed, here is my take:

import pandas as pd

data = '''\
08/19/93 UW ARCHIVE           100.0  1962 W IEEE 14 Bus Test Case
BUS DATA FOLLOWS                            14 ITEMS
   1 Bus 1     HV  1  1  3 1.060    0.0      0.0      0.0    232.4   -16.9     0.0  1.060     0.0     0.0   0.0    0.0        0
   2 Bus 2     HV  1  1  2 1.045  -4.98     21.7     12.7     40.0    42.4     0.0  1.045    50.0   -40.0   0.0    0.0        0
   3 Bus 3     HV  1  1  2 1.010 -12.72     94.2     19.0      0.0    23.4     0.0  1.010    40.0     0.0   0.0    0.0        0
   4 Bus 4     HV  1  1  0 1.019 -10.33     47.8     -3.9      0.0     0.0     0.0  0.0       0.0     0.0   0.0    0.0        0
   5 Bus 5     HV  1  1  0 1.020  -8.78      7.6      1.6      0.0     0.0     0.0  0.0       0.0     0.0   0.0    0.0        0
   6 Bus 6     LV  1  1  2 1.070 -14.22     11.2      7.5      0.0    12.2     0.0  1.070    24.0    -6.0   0.0    0.0        0
   7 Bus 7     ZV  1  1  0 1.062 -13.37      0.0      0.0      0.0     0.0     0.0  0.0       0.0     0.0   0.0    0.0        0
   8 Bus 8     TV  1  1  2 1.090 -13.36      0.0      0.0      0.0    17.4     0.0  1.090    24.0    -6.0   0.0    0.0        0
   9 Bus 9     LV  1  1  0 1.056 -14.94     29.5     16.6      0.0     0.0     0.0  0.0       0.0     0.0   0.0    0.19       0
  10 Bus 10    LV  1  1  0 1.051 -15.10      9.0      5.8      0.0     0.0     0.0  0.0       0.0     0.0   0.0    0.0        0
  11 Bus 11    LV  1  1  0 1.057 -14.79      3.5      1.8      0.0     0.0     0.0  0.0       0.0     0.0   0.0    0.0        0
  12 Bus 12    LV  1  1  0 1.055 -15.07      6.1      1.6      0.0     0.0     0.0  0.0       0.0     0.0   0.0    0.0        0
  13 Bus 13    LV  1  1  0 1.050 -15.16     13.5      5.8      0.0     0.0     0.0  0.0       0.0     0.0   0.0    0.0        0
'''

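from io import StringIO  # pd.compat.StringIO was removed in newer pandas releases

fileobj = StringIO(data)
# For a real file, pass its path instead of fileobj (and use sep='\t' if it is tab-delimited)
df = pd.read_csv(fileobj, sep=r'\s+', header=None, skiprows=2)

# Keep only the numeric columns
df = df.select_dtypes(include='number')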
print(df)

Returns:

    0   2   4   5   6      7      8     9     10     11    12   13     14  \
0    1   1   1   1   3  1.060   0.00   0.0   0.0  232.4 -16.9  0.0  1.060   
1    2   2   1   1   2  1.045  -4.98  21.7  12.7   40.0  42.4  0.0  1.045   
2    3   3   1   1   2  1.010 -12.72  94.2  19.0    0.0  23.4  0.0  1.010   
3    4   4   1   1   0  1.019 -10.33  47.8  -3.9    0.0   0.0  0.0  0.000   
4    5   5   1   1   0  1.020  -8.78   7.6   1.6    0.0   0.0  0.0  0.000   
5    6   6   1   1   2  1.070 -14.22  11.2   7.5    0.0  12.2  0.0  1.070   
6    7   7   1   1   0  1.062 -13.37   0.0   0.0    0.0   0.0  0.0  0.000   
7    8   8   1   1   2  1.090 -13.36   0.0   0.0    0.0  17.4  0.0  1.090   
8    9   9   1   1   0  1.056 -14.94  29.5  16.6    0.0   0.0  0.0  0.000   
9   10  10   1   1   0  1.051 -15.10   9.0   5.8    0.0   0.0  0.0  0.000   
10  11  11   1   1   0  1.057 -14.79   3.5   1.8    0.0   0.0  0.0  0.000   
11  12  12   1   1   0  1.055 -15.07   6.1   1.6    0.0   0.0  0.0  0.000   
12  13  13   1   1   0  1.050 -15.16  13.5   5.8    0.0   0.0  0.0  0.000   

      15    16   17    18  19  
0    0.0   0.0  0.0  0.00   0  
1   50.0 -40.0  0.0  0.00   0  
2   40.0   0.0  0.0  0.00   0  
3    0.0   0.0  0.0  0.00   0  
4    0.0   0.0  0.0  0.00   0  
5   24.0  -6.0  0.0  0.00   0  
6    0.0   0.0  0.0  0.00   0  
7   24.0  -6.0  0.0  0.00   0  
8    0.0   0.0  0.0  0.19   0  
9    0.0   0.0  0.0  0.00   0  
10   0.0   0.0  0.0  0.00   0  
11   0.0   0.0  0.0  0.00   0  
12   0.0   0.0  0.0  0.00   0
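
Since the question asks for the numbers in matrix form, one possible last step is to convert the numeric DataFrame to a NumPy array with `DataFrame.to_numpy`. This is only a sketch: `sample` is a shortened two-row stand-in for the file and `matrix` is a name chosen here for illustration.

```python
import io

import pandas as pd

# Shortened stand-in for the bus rows (assumption: same layout as the file above).
sample = """\
   1 Bus 1     HV  1  1  3 1.060    0.0
   2 Bus 2     HV  1  1  2 1.045  -4.98
"""

df = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None)

# Keep the numeric columns, then convert to a 2-D float array.
matrix = df.select_dtypes(include="number").to_numpy(dtype=float)

print(matrix.shape)
print(matrix)
```

Passing `dtype=float` gives a single floating-point array even though some columns were read as integers.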