Python, lists, arrays, tuples, data - how to handle a specific dataset

Time: 2016-02-15 18:04:26

Tags: python arrays list file-io tuples

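I hope that this time I can give you enough information to explain myself. I am trying to read velocity data in vector notation in order to (for now) plot some XY scatter plots. The file looks like this: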

#               x            0.0025             0.005            0.0075              0.01             0.015              0.02              0.03              0.04              0.05              0.06              0.08               0.1              0.12              0.14              0.16              0.18               0.2
#               y                 0                 0                 0                 0                 0                 0                 0                 0                 0                 0                 0                 0                 0                 0                 0                 0                 0
#               z                 0                 0                 0                 0                 0                 0                 0                 0                 0                 0                 0                 0                 0                 0                 0                 0                 0
#            Time
           50                 (0.0007558915435 -0.0004561530839 -0.0004827045695)                 (0.002621093455 -0.0004982563588 -0.0004670886403)                 (0.004284814163 -0.0004701779131 -0.0003427572777)                 (0.005427856321 -0.0004415657508 -0.0002581055849)                 (0.009283872431 -0.0003824524669 -9.862169137e-05)                 (0.01336058599 -0.0003623751773 -3.007799017e-05)                 (0.02241437059 -0.0002222313074 0.0001136439177)                 (0.03056537385 -4.38083924e-05 0.0002682758253)                 (0.038580681 -4.613463513e-06 0.0002734791838)                 (0.04315368113 7.912822938e-05 0.0002553115381)                 (0.04920978201 0.0001259194082 0.0001679574544)                 (0.05178246176 3.113282703e-05 8.74525373e-05)                 (0.05351566041 -6.546046173e-07 5.251841968e-05)                 (0.05470950178 5.582683289e-06 5.456222367e-05)                 (0.05765609801 1.604055123e-05 5.61024635e-05)                 (0.05910960178 8.390667426e-06 5.051911761e-05)                 (0.06047027361 -3.362615186e-06 5.137448521e-05)
          100                 (-0.03638183522 -0.0004212943087 -0.0001445116086)                 (-0.04599742972 1.934674765e-05 0.0002080845418)                 (-0.0263580529 0.0007034850972 0.0007206210834)                 (-0.005878665916 0.0009878563826 0.0009139785036)                 (0.03751451082 0.0008459502289 0.0008117077564)                 (0.06155058308 0.0007058376794 0.0007077796084)                 (0.09253546972 0.0005743407599 0.0005878527131)                 (0.1056482525 0.0004776711045 0.0005015883363)                 (0.1147274675 0.0003535542095 0.0003873958082)                 (0.1197626602 0.0003578742091 0.0003643755411)                 (0.1264856441 0.0003138045371 0.0003051010097)                 (0.1307027216 0.0002453538171 0.0002362933067)                 (0.1347570923 0.000177587389 0.0001672847755)                 (0.1366348914 0.0001554091899 0.000144292499)                 (0.1398319486 0.0001272587836 0.000111811677)                 (0.141127784 0.0001160117874 9.894530615e-05)                 (0.1422487007 0.0001054244658 8.819660841e-05)
          150                 (-0.05825943888 0.0001136539473 0.0004206885026)                 (-0.04572555779 0.0007272639883 0.0005475238907)                 (0.001189305157 0.001076000002 0.0006294173999)                 (0.02934769975 0.0009229883365 0.0006037649856)                 (0.07194848666 0.0006515992717 0.0005186304839)                 (0.09490965777 0.0005256600022 0.0004767879994)                 (0.1233413075 0.0004350708279 0.0004479392071)                 (0.1347607461 0.0003609992666 0.0003952444021)                 (0.1426707096 0.0002771968784 0.0003190311903)                 (0.147209712 0.0002727655531 0.0003053133615)                 (0.1532548565 0.0002247845037 0.0002564816634)                 (0.1570851548 0.0001718066583 0.0002036570558)                 (0.1608564722 0.0001242749078 0.0001549789597)                 (0.1626047646 0.0001093818898 0.0001393982173)                 (0.1656239159 9.055609841e-05 0.0001172163492)                 (0.1668961273 8.334132321e-05 0.0001085831113)                 (0.168037179 7.648813655e-05 0.0001009290741)
... and so on down to ...
        10000

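The '...' means there is more data, but I had to cut it down a bit to make it understandable. The data is separated by spaces. I would like to know which approach is better for handling this kind of data, so that I can read it in, plot it, or write it out in another format, with or without the parentheses.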

I am thinking of reading it as a list, getting rid of the '()' symbols, and plotting the data by slicing the list. Or should I use an array?

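In either case, should I treat the vectors as tuples? Or as lists (within the list or array)? Or should each number be a separate member of the list, in which case I would have to be careful when plotting the X, Y, or Z coordinates?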

I have already written some code, but I am stuck. I slept two hours last night and I am paying for it now :-(

Code:

import glob
import numpy as np
import matplotlib.pyplot as plt

#=============================================================================#
# The header of Velocity (U) probes shows the XYZ coordinates in separate     #
# lines. To work with the center line along the wake, we may assume Y=Z=0.    #
# Thus, we are interested in the values of X, in the first line of the file.  #
# The first character of each header line is '#', and the second character is #
# the coordinate, 'x' for the first line.                                     #
# The first element of interest will be [2] of the list                       #
#=============================================================================#

inFile = glob.glob("*.inp")  # list of files in current directory for input.

for Ufile in inFile:
    print("File Opened: ", Ufile)
    fi = open(Ufile, "rb")       # opening input file for reading.

    fileroot = Ufile[0:-4]       # keeping input file root for output file
    outfile = fileroot + '.out'  # adding extension
    fo = open(outfile, "wb")     # opening output file for writing

    try:
        inHead = fi.readlines()[0]  # Read X-coordinates and transform to float
        inHead = inHead.split()
        outHead = inHead[2:]

        inData = fi.readlines()[4:]    # Read data as strings. Skipping header
        r = 0
        for line in inData:
            fila = line.split()        # Dividing each row into elements
            c = 0
            for elem in fila:
                if elem[0] == '(':     # Slicing undesired character
                    elem = elem[1:]
                    fila[c] = float(elem)  # Converting string to float
                elif elem[-1] == ')':      # Slicing undesired character
                    elem = elem[0:-1]
                    fila[c] = float(elem)  # Converting string to float
                else:
                    fila[c] = float(elem)  # Converting string to float
                c += 1        # Tracking which row element the loop is at
            inData[r] = fila  # Updating list row with '(' and ')' removed
            r += 1


    finally:
        print("File Closed: ", Ufile)
        fi.close()
        fo.close()

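Some of the indentation may have been displayed incorrectly when I pasted the code here. What I have shown is what it is supposed to do.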

Thanks in advance.

2 answers:

Answer 0: (score: 0)

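First, you should use open(Ufile, "r") instead of open(Ufile, "rb") (and likewise "w" instead of "wb"), since you are working with text files. Second, inData = fi.readlines()[4:] will not read anything: the file pointer is already at the end of the file because you called inHead = fi.readlines()[0] earlier. You can reset it with fi.seek(0). Better yet, read all the lines into one variable and use it for both inHead and inData. Third, you don't output anything... You can also replace some of your code with elem.rstrip(')').lstrip('(').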
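
A minimal sketch of the points above, assuming the layout shown in the question (four header lines starting with '#', then one row per time step). The function name parse_probe_lines and the inline sample are illustrative and not part of the original code:

```python
def parse_probe_lines(lines):
    """Parse probe-file lines into (x_coords, rows).

    Assumes the first line is the '# x ...' header and that the first
    four lines are header lines, as in the file shown in the question.
    """
    # Header tokens are '#', 'x', then the X coordinates.
    x_coords = [float(tok) for tok in lines[0].split()[2:]]

    rows = []
    for line in lines[4:]:
        # strip('()') removes a leading '(' or a trailing ')' from a token.
        values = [float(tok.strip('()')) for tok in line.split()]
        if values:  # skip blank lines
            rows.append(values)
    return x_coords, rows


# Small inline sample standing in for the result of fi.readlines().
sample = [
    "# x 0.0025 0.005\n",
    "# y 0 0\n",
    "# z 0 0\n",
    "# Time\n",
    "50 (0.1 -0.2 0.3) (0.4 0.5 -6e-05)\n",
    "100 (1.0 2.0 3.0) (4.0 5.0 6.0)\n",
]
x_coords, rows = parse_probe_lines(sample)
print(x_coords)
print(rows[0])
```

With a real file, the lines would be read once in text mode, for example with open(Ufile, "r") as fi: lines = fi.readlines(), and then passed to the function, so no fi.seek(0) is needed.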

Answer 1: (score: 0)

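Using a list of vectors for each line would give you a solution. But if every line has a fixed number of vectors, then putting them into a numpy array would be the way forward.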

To handle the string data, though, the following should help:

import numpy as np

#The following assumes the data is read as lines of text in the following format
txt=["           50                 (0.0007558915435 -0.0004561530839 -0.0004827045695) .... (0.06047027361 -3.362615186e-06 5.137448521e-05)",
     "          100                 (-0.03638183522 -0.0004212943087 -0.0001445116086) .... (0.1422487007 0.0001054244658 8.819660841e-05)",
     "          150                 (-0.05825943888 0.0001136539473 0.0004206885026) ....  (0.168037179 7.648813655e-05 0.0001009290741)"]
complete_list = []

for line in txt:
    line_part = line.split('(')
    header = int(line_part[0].strip(' '))  #changed from .rstrip(' ')
    vector_list = []
    for vector in line_part[1:]:
        coords = vector.split(' ')
        X = float(coords[0])
        Y = float(coords[1])
        Z = float(coords[2].rstrip(')'))
        vector_list.append([X,Y,Z])
    vector_array = np.array(vector_list)
    complete_list.append([header,vector_array])

#addressing can be done as follows:
line = 1
vector =2
print("header\n",complete_list[line][0])
print("vector\n",complete_list[line][1][vector])
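
If every row really has the same number of vectors, the per-line arrays could be stacked into a single 3-D array of shape (time steps, probes, 3), which makes it easy to slice out one component. A minimal sketch under that assumption; the sample values and the names times and velocity are illustrative only:

```python
import numpy as np

# Stand-in for complete_list: one [time, (n_probes, 3) array] pair per row.
complete_list = [
    [50, np.array([[0.1, -0.2, 0.3], [0.4, 0.5, 0.6]])],
    [100, np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])],
]

times = np.array([row[0] for row in complete_list])
# np.stack requires every per-row array to have the same shape.
velocity = np.stack([row[1] for row in complete_list])

print(velocity.shape)  # (n_times, n_probes, 3)

# X component at every probe for the first time step.
ux_at_first_time = velocity[0, :, 0]
# X component over time at the first probe.
ux_at_first_probe = velocity[:, 0, 0]
```

A slice such as ux_at_first_time could then be plotted against the X coordinates from the file header, for example with plt.scatter from matplotlib.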