Question

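I'm new to this site, and to Python 3 as well. I'm working on a hobby project in Python 3. I collect data from different nodes on a Raspberry Pi and store it in a text file as lists of the form ['time', Id, value, Id, value, ...]. I want to use pandas to convert this data to CSV. For the conversion, the index of the pandas DataFrame is the time from each list, the columns are the Ids from the list, and each value is stored in the cell for [Time, Id]. So a row for one Time can hold several values under different Id columns. I have written this code to do it.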

import numpy as np
import pandas as pd

# filename (path to the text file) and Ids (list of column names) are defined elsewhere

strtime = []
with open(filename, 'r') as feed:
    # loop through the lines
    for Line in feed:
        strtime.append(Line.split()[0][1:-1]) # capture the time
strtime = list(set(strtime))
strtime.sort()

df = pd.DataFrame(np.full((len(strtime), len(Ids)), np.nan), columns = Ids, index = strtime)
with open(filename, 'r') as feed:
    # loop through the lines
    for Line in feed:
        # find which row to fill
        for jj in range(0, len(df.index)):
            if Line.split()[0][1:-1] == df.index[jj]:
                break
        # jj is the row number that needs to be filled
        LocalCount = 0
        # find which column(s) to fill
        for x in range(1, len(Line.split())): # skip the time token
            if x % 2:
                Sig_ID = Line.split()[x][0:-1]   # odd positions hold the Ids
            else:
                Sig_val = Line.split()[x][0:-1]  # even positions hold the values
            LocalCount += 1
            if LocalCount == 2:
                LocalCount = 0
                # get the column position from the Id
                tempVal = int(float(Sig_ID))
                df.iloc[jj, tempVal] = Sig_val

The code seems to work fine. It produces output like this:

             Id1    Id2 Id3 Id4 Id5
'15_38_20'  13.375  0           
'15_38_21'  13.375  0           
'15_38_22'  13.5                
'15_38_23'  13.5    0   0   0   
'15_38_24'  13.5    0           
'15_38_25'  13.5    0   0   0   
'15_38_26'  13.5    0           
'15_38_27'  13.375  0           
'15_38_28'  13.5    0           
'15_38_29'  13.5    0   0   0   
'15_38_30'  13.5    0

But as the txt file gets larger, the code becomes slow to generate the CSV. I want to speed this up. Is there any way to do that?

Answer 1

Your code is slow because you do everything with for loops, when Pandas could vectorize these operations. The best example is this piece of code:

    # find which row to fill
    for jj in range(0, len(df.index)):
        if Line.split()[0][1:-1] == df.index[jj]:
            break
    # jj is the row number that needs to be filled

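That is slow. This is fast:

    row = df.loc[Line.split(maxsplit=1)[0][1:-1]]

We only need to split Line into at most two parts (hence maxsplit=1), which saves allocations and garbage collection. We also do the split only once, rather than on every iteration of the for loop. Finally, we use the Pandas index to look the row up directly instead of doing a linear search.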
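As a minimal sketch of these two points, assuming lines that look like `['15_38_20', 1, 13.375, 2, 0]` with at least one Id/value pair each: the sample lines, the helper name `split_line`, and the `Id1`/`Id2` column names below are made up for illustration and are not taken from the question's real data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample lines in the format described in the question.
lines = [
    "['15_38_20', 1, 13.375, 2, 0]",
    "['15_38_21', 1, 13.5]",
]


def split_line(line):
    """Return (time label, list of remaining tokens) for one line."""
    # maxsplit=1 produces at most two pieces: the time token and the rest.
    first, rest = line.split(maxsplit=1)
    label = first[1:-1]  # drop the leading "[" and the trailing ","
    tokens = rest.strip().rstrip("]").split(", ")
    return label, tokens


labels = [split_line(line)[0] for line in lines]
# Assumed column layout: Id 1 maps to "Id1", Id 2 maps to "Id2".
df = pd.DataFrame(np.nan, index=labels, columns=["Id1", "Id2"])

for line in lines:
    label, tokens = split_line(line)
    # Tokens alternate Id, value, Id, value, ...
    for sig_id, sig_val in zip(tokens[0::2], tokens[1::2]):
        # Label-based assignment: pandas resolves the row through the index
        # instead of comparing against every row in a Python loop.
        df.loc[label, f"Id{int(sig_id)}"] = float(sig_val)

print(df)
```

This still assigns one cell at a time, so it is only a partial speed-up. For large files it would probably be faster to collect the parsed values in a plain dict keyed by time and build the DataFrame once at the end, for example with `pd.DataFrame.from_dict(..., orient="index")`.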
