将行附加到Pandas DataFrame

时间:2017-05-03 22:52:52

标签: python pandas numpy data-acquisition

我正在尝试从测量计算个人测量设备(PMD-1208FS)读取模拟信号,然后将其写入具有每个观察的相应时间戳的文件。我想每秒一次附加到此文件中,并进行新的观察。

PyUniversalLibrary允许我从设备中读取数据,但我一直试图弄清楚如何将信息保存到数据帧中。此example有助于从PMD读取数据,但它不提供任何数据记录示例。

以下示例接近解决此问题,但df.append(pd.DataFrame()函数未向我提供所需的结果。此函数最终将最新的数据帧附加到先前保存的数据帧的底部,而不是仅添加新数据。结果是一个数据帧按顺序有许多重复的数据帧。

这是我的代码:

## Source libraries:
from __future__ import print_function
import UniversalLibrary as UL
import time, os, io, csv, datetime
import pandas as pd

## Specify PMD settings:
BoardNum = 0
Gain = UL.BIP5VOLTS
Chan = 0

## Create empty lists and a dataframe to fill:
co = [] ## carbon monoxide concentration in ppm
data = [] ## raw analog output between 0-5V
times = [] ## timestamp
df = pd.DataFrame()


## Set filepath:
filename = "~/pmd_data.csv"

while True:
    ts = time.time()
    DataValue = UL.cbAIn(BoardNum, Chan, Gain)
    EngUnits = UL.cbToEngUnits(BoardNum, Gain, DataValue)
    ppm = EngUnits * 10 ## 1 Volt = 10ppm of carbon monoxide
    data.append(EngUnits)
    times.append(datetime.datetime.fromtimestamp(ts).strftime('%Y-%m-%d %H:%M:%S'))
    co.append(ppm)
    ## This line of code is not providing the desired result:
    df = df.append(pd.DataFrame({'co':ppm, 'volts':data, 'datetime':times})) 
    print(df)
    df.to_csv(filename, sep = ',', index = False, encoding = 'utf-8')
    time.sleep(1)

当前输出:

    co    datetime    volts
0    13.8    2017-05-03 15:57:19   1.38
1    13.8    2017-05-03 15:57:19   1.38    
2    13.9    2017-05-03 15:57:20   1.39
3    13.8    2017-05-03 15:57:19   1.38
4    13.9    2017-05-03 15:57:20   1.39
5    14.2    2017-05-03 15:57:21   1.42

期望的输出:

    co    datetime    volts
0    13.8    2017-05-03 15:57:19   1.38
1    13.9    2017-05-03 15:57:20   1.39
2    14.2    2017-05-03 15:57:21   1.42

3 个答案:

答案 0 :(得分:2)

如果您只是想要追加,那么您就不需要使用.loc的计数器。您可以将其更改为df.loc [len(df)] = row。这将始终在DataFrame的末尾写一个新行。

此处更新了来自piRSquared代码的代码:

## Source libraries:
from __future__ import print_function
import UniversalLibrary as UL
import time, os, io, csv, datetime
import pandas as pd

## Specify PMD settings:
BoardNum = 0
Gain = UL.BIP5VOLTS
Chan = 0

## Create empty lists and a dataframe to fill:
df = pd.DataFrame(columns=['co', 'volts', 'datetime'])

## Set filepath:
filename = "~/pmd_data.csv"

while True:
    ts = time.time()
    DataValue = UL.cbAIn(BoardNum, Chan, Gain)
    EngUnits = UL.cbToEngUnits(BoardNum, Gain, DataValue)
    ppm = EngUnits * 10 ## 1 Volt = 10ppm of carbon monoxide
    df.loc[len(df)] = pd.Series(dict(
            co=ppm, volts=EngUnits, datetime=ts
        ))
    ## This line of code is not providing the desired result:
    df.to_csv(filename, sep = ',', index = False, encoding = 'utf-8')
    time.sleep(1)

答案 1 :(得分:1)

每次进入while循环时,每个字段都附加一个包含列表(随时间增长)的数据框。但是,您应该添加一个数据框,其中列表一次只能为每个字段生成一个元素。请看下面的例子

你基本上是这样做的:

co = [] ## carbon monoxide concentration in ppm
data = [] ## raw analog output between 0-5V
times = [] ## timestamp

df = pd.DataFrame()
for i in range(0,5):
    data.append(i)
    times.append(i)
    co.append(i)
    df = df.append(pd.DataFrame({'co':co, 'volts':data, 'datetime':times}))
print df

导致

   co  datetime  volts
0   0         0      0
0   0         0      0
1   1         1      1
0   0         0      0
1   1         1      1
2   2         2      2
0   0         0      0
1   1         1      1
2   2         2      2
3   3         3      3
0   0         0      0
1   1         1      1
2   2         2      2
3   3         3      3
4   4         4      4

但你应该这样做

df = pd.DataFrame()
for i in range(0,5):
    df = df.append(pd.DataFrame({'co':[i], 'volts':[i], 'datetime':[i]}))
print df

导致

   co  datetime  volts
0   0         0      0
0   1         1      1
0   2         2      2
0   3         3      3
0   4         4      4

所以你的代码应该

## Source libraries:
from __future__ import print_function
import UniversalLibrary as UL
import time, os, io, csv, datetime
import pandas as pd

## Specify PMD settings:
BoardNum = 0
Gain = UL.BIP5VOLTS
Chan = 0

## Create empty dataframe to fill:
df = pd.DataFrame()

## Set filepath:
filename = "~/pmd_data.csv"

while True:
    ts = time.time()
    DataValue = UL.cbAIn(BoardNum, Chan, Gain)
    EngUnits = UL.cbToEngUnits(BoardNum, Gain, DataValue)
    ppm = EngUnits * 10 ## 1 Volt = 10ppm of carbon monoxide
    times = (datetime.datetime.fromtimestamp(ts).strftime('%Y-%m-%d %H:%M:%S'))
    df = df.append(pd.DataFrame({'co':[ppm], 'volts':[EngUnits], 'datetime':[times]})) 
    print(df)
    df.to_csv(filename, sep = ',', index = False, encoding = 'utf-8')
    time.sleep(1)

答案 2 :(得分:1)

由于您没有专门使用索引,我会保留一个计数器并使用它向现有数据框添加新行。

我像这样重写while循环

## Source libraries:
from __future__ import print_function
import UniversalLibrary as UL
import time, os, io, csv, datetime
import pandas as pd

## Specify PMD settings:
BoardNum = 0
Gain = UL.BIP5VOLTS
Chan = 0

## Create empty lists and a dataframe to fill:
df = pd.DataFrame(columns=['co', 'volts', 'datetime'])

## Set filepath:
filename = "~/pmd_data.csv"

counter = 0
while True:
    ts = time.time()
    DataValue = UL.cbAIn(BoardNum, Chan, Gain)
    EngUnits = UL.cbToEngUnits(BoardNum, Gain, DataValue)
    ppm = EngUnits * 10 ## 1 Volt = 10ppm of carbon monoxide
    df.loc[counter] = pd.Series(dict(
            co=ppm, volts=EngUnits, datetime=ts
        ))
    ## This line of code is not providing the desired result:
    counter += 1
    df.to_csv(filename, sep = ',', index = False, encoding = 'utf-8')
    time.sleep(1)