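I have a CSV file with columns for time, event, and acceleration values (x, y, z). I want to get only the rows that lie between the event values start and stop (the event column contains various words, such as start, stop, and motion). I loaded the file into a pandas DataFrame, but I get the start and stop rows themselves rather than all the rows between them.
CSV file: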
time event earthAcceleration.x earthAcceleration.y earthAcceleration.z
2017-11-22T09:20:13.944 motion -0.006380 -0.001029 -0.010781
2017-11-22T09:20:13.954 start
2017-11-22T09:20:13.964 motion 0.008521 -0.008223 0.022574
2017-11-22T09:20:13.974 stop
2017-11-22T09:20:13.984 motion 0.016283 0.003181 0.006969
Code:
import pandas as pd
df = pd.read_csv('nehi.csv')
df = df[df['event'].between('start', 'stop', inclusive=True)]
df
My output: only the rows where event equals start or stop
time event earthAcceleration.x earthAcceleration.y earthAcceleration.z
2017-11-22T09:20:13.954 start NaN NaN NaN
2017-11-22T09:20:13.974 stop NaN NaN NaN
I also tried
start= event[(event['event']=='start') & (event['event']=='stop')]
start.head()
but that returns an empty result.
Desired output: the time and acceleration values between the words start and stop.
time event earthAcceleration.x earthAcceleration.y earthAcceleration.z
2017-11-22T09:20:13.964 motion 0.008521 -0.008223 0.022574
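Goal: extract all rows (with all their column values) that fall between two keywords in the column named event.
Answer 0 (score: 0)
I used 'hello' and 'world' as the keywords.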
import pandas as pd
df = pd.read_csv('two.txt', header=None, delimiter="hello", engine='python')
df2 = df.loc[:, 1]
values = []
for row in df2:
    print(row.index('world'))
    values.append(row[:row.index('world')])
print(values)
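Answer 1 (score: 0)
You can process the file first to extract the data you need. This checks each line for the keywords and uses a flag to toggle between keeping and ignoring rows.
CSV file: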
import io
s = ''' time event earthAcceleration.x earthAcceleration.y earthAcceleration.z
2017-11-22T09:20:13.944 motion -0.006380 -0.001029 -0.010781
2017-11-22T09:20:13.954 start
2017-11-22T09:20:13.964 motion 0.008521 -0.008223 0.022574
2017-11-22T09:20:13.974 stop
2017-11-22T09:20:13.984 motion 0.016283 0.003181 0.006969
'''
#Python 2.7
f = io.BytesIO(s)
#Python 3.6
#f = io.StringIO(s)
Preprocessing:
flag = False
data = []
# next(f) works in both Python 2.7 and Python 3
header = next(f)
header = header.split()
for line in f:
    line = line.split()
    #print(line)
    if line[1] == 'start':
        # start marker: begin keeping rows from the next line on
        flag = True
        continue
    elif line[1] == 'stop':
        # stop marker: ignore rows until the next start
        flag = False
        continue
    if flag:
        data.append(line)
        #print(line)
With an actual file, use a context manager while processing.
flag = False
data = []
with open('nehi.txt') as f:
    header = next(f)
    header = header.split()
    for line in f:
        line = line.split()
        #print(line)
        if line[1] == 'start':
            flag = True
            continue
        elif line[1] == 'stop':
            flag = False
            continue
        if flag:
            data.append(line)
            #print(line)
If you need a DataFrame, you can pass data and header to pandas.
import pandas
df = pandas.DataFrame(data=data, columns=header)
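One thing to watch with this approach: fields produced by splitting text lines are strings, so the acceleration columns will not be numeric yet. The following is a minimal sketch, not part of the original answer, that wraps the flag logic in a helper and converts the columns with `pandas.to_numeric`. The function name `rows_between` and the inline `SAMPLE` text are illustrative assumptions.

```python
import io
import pandas as pd

# Inline copy of the question's sample, used here instead of a real file.
SAMPLE = """time event earthAcceleration.x earthAcceleration.y earthAcceleration.z
2017-11-22T09:20:13.944 motion -0.006380 -0.001029 -0.010781
2017-11-22T09:20:13.954 start
2017-11-22T09:20:13.964 motion 0.008521 -0.008223 0.022574
2017-11-22T09:20:13.974 stop
2017-11-22T09:20:13.984 motion 0.016283 0.003181 0.006969
"""

def rows_between(lines, start="start", stop="stop"):
    """Yield the split fields of each line strictly between a start and a stop marker."""
    keep = False
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        if fields[1] == start:
            keep = True
        elif fields[1] == stop:
            keep = False
        elif keep:
            yield fields

f = io.StringIO(SAMPLE)
header = next(f).split()
df = pd.DataFrame(list(rows_between(f)), columns=header)

# Everything read from text is a string; convert the acceleration columns.
accel_cols = [c for c in header if c.startswith("earthAcceleration")]
df[accel_cols] = df[accel_cols].apply(pd.to_numeric)
print(df)
```

With a real file, `io.StringIO(SAMPLE)` would be replaced by the open file object inside the `with` block.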
Answer 2 (score: 0)
Try this
start_index = df[df['event'].str.contains('start')].index[0] + 1
stop_index = df[df['event'].str.contains('stop')].index[0] - 1
new_df = df.loc[start_index:stop_index, :]
time event earthAcceleration.x earthAcceleration.y earthAcceleration.z
2 2017-11-22T09:20:13.964 motion 0.008521 -0.008223 0.022574
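The `.index[0]` lookup above only uses the first start and the first stop. As an alternative sketch, assuming the markers alternate cleanly (each start is followed by a stop before the next start), a boolean mask built from cumulative counts keeps every row that lies between a pair. The inline frame below simply mirrors the question's sample rather than reading the file.

```python
import pandas as pd

# Mirror of the question's sample; marker rows have no acceleration values.
df = pd.DataFrame({
    "time": [
        "2017-11-22T09:20:13.944",
        "2017-11-22T09:20:13.954",
        "2017-11-22T09:20:13.964",
        "2017-11-22T09:20:13.974",
        "2017-11-22T09:20:13.984",
    ],
    "event": ["motion", "start", "motion", "stop", "motion"],
    "earthAcceleration.x": [-0.006380, None, 0.008521, None, 0.016283],
    "earthAcceleration.y": [-0.001029, None, -0.008223, None, 0.003181],
    "earthAcceleration.z": [-0.010781, None, 0.022574, None, 0.006969],
})

is_start = df["event"].eq("start")
is_stop = df["event"].eq("stop")

# Starts seen so far minus stops seen so far: positive means "inside a block".
inside = (is_start.cumsum() - is_stop.cumsum()) > 0

# Drop the marker rows themselves so only the rows in between remain.
between = df[inside & ~is_start & ~is_stop]
print(between)
```

Because the mask is computed row by row, it does not depend on the index labels and needs no `+ 1` / `- 1` adjustment.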
Edit: this gives you a list of DataFrames, each containing the rows between one start and its matching stop
start_index = df[df['event'].str.contains('start')].index
stop_index = df[df['event'].str.contains('stop')].index
l_dfs = []
for i in range(len(start_index)):
    l_dfs.append(df.loc[start_index[i]+1:stop_index[i]-1, :])
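Note that `.loc` slices by label and includes both endpoints, which is why the loop needs `+1` and `-1`, and it relies on the default integer index. A position-based variant is sketched below under the assumption that the i-th start pairs with the i-th stop; the small frame, its `value` column, and the non-default index are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Made-up frame with two start/stop blocks and a non-default index.
df = pd.DataFrame({
    "event": ["motion", "start", "motion", "motion", "stop",
              "motion", "start", "motion", "stop"],
    "value": [0.1, None, 0.2, 0.3, None, 0.4, None, 0.5, None],
})
df.index = df.index * 10  # labels 0, 10, 20, ... to show labels are not used

# Integer positions (not labels) of the marker rows.
starts = np.flatnonzero(df["event"].eq("start").to_numpy())
stops = np.flatnonzero(df["event"].eq("stop").to_numpy())

# iloc slices exclude the end position, so start+1:stop is strictly between.
# zip stops at the shorter sequence, so an unmatched trailing start is ignored.
segments = [df.iloc[a + 1:b] for a, b in zip(starts, stops)]

for seg in segments:
    print(seg)
```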