Question

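I have a CSV file with time, event, and acceleration values (x, y, z) as columns, and I want to get only the rows between the event values start and stop (the event column contains various words such as start, stop, motion, and position). I loaded the file into a pandas DataFrame, but I get the start and stop rows themselves rather than all the rows between them.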

CSV file:

 time   event   earthAcceleration.x earthAcceleration.y earthAcceleration.z

2017-11-22T09:20:13.944 motion  -0.006380   -0.001029   -0.010781

2017-11-22T09:20:13.954 start 

2017-11-22T09:20:13.964 motion  0.008521    -0.008223   0.022574

2017-11-22T09:20:13.974 stop    

2017-11-22T09:20:13.984 motion  0.016283    0.003181    0.006969

Code:

import pandas as pd
df = pd.read_csv('nehi.csv')
df = df[df['event'].between('start', 'stop', inclusive=True)]
df

My output is only the rows where event equals start or stop:

time    event   earthAccelerationx earthAccelerationy earthAccelerationz

2017-11-22T09:20:13.954 start   NaN NaN NaN

2017-11-22T09:20:13.974 stop    NaN NaN NaN

I also tried

start= event[(event['event']=='start') & (event['event']=='stop')]
start.head()

but that returns an empty result.

Desired output: the time and acceleration values between the words start and stop.

time       event    earthAccelerationx  earthAccelerationy earthAccelerationz

2017-11-22T09:20:13.964 motion  0.008521    -0.008223   0.022574

Goal: extract all rows, with all of their column values, that lie between two keywords in the column named event.

Answer 1

I use 'hello' and 'world' as the keywords.

import pandas as pd

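# Split each line on the first keyword; column 1 then holds the text after 'hello'.
# This assumes every line of two.txt contains both 'hello' and 'world'.
df = pd.read_csv('two.txt', header=None, delimiter="hello", engine='python')
df2 = df.loc[:, 1]
values = []

for row in df2:
    print(row.index('world'))
    # Keep everything before the second keyword.
    values.append(row[:row.index('world')])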

print(values)
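
For anyone who wants to try the idea without a two.txt file, here is a minimal, self-contained sketch of the same keyword-to-keyword extraction using the standard library re module instead of read_csv. The sample lines and the helper name between_keywords are made up for illustration; unlike the loop above, lines that lack either keyword are skipped rather than raising an error.

```python
import re


def between_keywords(lines, first="hello", second="world"):
    """Return the text between `first` and `second` on each line.

    Hypothetical helper for illustration only; lines missing either
    keyword are skipped.
    """
    # Non-greedy group so the match stops at the first `second` after `first`.
    pattern = re.compile(re.escape(first) + r"(.*?)" + re.escape(second))
    values = []
    for line in lines:
        match = pattern.search(line)
        if match:
            values.append(match.group(1))
    return values


# Made-up sample lines standing in for the contents of a text file.
sample = ["a hello 1 2 3 world b", "no keywords here", "hello x world"]
result = between_keywords(sample)
print(result)
```

The surrounding whitespace is kept in each extracted value; call .strip() on the results if that is not wanted.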

Answer 2

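You can preprocess the file first to extract the data you need. This checks for the keywords and uses a flag to toggle between keeping and ignoring rows.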

CSV file:

import io

s = ''' time   event   earthAcceleration.x earthAcceleration.y earthAcceleration.z
2017-11-22T09:20:13.944 motion  -0.006380   -0.001029   -0.010781
2017-11-22T09:20:13.954 start 
2017-11-22T09:20:13.964 motion  0.008521    -0.008223   0.022574
2017-11-22T09:20:13.974 stop    
2017-11-22T09:20:13.984 motion  0.016283    0.003181    0.006969
'''
# Python 3: wrap the text in an in-memory text file
f = io.StringIO(s)
# Python 2.7
#f = io.BytesIO(s)

Preprocessing:

flag = False
data = []
# next(f) works on both Python 2.6+ and Python 3; f.next() exists only on Python 2
header = next(f)
#or
#header = f.readline()
header = header.split()
for line in f:
   line = line.split()
   #print(line)
   if line[1] == 'start':
      flag = True
      continue
   elif line[1] == 'stop':
      flag = False
      continue
   if flag:
      data.append(line)
      #print(line)

With a real file, use a context manager while processing.

flag = False
data = []
with open('nehi.txt') as f:
   header = next(f)
   header = header.split()
   for line in f:
      line = line.split()
      #print(line)
      if line[1] == 'start':
         flag = True
         continue
      elif line[1] == 'stop':
         flag = False
         continue
      if flag:
         data.append(line)
         #print(line)

If you need a DataFrame, you can pass data and header to pandas.

import pandas
df = pandas.DataFrame(data=data, columns=header)
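
Because the preprocessing splits plain text, every value in data is still a string. As a possible next step, here is a hedged sketch of converting the columns to proper dtypes. The header and data values are copied from the sample above as stand-ins, and the choice of pd.to_numeric and pd.to_datetime is an assumption about what is wanted, not part of the preprocessing itself.

```python
import pandas as pd

# Stand-ins for the `header` and `data` lists built by the preprocessing step.
header = ["time", "event", "earthAcceleration.x",
          "earthAcceleration.y", "earthAcceleration.z"]
data = [["2017-11-22T09:20:13.964", "motion",
         "0.008521", "-0.008223", "0.022574"]]

df = pd.DataFrame(data=data, columns=header)

# Every cell is a string at this point, so convert the columns explicitly.
accel_cols = header[2:]
df[accel_cols] = df[accel_cols].apply(pd.to_numeric)
df["time"] = pd.to_datetime(df["time"])

print(df.dtypes)
```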

Answer 3

Try this

start_index = df[df['event'].str.contains('start')].index[0] + 1
stop_index = df[df['event'].str.contains('stop')].index[0] - 1
new_df = df.loc[start_index:stop_index, :]


    time                    event   earthAcceleration.x earthAcceleration.y earthAcceleration.z
2   2017-11-22T09:20:13.964 motion  0.008521            -0.008223   0.022574

Edit: this gives you a list of DataFrames, each containing the rows between one start and its matching stop

start_index = df[df['event'].str.contains('start')].index
stop_index = df[df['event'].str.contains('stop')].index
l_dfs = []
for i in range(len(start_index)):
    l_dfs.append(df.loc[start_index[i]+1:stop_index[i]-1, :])
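
An alternative sketch that avoids the explicit loop and does not rely on the index being a default 0..n-1 range: count how many start and stop markers have been seen so far, and keep the rows where more starts than stops have occurred. This assumes the markers alternate (start, stop, start, stop, ...) and that the event value matches exactly. The DataFrame below is a small made-up sample with two start/stop pairs, not the original file, and the column name value is a placeholder.

```python
import pandas as pd

# Small made-up sample: two start/stop pairs around motion rows.
df = pd.DataFrame({
    "event": ["motion", "start", "motion", "motion", "stop",
              "motion", "start", "motion", "stop"],
    "value": [0.1, None, 0.2, 0.3, None, 0.4, None, 0.5, None],
})

is_start = df["event"].eq("start")
is_stop = df["event"].eq("stop")

# A row is inside a segment when more starts than stops have been seen so far.
inside = is_start.cumsum() > is_stop.cumsum()
# The start row itself also satisfies that condition, so drop it explicitly.
mask = inside & ~is_start

between = df[mask]

# Number the segments (1, 2, ...) and split into one DataFrame per segment.
segment_id = is_start.cumsum()[mask]
l_dfs = [group for _, group in between.groupby(segment_id)]

print(between)
```

If a start has no matching stop, this version keeps every row up to the end of the file, which may or may not be what you want.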

In a CSV file, get all row and column values between 2 specific words, e.g. start and stop
