Get all row and column values between two specific words in a CSV file, e.g. start and stop

Time: 2017-12-07 19:25:28

Tags: python pandas

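I have a CSV file with time, event, and acceleration values (x, y, z) as columns. I want to get only the rows that lie between the event values start and stop (the event column contains various words, such as start, stop, and motion). I have loaded the data into a pandas DataFrame, but I get the start and stop rows themselves instead of all the rows between them.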

CSV file:

time                    event   earthAcceleration.x earthAcceleration.y earthAcceleration.z
2017-11-22T09:20:13.944 motion  -0.006380           -0.001029           -0.010781
2017-11-22T09:20:13.954 start
2017-11-22T09:20:13.964 motion  0.008521            -0.008223           0.022574
2017-11-22T09:20:13.974 stop
2017-11-22T09:20:13.984 motion  0.016283            0.003181            0.006969

Code:

import pandas as pd
df = pd.read_csv('nehi.csv')
df = df[df['event'].between('start', 'stop', inclusive=True)]
df

My output contains only the rows whose event is start or stop:

time                    event   earthAccelerationx earthAccelerationy earthAccelerationz
2017-11-22T09:20:13.954 start   NaN                NaN                NaN
2017-11-22T09:20:13.974 stop    NaN                NaN                NaN

I also tried:

start= event[(event['event']=='start') & (event['event']=='stop')]
start.head()

but this returns an empty result.

Desired output: the time and acceleration values between the words start and stop.

time                    event   earthAccelerationx earthAccelerationy earthAccelerationz
2017-11-22T09:20:13.964 motion  0.008521           -0.008223          0.022574

Goal: extract all column and row values that lie between two keywords in the column named event.

3 answers:

Answer 0: (score: 0)

I used 'hello' and 'world' as the keywords.

import pandas as pd

df = pd.read_csv('two.txt', header=None, delimiter="hello", engine='python')
df2 = df.loc[:, 1]
values = []

for row in df2:
    print (row.index('world'))
    values.append(row[:row.index('world')])

print(values)

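Answer 1: (score: 0)

You can preprocess the file first to extract the data you need. This checks each line for the keywords and uses a flag to switch between keeping and ignoring rows.

CSV file: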

import io

s = ''' time   event   earthAcceleration.x earthAcceleration.y earthAcceleration.z
2017-11-22T09:20:13.944 motion  -0.006380   -0.001029   -0.010781
2017-11-22T09:20:13.954 start 
2017-11-22T09:20:13.964 motion  0.008521    -0.008223   0.022574
2017-11-22T09:20:13.974 stop    
2017-11-22T09:20:13.984 motion  0.016283    0.003181    0.006969
'''
# Python 3
f = io.StringIO(s)
# Python 2.7
# f = io.BytesIO(s)

Preprocessing:

flag = False
data = []
# next(f) works in both Python 2 and Python 3 (f.next() is Python 2 only)
header = next(f)
header = header.split()
for line in f:
   line = line.split()
   #print(line)
   if line[1] == 'start':
      flag = True
      continue
   elif line[1] == 'stop':
      flag = False
      continue
   if flag:
      data.append(line)
      #print(line)

With a real file, use a context manager while processing it.

flag = False
data = []
with open('nehi.txt') as f:
   header = next(f)
   header = header.split()
   for line in f:
      line = line.split()
      #print(line)
      if line[1] == 'start':
         flag = True
         continue
      elif line[1] == 'stop':
         flag = False
         continue
      if flag:
         data.append(line)
         #print(line)

If you need a DataFrame, you can pass data and header to pandas.

import pandas

df = pandas.DataFrame(data=data, columns=header)
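
Note that the rows collected by the flag loop are lists of strings, so the acceleration columns of the resulting DataFrame hold text rather than numbers. The following is only a minimal sketch, assuming the column layout from the question: it wraps the flag logic in a helper (the name `rows_between` is made up for illustration) and converts the acceleration columns with `pandas.to_numeric`.

```python
import io

import pandas as pd


def rows_between(lines, start='start', stop='stop'):
    """Hypothetical helper: return (header, rows) for the rows that lie
    between a start marker and the following stop marker."""
    it = iter(lines)
    header = next(it).split()
    rows, keep = [], False
    for line in it:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        if fields[1] == start:
            keep = True
        elif fields[1] == stop:
            keep = False
        elif keep:
            rows.append(fields)
    return header, rows


# Sample data copied from the question, whitespace-separated.
sample = io.StringIO(
    "time event earthAcceleration.x earthAcceleration.y earthAcceleration.z\n"
    "2017-11-22T09:20:13.944 motion -0.006380 -0.001029 -0.010781\n"
    "2017-11-22T09:20:13.954 start\n"
    "2017-11-22T09:20:13.964 motion 0.008521 -0.008223 0.022574\n"
    "2017-11-22T09:20:13.974 stop\n"
    "2017-11-22T09:20:13.984 motion 0.016283 0.003181 0.006969\n"
)

header, rows = rows_between(sample)
df = pd.DataFrame(rows, columns=header)

# Assumption: every column after 'time' and 'event' is numeric.
accel_cols = header[2:]
df[accel_cols] = df[accel_cols].apply(pd.to_numeric)
print(df)
```

The helper accepts any iterable of lines, so an open file object from `with open(...) as f:` can be passed in place of the `StringIO` sample.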

Answer 2: (score: 0)

Try this:

start_index = df[df['event'].str.contains('start')].index[0] + 1
stop_index = df[df['event'].str.contains('stop')].index[0] - 1
new_df = df.loc[start_index:stop_index, :]


    time                    event   earthAcceleration.x earthAcceleration.y earthAcceleration.z
2   2017-11-22T09:20:13.964 motion  0.008521            -0.008223   0.022574
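
Two details of the snippet above may be worth checking against your own data. `.loc` slices by label and includes both endpoints, which is why the indices are shifted by one. `str.contains` matches substrings, so a value such as `restart` (a hypothetical value, not present in the question's data) would also match. A small sketch, assuming the default integer index that `read_csv` produces and using exact comparison instead:

```python
import pandas as pd

# Only the event column from the question's sample is needed here.
df = pd.DataFrame({'event': ['motion', 'start', 'motion', 'stop', 'motion']})

# Exact comparison avoids matching values that merely contain the keyword.
start_label = df[df['event'] == 'start'].index[0]
stop_label = df[df['event'] == 'stop'].index[0]

# .loc includes both endpoints, so this slice still holds the marker rows.
with_markers = df.loc[start_label:stop_label]

# Shifting by one on each side drops the markers.
# This arithmetic is only valid for a default integer index.
between = df.loc[start_label + 1:stop_label - 1]

print(with_markers['event'].tolist())  # ['start', 'motion', 'stop']
print(between['event'].tolist())       # ['motion']
```

If the column contains no start or stop value at all, `.index[0]` raises an `IndexError`, so a length check before indexing may be appropriate.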

Edit: this gives you a list of DataFrames, each containing the rows between one start and the matching stop:

start_index = df[df['event'].str.contains('start')].index
stop_index = df[df['event'].str.contains('stop')].index
l_dfs = []
for i in range(len(start_index)):
    l_dfs.append(df.loc[start_index[i]+1:stop_index[i]-1, :])
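
As an alternative to looping over index pairs, the same selection can be expressed as a boolean mask built from cumulative counts of the markers. This is a sketch of a different technique, not the answer author's method, and it assumes that markers are not nested and that every start is eventually followed by a stop. The variable names are illustrative.

```python
import pandas as pd

# Sample rows from the question (only one acceleration column shown).
df = pd.DataFrame({
    'time': ['2017-11-22T09:20:13.944', '2017-11-22T09:20:13.954',
             '2017-11-22T09:20:13.964', '2017-11-22T09:20:13.974',
             '2017-11-22T09:20:13.984'],
    'event': ['motion', 'start', 'motion', 'stop', 'motion'],
    'earthAcceleration.x': [-0.006380, None, 0.008521, None, 0.016283],
})

is_start = df['event'].eq('start')
is_stop = df['event'].eq('stop')

# A row is inside a block when more starts than stops have been seen so far.
# The start row itself is excluded; the stop row is already outside because
# its own stop has been counted.
inside = (is_start.cumsum() > is_stop.cumsum()) & ~is_start
between = df[inside]

# Number of starts seen so far serves as a block id, one DataFrame per block.
block_id = is_start.cumsum()[inside]
segments = [group for _, group in between.groupby(block_id)]

print(between)
```

With the question's data, `between` holds the single motion row at index 2, and `segments` is a list with one DataFrame.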