I have a DataFrame containing rows that meet a condition:
TIME VALUE Prev_Time
0 23:01 0 NaN
1 23:02 0 NaN
2 23:03 1 23:02
3 23:04 0 NaN
4 23:05 0 NaN
5 23:06 1 23:05
6 23:07 0 NaN
7 23:08 0 NaN
8 23:09 0 NaN
9 23:10 0 NaN
10 23:11 1 23:10
11 23:12 0 NaN
12 23:13 0 NaN
13 23:14 0 NaN
14 23:15 0 NaN
15 23:16 1 23:15
16 23:17 0 NaN
I want to count the rows based on the condition in the column 'Prev_Time', so that...
The desired output should be:
ROW_COUNT
0 2
1 3
2 5
3 5
4 2
I also want the total count, something like len(df), which should print:
Total Count: 5
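A common pandas idiom for this kind of grouping (not one of the answers below) is to turn the non-null flags into group labels with a cumulative sum and then count the size of each group. The sketch below rebuilds the question's frame by hand, so the construction of `df` is an assumption; only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (17 rows, 23:01 to 23:17).
times = [f"23:{m:02d}" for m in range(1, 18)]
flagged = {2, 5, 10, 15}  # row positions where VALUE == 1
df = pd.DataFrame({
    "TIME": times,
    "VALUE": [1 if i in flagged else 0 for i in range(17)],
    "Prev_Time": [times[i - 1] if i in flagged else np.nan for i in range(17)],
})

# Each non-null Prev_Time starts a new group; cumsum turns the flags into
# labels 0, 0, 1, 1, 1, 2, ... (rows before the first flag get label 0).
group_id = df["Prev_Time"].notna().cumsum()

# Count the rows in each group and present them as a ROW_COUNT column.
row_counts = (
    df.groupby(group_id).size().reset_index(drop=True).to_frame("ROW_COUNT")
)
print(row_counts)
print("Total Count:", len(row_counts))
```

Because groups are built from labels rather than split points, a flagged first row does not produce a leading group of size 0, which is a small difference from splitting at the flagged positions.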
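Answer 0 (score: 3)
Found a neat way to do it:
import numpy as np
import pandas as pd

# rows where VALUE > 0; in this data these are exactly the rows where Prev_Time is not null
notnull=df[df.VALUE>0]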
"""
TIME VALUE Prev_Time
2 23:03 1 23:02
5 23:06 1 23:05
10 23:11 1 23:10
15 23:16 1 23:15
"""
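Use np.split to break it up:
# np.split treats notnull.index as row positions; that matches the labels here because df has a default RangeIndex
row_counts=pd.DataFrame({'ROW_COUNT':[len(x) for x in np.split(df,notnull.index)]})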
"""
ROW_COUNT
0 2
1 3
2 5
3 5
4 2
"""
And the total count:
len(row_counts)
"""
5
"""
Answer 1 (score: 0)
This sort of works. You can adapt the code to your needs, but it shows the basic idea!
import numpy as np
import pandas as pd

# Dummy data set
df1 = pd.DataFrame({'TIME': np.arange(17), 'VALUE': np.arange(-17,0), 'Prev_time': [np.nan, np.nan,1, np.nan, np.nan,2, np.nan, np.nan, np.nan, np.nan,4, np.nan, np.nan, np.nan, np.nan,5, np.nan]})
# Get the rows that are not null; reset_index keeps their original row numbers in an 'index' column
df=df1[df1['Prev_time'].notnull()].reset_index()
# Append a sentinel row whose 'index' is len(df1), so the last block
# (from the last non-null row to the end of df1) is always counted,
# including the case where the last row of df1 is itself non-null
df.loc[len(df)]=[len(df1),0,0,0]
# Difference between consecutive start rows = rows per block;
# fillna(0) makes the first block start at row 0
count=df['index']-df['index'].shift(1).fillna(0)
len(count)
Answer 2 (score: 0)
This may not be a perfect answer, but it should get you what you want:
import pandas as pd
# read the data
d = pd.read_csv('stackdata.txt')
# we need the last row to be identified, so give it a value
# (.loc writes into d directly; chained indexing like d['Prev_Time'][i] = 1 may not)
d.loc[len(d)-1, 'Prev_Time'] = 1
#get all the rows where Prev_Time is not null
ds = d[d.Prev_Time.notnull()]
# reset the index; you will get an additional column named 'index'
ds = ds.reset_index()
#get only the newly added index column
dst = ds[ds.columns[0]]
#get the diff of the series
dstr = dst.diff()
# diff() leaves the first item NaN, so assign the first index value instead
dstr[0] = dst[0]
# add 1 to the last item to get the required result
dstr[len(dstr)-1] += 1
len(dstr)
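Answers 1 and 2 both take differences between the positions where blocks start, with extra bookkeeping for the first and last blocks. A minimal sketch, assuming NumPy 1.16 or later, where `np.diff`'s `prepend` and `append` arguments handle both ends without modifying the data. The inline text is a hypothetical stand-in for 'stackdata.txt', read as whitespace-separated; since the header has one fewer field than the data rows, read_csv uses the first column as the index.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for 'stackdata.txt': the question's table as text.
text = """TIME VALUE Prev_Time
0 23:01 0 NaN
1 23:02 0 NaN
2 23:03 1 23:02
3 23:04 0 NaN
4 23:05 0 NaN
5 23:06 1 23:05
6 23:07 0 NaN
7 23:08 0 NaN
8 23:09 0 NaN
9 23:10 0 NaN
10 23:11 1 23:10
11 23:12 0 NaN
12 23:13 0 NaN
13 23:14 0 NaN
14 23:15 0 NaN
15 23:16 1 23:15
16 23:17 0 NaN
"""
d = pd.read_csv(io.StringIO(text), sep=r"\s+")

# Positions of the rows that start a new block (Prev_Time present).
starts = np.flatnonzero(d["Prev_Time"].notna().to_numpy())

# Block boundaries are 0, each start, and len(d); prepend/append add the
# two ends, so d itself is never modified.
counts = np.diff(starts, prepend=0, append=len(d))

print(counts.tolist())
print("Total Count:", len(counts))
```

Unlike overwriting the last row, this also gives the right answer when the last row is itself flagged (its block simply has length 1). As with np.split, a flagged first row yields a leading count of 0.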