This is a repost of a question I started earlier (link), but I've since realized the problem is much more complicated.
import pandas as pd

df = pd.DataFrame({'a': ['A1', 'A1', 'A1', 'A2', 'A2', 'A3', 'A3', 'A4', 'A3', 'A2', 'A4', 'A4', 'A4'],
                   'value': ['7:00', '10:00', '20:00', '9:00', '7:00', '9:00', '8:00', '15:00', '19:00', '9:30', '15:30', '16:00', '16:30'],
                   'value2': [3, 1, 2, 4, 2, 3, 3, 5, 3, 2, 1, 5, 7],
                   'value3': ['Apple', 'Orange', 'Apple', 'Kiwi', 'Orange', 'Orange', 'Apple', 'Apple', 'Apple', 'Apple', 'Orange', 'Orange', 'Apple'],
                   'value4': ['Throw', 'Eat', 'Throw', 'Keep', 'Eat', 'Eat', 'Throw', 'Throw', 'Throw', 'Throw', 'Eat', 'Eat', 'Chuck']})
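Because "value" holds the times as strings such as "7:00" and "10:00", sorting that column directly is lexicographic, so "10:00" lands before "7:00". A minimal sketch, assuming the strings are same-day clock times in H:MM format, that parses them into a new column before sorting within each ID (the column name `time` is chosen here for illustration, it is not part of the question):

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['A1', 'A1', 'A1', 'A2', 'A2', 'A3', 'A3', 'A4', 'A3', 'A2', 'A4', 'A4', 'A4'],
    'value': ['7:00', '10:00', '20:00', '9:00', '7:00', '9:00', '8:00', '15:00', '19:00', '9:30', '15:30', '16:00', '16:30'],
    'value2': [3, 1, 2, 4, 2, 3, 3, 5, 3, 2, 1, 5, 7],
    'value3': ['Apple', 'Orange', 'Apple', 'Kiwi', 'Orange', 'Orange', 'Apple', 'Apple', 'Apple', 'Apple', 'Orange', 'Orange', 'Apple'],
    'value4': ['Throw', 'Eat', 'Throw', 'Keep', 'Eat', 'Eat', 'Throw', 'Throw', 'Throw', 'Throw', 'Eat', 'Eat', 'Chuck'],
})

# Sorting the raw strings is lexicographic: '10:00' and '20:00' come before '7:00'
lexical = sorted(df.loc[df['a'] == 'A1', 'value'])
print(lexical)  # ['10:00', '20:00', '7:00']

# Parse 'H:MM' into timestamps (the date part defaults to 1900-01-01)
df['time'] = pd.to_datetime(df['value'], format='%H:%M')

# Sort within each ID by actual clock time
by_time = df.sort_values(['a', 'time'])
print(by_time.loc[by_time['a'] == 'A1', 'value'].tolist())  # ['7:00', '10:00', '20:00']
```

With the rows in time order, "Orange before Apple" can be checked by comparing the parsed times rather than the strings.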
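What I want is: 1) By ID (variable "a"), select every instance in "value3" where an "Orange" is followed by an "Apple". They don't have to be back-to-back; there can be any number of other values in between. But the Orange must come before the Apple in time (the times are in "value").
2) Then split these Orange-followed-by-Apple instances into two groups: 1) those where the Orange's value2 equals 1; 2) those where the Orange's value2 is not 1 (i.e., everything else grouped together). The tricky case is A4, which has two Oranges, with value2 of 1 and 5. It should go in the value2 = 1 group, because that Orange occurs first.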
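One way to express steps 1) and 2) is sketched below; it is not the questioner's or the answerer's code. It assumes the "value" strings are same-day H:MM clock times and that "first" means earliest in time within an ID; the helper column `time` is added for illustration. The Orange that decides an ID's group is its earliest Orange: if any Orange has an Apple after it, the earliest Orange does too, so A4 is labelled by its 15:30 Orange (value2 = 1).

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['A1', 'A1', 'A1', 'A2', 'A2', 'A3', 'A3', 'A4', 'A3', 'A2', 'A4', 'A4', 'A4'],
    'value': ['7:00', '10:00', '20:00', '9:00', '7:00', '9:00', '8:00', '15:00', '19:00', '9:30', '15:30', '16:00', '16:30'],
    'value2': [3, 1, 2, 4, 2, 3, 3, 5, 3, 2, 1, 5, 7],
    'value3': ['Apple', 'Orange', 'Apple', 'Kiwi', 'Orange', 'Orange', 'Apple', 'Apple', 'Apple', 'Apple', 'Orange', 'Orange', 'Apple'],
    'value4': ['Throw', 'Eat', 'Throw', 'Keep', 'Eat', 'Eat', 'Throw', 'Throw', 'Throw', 'Throw', 'Eat', 'Eat', 'Chuck'],
})

# Parse 'H:MM' so that sorting follows clock time, then order rows by ID and time
df['time'] = pd.to_datetime(df['value'], format='%H:%M')
df = df.sort_values(['a', 'time'])

# Step 1: earliest Orange row per ID, and latest Apple time per ID
first_orange = df[df['value3'] == 'Orange'].groupby('a').head(1).set_index('a')
last_apple = df[df['value3'] == 'Apple'].groupby('a')['time'].max()

# Keep IDs whose earliest Orange precedes some Apple (an ID with no Apple gets NaT -> False)
qualifying = first_orange[first_orange['time'] < last_apple.reindex(first_orange.index)]

# Step 2: label each qualifying ID by its earliest Orange's value2, then count IDs per label
labels = qualifying['value2'].eq(1).map({True: '1', False: 'all other'})
result = (
    labels.to_frame('value2')
    .assign(value3='Orange')
    .groupby(['value2', 'value3'])
    .size()
    .reset_index(name='count')
)
print(result)
```

On the sample data all four IDs qualify; A1 and A4 land in the "1" group and A2 and A3 in "all other", two each.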
Update: Sorry, my expected output doesn't seem to have been pasted above:
value2      value3   count
1           Orange   2
all other   Orange   2
Answer (score: 0)
See if this works; someone may still be able to give you a simpler, shorter version:
##Keeping the first row for each (a, value3) pair
df1 = df[['a', 'value3']].drop_duplicates()
##Merging the dataframes back on the index
merge = df1.merge(df, how='left', left_index=True, right_index=True)
##Selecting only the required columns
merge = merge[['value2', 'value3_x']]
##Renaming the columns
merge = merge.rename(columns={'value3_x': 'value3'})
##Filtering the data (.copy() avoids SettingWithCopyWarning on the assignments below)
merge = merge[merge.value3 == 'Orange'].copy()
##Converting the value to string
merge['value2'] = merge.value2.astype(str)
##Changing the value of value2
merge['value2'] = merge.value2.apply(lambda x: '1' if x == '1' else 'all other')
##Grouping the data
merge.groupby(['value2', 'value3']).value3.count()
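The approach above never checks that an Apple comes after the Orange: drop_duplicates keeps each ID's first Orange in row order, and the Apple rows are never consulted. It matches the expected counts here because every ID in the sample does have an Orange before an Apple, and each ID's first Orange in row order is also its earliest in time. A minimal sketch to make the difference visible, assuming H:MM same-day times; the two A5 rows are invented for illustration, and `answer_style_counts`, `orange_label` and `count_orange_groups` are hypothetical helper names:

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['A1', 'A1', 'A1', 'A2', 'A2', 'A3', 'A3', 'A4', 'A3', 'A2', 'A4', 'A4', 'A4'],
    'value': ['7:00', '10:00', '20:00', '9:00', '7:00', '9:00', '8:00', '15:00', '19:00', '9:30', '15:30', '16:00', '16:30'],
    'value2': [3, 1, 2, 4, 2, 3, 3, 5, 3, 2, 1, 5, 7],
    'value3': ['Apple', 'Orange', 'Apple', 'Kiwi', 'Orange', 'Orange', 'Apple', 'Apple', 'Apple', 'Apple', 'Orange', 'Orange', 'Apple'],
    'value4': ['Throw', 'Eat', 'Throw', 'Keep', 'Eat', 'Eat', 'Throw', 'Throw', 'Throw', 'Throw', 'Eat', 'Eat', 'Chuck'],
})

# Hypothetical extra ID (not in the question): its only Orange comes AFTER its only Apple
extra = pd.DataFrame({'a': ['A5', 'A5'], 'value': ['8:00', '9:00'], 'value2': [4, 1],
                      'value3': ['Apple', 'Orange'], 'value4': ['Throw', 'Eat']})
df_extra = pd.concat([df, extra], ignore_index=True)


def answer_style_counts(frame):
    # Same idea as the answer: first (a, value3) row in row order, no time check
    first = frame.drop_duplicates(subset=['a', 'value3'])
    oranges = first[first['value3'] == 'Orange']
    return oranges['value2'].eq(1).map({True: '1', False: 'all other'}).value_counts().sort_index()


def orange_label(group):
    # Label from the earliest Orange that has a later Apple; None if there is no such Orange
    g = group.sort_values('time')
    apple_times = g.loc[g['value3'] == 'Apple', 'time']
    for _, row in g[g['value3'] == 'Orange'].iterrows():
        if (apple_times > row['time']).any():
            return '1' if row['value2'] == 1 else 'all other'
    return None


def count_orange_groups(frame):
    # Ordering-aware count: parse times, label each ID, drop IDs with no Orange -> Apple
    frame = frame.assign(time=pd.to_datetime(frame['value'], format='%H:%M'))
    labels = {a: orange_label(g) for a, g in frame.groupby('a')}
    return pd.Series(labels, dtype=object).dropna().value_counts().sort_index()


print(answer_style_counts(df_extra).to_dict())
print(count_orange_groups(df_extra).to_dict())
```

On the original df both functions give 2 and 2. On df_extra the row-order version counts A5 in the "1" group (3 vs 2), while the ordering-aware version leaves A5 out because no Apple follows its Orange.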