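I have a DataFrame that I want to group by categorical variables and by a range of values. You can think of it as grouping rows with similar values (clusters?). E.g.: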
df = pd.DataFrame({'symbol' : ['IP', 'IP', 'IP', 'IP', 'IP', 'IP', 'IP'],
'serie' : ['A', 'B', 'A', 'B', 'A', 'B', 'B'],
'strike' : [10, 10, 12, 13, 12, 13, 14],
'last' : [1, 2, 2.5, 3, 4.5, 5, 6],
'price' : [11, 11, 11, 11, 11, 11, 11],
'type' : ['call', 'put', 'put', 'put', 'call', 'put', 'call']})
If I use
grouped = df.groupby(['symbol', 'serie', 'strike'])
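I solve part of the problem, but I would like to group strike values that are close to each other, e.g. 10 and 11, 12 and 13, and so on. Preferably within a % range.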
Answer 0 (score: 2)
Use pd.cut to create bins for the strike values, then groupby() on that information:
import pandas as pd

# Create DataFrame
df = pd.DataFrame({
    'symbol': ['IP', 'IP', 'IP', 'IP', 'IP', 'IP', 'IP'],
    'serie': ['A', 'B', 'A', 'B', 'A', 'B', 'B'],
    'strike': [10, 10, 12, 13, 12, 13, 14],
    'last': [1, 2, 2.5, 3, 4.5, 5, 6],
    'price': [11, 11, 11, 11, 11, 11, 11],
    'type': ['call', 'put', 'put', 'put', 'call', 'put', 'call']
})

# Create bins (here, three equal-width bins across the data)
df['strikebins'] = pd.cut(df['strike'], bins=3)
print('Binned DataFrame:')
print(df)
print()

# Group the DataFrame by the categorical columns and the bins
grouped = df.groupby(['symbol', 'serie', 'strikebins'])

# Do something with the groups, for example sum the numeric columns
gp_sum = grouped.sum(numeric_only=True)
print('Grouped Sum (for example):')
print(gp_sum)
print()
Binned DataFrame:
   last  price serie  strike symbol  type        strikebins
0   1.0     11     A      10     IP  call   (9.996, 11.333]
1   2.0     11     B      10     IP   put   (9.996, 11.333]
2   2.5     11     A      12     IP   put  (11.333, 12.667]
3   3.0     11     B      13     IP   put      (12.667, 14]
4   4.5     11     A      12     IP  call  (11.333, 12.667]
5   5.0     11     B      13     IP   put      (12.667, 14]
6   6.0     11     B      14     IP  call      (12.667, 14]

Grouped Sum (for example):
                               last  price  strike
symbol serie strikebins
IP     A     (9.996, 11.333]      1     11      10
             (11.333, 12.667]     7     22      24
             (12.667, 14]       NaN    NaN     NaN
       B     (9.996, 11.333]      2     11      10
             (11.333, 12.667]   NaN    NaN     NaN
             (12.667, 14]        14     33      40

If you like, you can use drop(), or replace strike with the mean of the range
...
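As a rough illustration of replacing strike with a value that represents its range, here is a minimal sketch. It assumes "the mean of the range" means the midpoint of each pd.cut interval, which is my reading and not something the answer spells out. The column name strike_mid is made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'serie':  ['A', 'B', 'A', 'B', 'A', 'B', 'B'],
    'strike': [10, 10, 12, 13, 12, 13, 14],
    'last':   [1, 2, 2.5, 3, 4.5, 5, 6],
})

# pd.cut returns a categorical Series whose categories are intervals
bins = pd.cut(df['strike'], bins=3)

# Midpoint of every interval, looked up through each row's category code
# (assumes no value falls outside the bins, i.e. no code of -1)
mids = bins.cat.categories.mid.to_numpy()
df['strike_mid'] = mids[bins.cat.codes.to_numpy()]  # hypothetical column name

# Group on the plain float column instead of the interval column
summary = df.groupby(['serie', 'strike_mid'])['last'].sum()
print(summary)
```

Because strike_mid is an ordinary float column, the grouped result only contains combinations that actually occur in the data.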
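Answer 1 (score: 1)
I guess the OP wants to group by the categorical variables and then group the numeric variable by intervals. In that case you can use np.digitize().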
smallest = np.min(df['strike'])
largest = np.max(df['strike'])
num_edges = 3
# np.digitize(input_array, bin_edges)
ind = np.digitize(df['strike'], np.linspace(smallest, largest, num_edges))
Then ind should be
array([1, 1, 2, 2, 2, 2, 3], dtype=int64)
corresponding to the binned values
[10, 10, 12, 13, 12, 13, 14]
with bin edges
array([ 10., 12., 14.]) # == np.linspace(smallest, largest, num_edges)
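Note that 14 ends up alone in bin 3. By default np.digitize treats each bin as half-open, edges[i-1] <= x < edges[i], so a value equal to the last edge falls past it. The small sketch below (variable names are mine) compares the default with right=True, which closes the bins on the right instead.

```python
import numpy as np

strikes = np.array([10, 10, 12, 13, 12, 13, 14])
edges = np.linspace(strikes.min(), strikes.max(), 3)  # 10, 12, 14

# Default (right=False): edges[i-1] <= x < edges[i]
ind_left = np.digitize(strikes, edges)

# right=True: edges[i-1] < x <= edges[i]
ind_right = np.digitize(strikes, edges, right=True)

print(ind_left)
print(ind_right)
```

With right=True the minimum value 10 gets index 0 and 14 joins 13 in index 2, so which variant fits depends on which edge you want included.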
Finally, group by all the columns you want, but using this additional bin column:
df['binned_strike'] = ind
for key, group in df.groupby(['symbol', 'serie', 'binned_strike']):
    print("group key")
    print(key)
    print("group content")
    print(group)
    print("=============")
This should print
group key
('IP', 'A', 1)
group content
last price serie strike symbol type binned_strike
0 1.0 11 A 10 IP call 1
=============
group key
('IP', 'A', 2)
group content
last price serie strike symbol type binned_strike
2 2.5 11 A 12 IP put 2
4 4.5 11 A 12 IP call 2
=============
group key
('IP', 'B', 1)
group content
last price serie strike symbol type binned_strike
1 2.0 11 B 10 IP put 1
=============
group key
('IP', 'B', 2)
group content
last price serie strike symbol type binned_strike
3 3.0 11 B 13 IP put 2
5 5.0 11 B 13 IP put 2
=============
group key
('IP', 'B', 3)
group content
last price serie strike symbol type binned_strike
6 6.0 11 B 14 IP call 3
=============
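The question also asks for grouping "within a % range", which equal-width bins do not express directly. One possible approach, which is not part of either answer, is to walk through the sorted unique strikes and start a new cluster whenever a value is more than a given percentage above the first value of the current cluster. The helper percent_clusters, the tol parameter and the strike_cluster column are hypothetical names, and the sketch assumes positive strikes.

```python
import pandas as pd

def percent_clusters(values, tol=0.10):
    """Map each distinct value to a cluster id.

    A new cluster starts when a value exceeds the first value of the
    current cluster by more than `tol` (0.10 means 10%).
    Assumes all values are positive.
    """
    labels = {}
    cluster_id = -1
    anchor = None  # first (smallest) value of the current cluster
    for v in sorted(set(values)):
        if anchor is None or v > anchor * (1 + tol):
            cluster_id += 1
            anchor = v
        labels[v] = cluster_id
    return labels

df = pd.DataFrame({
    'serie':  ['A', 'B', 'A', 'B', 'A', 'B', 'B'],
    'strike': [10, 10, 12, 13, 12, 13, 14],
    'last':   [1, 2, 2.5, 3, 4.5, 5, 6],
})

mapping = percent_clusters(df['strike'].unique().tolist(), tol=0.10)
df['strike_cluster'] = df['strike'].map(mapping)  # hypothetical column name

for key, group in df.groupby(['serie', 'strike_cluster']):
    print(key)
    print(group)
```

With a 10% tolerance, 12 and 13 share a cluster while 10 and 14 each get their own. Anchoring on the first value of a cluster keeps clusters from drifting the way chaining neighbor-to-neighbor gaps would.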