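Can someone give me a quick, clear lesson on getting specific data points out of the multi-index DataFrame below? I've been going through tutorials all day and none of them helped. This should be simple for anyone who knows pandas.
How do I do the following:
Extract the 'close' for 'AAPL' on the last date in the DataFrame
If 'close' > 'open' for 'AAPL' on a given date, extract all the data for 'AAPL' and add it to a new DataFrame
Add a new column for each symbol (AAPL, FB), labeled 'range', equal to 'high' - 'low' for each day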
symbol        AAPL                                                FB
ohlcv         open    high     low   close     adj    volume    open    high     low   close     adj    volume
Date
2018-09-17  222.15  222.95  217.27  217.88  217.88  37195100  161.92  162.06  159.77  160.58  160.58  21005300
2018-09-18  217.79  221.85  217.12  218.24  218.24  31571700  159.39  161.76  158.87  160.30  160.30  22465200
2018-09-19  218.50  219.62  215.30  218.37  218.37  27123800  160.08  163.44  159.48  163.06  163.06  19629000
2018-09-20  220.24  222.28  219.15  220.03  220.03  26460800  164.50  166.45  164.47  166.02  166.02  18824200
2018-09-21  220.78  221.36  217.29  217.66  217.66  96246748  166.64  167.25  162.81  162.93  162.93  25956794
Here is the DataFrame as a dict, as requested in one of the comments:
import pandas as pd

df = pd.DataFrame({('AAPL', 'adj_close'): {
pd.Timestamp('2018-01-02 00:00:00'): 170.3,
pd.Timestamp('2018-01-03 00:00:00'): 170.27,
pd.Timestamp('2018-01-04 00:00:00'): 171.07,
pd.Timestamp('2018-01-05 00:00:00'): 173.01,
pd.Timestamp('2018-01-08 00:00:00'): 172.37},
('AAPL', 'close'): {
pd.Timestamp('2018-01-02 00:00:00'): 172.26,
pd.Timestamp('2018-01-03 00:00:00'): 172.23,
pd.Timestamp('2018-01-04 00:00:00'): 173.03,
pd.Timestamp('2018-01-05 00:00:00'): 175.0,
pd.Timestamp('2018-01-08 00:00:00'): 174.35},
('AAPL', 'high'): {
pd.Timestamp('2018-01-02 00:00:00'): 172.3,
pd.Timestamp('2018-01-03 00:00:00'): 174.55,
pd.Timestamp('2018-01-04 00:00:00'): 173.47,
pd.Timestamp('2018-01-05 00:00:00'): 175.37,
pd.Timestamp('2018-01-08 00:00:00'): 175.61},
('AAPL', 'low'): {
pd.Timestamp('2018-01-02 00:00:00'): 169.26,
pd.Timestamp('2018-01-03 00:00:00'): 171.96,
pd.Timestamp('2018-01-04 00:00:00'): 172.08,
pd.Timestamp('2018-01-05 00:00:00'): 173.05,
pd.Timestamp('2018-01-08 00:00:00'): 173.93},
('AAPL', 'open'): {
pd.Timestamp('2018-01-02 00:00:00'): 170.16,
pd.Timestamp('2018-01-03 00:00:00'): 172.53,
pd.Timestamp('2018-01-04 00:00:00'): 172.54,
pd.Timestamp('2018-01-05 00:00:00'): 173.44,
pd.Timestamp('2018-01-08 00:00:00'): 174.35},
('AAPL', 'volume'): {
pd.Timestamp('2018-01-02 00:00:00'): 25555900,
pd.Timestamp('2018-01-03 00:00:00'): 29517900,
pd.Timestamp('2018-01-04 00:00:00'): 22434600,
pd.Timestamp('2018-01-05 00:00:00'): 23660000,
pd.Timestamp('2018-01-08 00:00:00'): 20567800},
('FB', 'adj_close'): {
pd.Timestamp('2018-01-02 00:00:00'): 181.42,
pd.Timestamp('2018-01-03 00:00:00'): 184.67,
pd.Timestamp('2018-01-04 00:00:00'): 184.33,
pd.Timestamp('2018-01-05 00:00:00'): 186.85,
pd.Timestamp('2018-01-08 00:00:00'): 188.28},
('FB', 'close'): {
pd.Timestamp('2018-01-02 00:00:00'): 181.42,
pd.Timestamp('2018-01-03 00:00:00'): 184.67,
pd.Timestamp('2018-01-04 00:00:00'): 184.33,
pd.Timestamp('2018-01-05 00:00:00'): 186.85,
pd.Timestamp('2018-01-08 00:00:00'): 188.28},
('FB', 'high'): {
pd.Timestamp('2018-01-02 00:00:00'): 181.58,
pd.Timestamp('2018-01-03 00:00:00'): 184.78,
pd.Timestamp('2018-01-04 00:00:00'): 186.21,
pd.Timestamp('2018-01-05 00:00:00'): 186.9,
pd.Timestamp('2018-01-08 00:00:00'): 188.9},
('FB', 'low'): {
pd.Timestamp('2018-01-02 00:00:00'): 177.55,
pd.Timestamp('2018-01-03 00:00:00'): 181.33,
pd.Timestamp('2018-01-04 00:00:00'): 184.1,
pd.Timestamp('2018-01-05 00:00:00'): 184.93,
pd.Timestamp('2018-01-08 00:00:00'): 186.33},
('FB', 'open'): {
pd.Timestamp('2018-01-02 00:00:00'): 177.68,
pd.Timestamp('2018-01-03 00:00:00'): 181.88,
pd.Timestamp('2018-01-04 00:00:00'): 184.9,
pd.Timestamp('2018-01-05 00:00:00'): 185.59,
pd.Timestamp('2018-01-08 00:00:00'): 187.2},
('FB', 'volume'): {
pd.Timestamp('2018-01-02 00:00:00'): 18151900,
pd.Timestamp('2018-01-03 00:00:00'): 16886600,
pd.Timestamp('2018-01-04 00:00:00'): 13880900,
pd.Timestamp('2018-01-05 00:00:00'): 13574500,
pd.Timestamp('2018-01-08 00:00:00'): 17994700}})
Answer 0 (score: 0)
You can access columns of a MultiIndex directly by indexing with a tuple. Since you haven't posted the DataFrame code, try the following snippets to see whether they work:
df[('AAPL', 'close')]
will give you the 'close' column for 'AAPL'. You can sort by date (the index) to extract the last date:
df.sort_index(ascending=False).head(1)[('AAPL', 'close')]
To compare and extract all the 'AAPL' data, you can do the following:
df[df[('AAPL', 'close')] > df[('AAPL', 'open')]]['AAPL']
There may be a better way, but this should still work. You can add a date condition to the filter just as you would on a normal DataFrame.
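A date condition can be combined with the close-above-open filter using `&`. The following is a minimal sketch on a small illustrative frame; the names `sample`, `up_day`, `on_or_after` and the cutoff date are assumptions made for the example, not part of the question.

```python
import pandas as pd

# Small illustrative frame with (ticker, field) MultiIndex columns.
dates = pd.to_datetime(["2018-01-02", "2018-01-03", "2018-01-04", "2018-01-05"])
sample = pd.DataFrame(
    {
        ("AAPL", "open"): [170.16, 172.53, 172.54, 173.44],
        ("AAPL", "close"): [172.26, 172.23, 173.03, 175.00],
        ("FB", "open"): [177.68, 181.88, 184.90, 185.59],
        ("FB", "close"): [181.42, 184.67, 184.33, 186.85],
    },
    index=dates,
)

# Days on which AAPL closed above its open.
up_day = sample[("AAPL", "close")] > sample[("AAPL", "open")]
# Hypothetical extra condition: only dates on or after a cutoff.
on_or_after = sample.index >= pd.Timestamp("2018-01-04")

# Rows that satisfy both conditions, restricted to the AAPL columns.
aapl_up = sample.loc[up_day & on_or_after, "AAPL"]
print(aapl_up)
```

Each condition needs its own parentheses when written inline, because `&` binds more tightly than the comparison operators.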
Answer 1 (score: 0)
IIUC,
Simply use df.index.max() to get the latest date, then select AAPL / close:
df.loc[df.index.max(), ('AAPL', 'close')]
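If the last close is wanted for every ticker at once, a cross-section on the second column level may be convenient. This is only a sketch on made-up sample rows; `sample`, `closes` and `last_closes` are illustrative names.

```python
import pandas as pd

dates = pd.to_datetime(["2018-01-04", "2018-01-05", "2018-01-08"])
sample = pd.DataFrame(
    {
        ("AAPL", "close"): [173.03, 175.00, 174.35],
        ("AAPL", "open"): [172.54, 173.44, 174.35],
        ("FB", "close"): [184.33, 186.85, 188.28],
        ("FB", "open"): [184.90, 185.59, 187.20],
    },
    index=dates,
)

last_date = sample.index.max()

# Cross-section on the second column level: one 'close' column per ticker.
closes = sample.xs("close", axis=1, level=1)

# Row for the last date: a Series indexed by ticker.
last_closes = closes.loc[last_date]
print(last_closes)
```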
Basically, if you filter with a mask, the result is already a DataFrame, so there is no need to "append to another DataFrame".
mask = df.loc[:, ('AAPL', 'close')] > df.loc[:, ('AAPL', 'open')]
df.loc[mask, 'AAPL']
For the range, just select the columns (ticker, info), where ticker is AAPL, FB, ... and info is high, close, ..., then subtract and join the result back:
r = df.loc[:, [('AAPL', 'high'), ('FB', 'high')]].sub(df.loc[:, [('AAPL', 'low'), ('FB', 'low')]].values).rename(columns={"high": "range"})
df = df.join(r).sort_index(axis=1)
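The same idea can be written without listing each ticker by hand. The sketch below assumes every ticker has both a 'high' and a 'low' column; `sample`, `spread` and `with_range` are illustrative names.

```python
import pandas as pd

dates = pd.to_datetime(["2018-01-02", "2018-01-03"])
sample = pd.DataFrame(
    {
        ("AAPL", "high"): [172.30, 174.55],
        ("AAPL", "low"): [169.26, 171.96],
        ("FB", "high"): [181.58, 184.78],
        ("FB", "low"): [177.55, 181.33],
    },
    index=dates,
)

# One column per ticker for each field; subtraction aligns on ticker labels.
spread = sample.xs("high", axis=1, level=1) - sample.xs("low", axis=1, level=1)

# Rebuild (ticker, 'range') labels so the result fits the original layout.
spread.columns = pd.MultiIndex.from_product([spread.columns, ["range"]])

# Append the new columns and sort so each ticker's fields sit together.
with_range = pd.concat([sample, spread], axis=1).sort_index(axis=1)
print(with_range)
```

Because the subtraction aligns on ticker labels rather than on position, it avoids the `.values` step and keeps working when more symbols are added.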
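Note that you are using MultiIndex columns, which makes every operation harder to code. You might consider switching to single-level columns, with a new column named ticker holding the values AAPL, FB, etc.
For example, using stack + reset_index, you get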
df2 = df.stack(level=0).reset_index().rename(columns={'level_0': 'date', 'level_1': 'ticker'}).sort_values('ticker')
         date ticker  adj_close   close    high     low    open  range    volume
0  2018-01-02   AAPL     170.30  172.26  172.30  169.26  170.16   3.04  25555900
2  2018-01-03   AAPL     170.27  172.23  174.55  171.96  172.53   2.59  29517900
4  2018-01-04   AAPL     171.07  173.03  173.47  172.08  172.54   1.39  22434600
6  2018-01-05   AAPL     173.01  175.00  175.37  173.05  173.44   2.32  23660000
8  2018-01-08   AAPL     172.37  174.35  175.61  173.93  174.35   1.68  20567800
1  2018-01-02     FB     181.42  181.42  181.58  177.55  177.68   4.03  18151900
3  2018-01-03     FB     184.67  184.67  184.78  181.33  181.88   3.45  16886600
5  2018-01-04     FB     184.33  184.33  186.21  184.10  184.90   2.11  13880900
7  2018-01-05     FB     186.85  186.85  186.90  184.93  185.59   1.97  13574500
9  2018-01-08     FB     188.28  188.28  188.90  186.33  187.20   2.57  17994700
Then, for example, computing range is much simpler:
df2['range2'] = df2['high'] - df2['low']
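The other two tasks from the question also become ordinary column operations in the long layout. The following is a rough sketch on a hand-built frame shaped like df2; `long_df`, `aapl_up` and `last_close` are illustrative names.

```python
import pandas as pd

# Hand-built long-format frame: one row per (date, ticker).
long_df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2018-01-05", "2018-01-08", "2018-01-05", "2018-01-08"]
        ),
        "ticker": ["AAPL", "AAPL", "FB", "FB"],
        "open": [173.44, 174.35, 185.59, 187.20],
        "close": [175.00, 174.35, 186.85, 188.28],
    }
)

# Rows where one ticker closed strictly above its open.
aapl_up = long_df[
    (long_df["ticker"] == "AAPL") & (long_df["close"] > long_df["open"])
]

# Most recent close per ticker: sort by date, then take each group's last row.
last_close = long_df.sort_values("date").groupby("ticker")["close"].last()
print(aapl_up)
print(last_close)
```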