Question

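[Introduction] I have a custom-made Python game that uses the 'w' and 's' keys for moving and the 'space' key for shooting as inputs. I have found a reinforcement learning algorithm that I would like to try implementing in the game.

However, the RL algorithm uses OpenAI's Atari games as environments, created with the command 'gym.make(env_name)'. I am on a Windows operating system, so I cannot experiment with the code because gym[atari] does not work for me.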

class Agent:
    def __init__(self, env_name, training, render=False, use_logging=True):

        self.env = gym.make(env_name)

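[Question] Is there another command I could use instead of 'gym.make()' in this class to implement the RL algorithm and train on my custom-made game, or is making my own gym environment the only option? Would 'pygame.surfarray.array2d()' return something similar to 'gym.make()'?

Please tell me if more information is needed. I am new to gym and TensorFlow, so my understanding may be flawed.

[Edit] I made the game using functions. If I were to convert the game into a gym environment, would the only option be to convert the functions into classes? As an example of what my code looks like, here is the game loop (I cannot post the whole code because it is a controlled assessment for an end-of-year grade, so I would like to avoid any plagiarism issues):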

def game_loop():
    global pause
    x = (display_width * 0.08)
    y = (display_height * 0.2)

    x_change = 0
    y_change = 0

    blob_speed = 2

    velocity = [2, 2]

    score = 0
    lives = 3

    pos_x = display_width/1.2
    pos_y = display_height/1.2

    previous_time = pygame.time.get_ticks()
    previous_time2 = pygame.time.get_ticks()

    gameExit = False
    while not gameExit:
        for event in pygame.event.get():#monitors hardware movement/ clicks
            if event.type == pygame.QUIT:
                pygame.quit()
                quit()

        pos_x += velocity[0]
        pos_y += velocity[1]

        if pos_x + blob_width > display_width or pos_x < 601:
            velocity[0] = -velocity[0]

        if pos_y + blob_height > display_height or pos_y < 0:
            velocity[1] = -velocity[1]

        for b in range(len(bullets2)):
            bullets2[b][0] -= 6

        for bullet in bullets2:
            if bullet[0] < 0:
                bullets2.remove(bullet)


        current_time2 = pygame.time.get_ticks()
        #ready to fire when 500 ms have passed.
        if current_time2 - previous_time2 > 500:
            previous_time2 = current_time2
            bullets2.append([pos_x+25, pos_y+24])

        keys = pygame.key.get_pressed()

        for b in range(len(bullets)):
            bullets[b][0] += 6

        for bullet in bullets:
            if bullet[0] > 1005:
                bullets.remove(bullet)

        if keys[pygame.K_SPACE]:
            current_time = pygame.time.get_ticks()
            #ready to fire when 500 ms have passed.
            if current_time - previous_time > 600:
                previous_time = current_time
                bullets.append([x+25, y+24])


        if x < 0:
            x = 0
        if keys[pygame.K_a]:
            x_change = -blob_speed
        if x > 401 - blob_width:
            x = 401 - blob_width
        if keys[pygame.K_d]:
            x_change = blob_speed
        if keys[pygame.K_p]:
            pause = True
            paused()


        if keys[pygame.K_a] and keys[pygame.K_d]:
            x_change = 0
        if not keys[pygame.K_a] and not keys[pygame.K_d]:
            x_change = 0

        if y < 0:
            y = 0
        if keys[pygame.K_w]:
            y_change = -blob_speed
        if y > display_height - blob_height:
            y = display_height - blob_height
        if keys[pygame.K_s]:
            y_change = blob_speed


        if keys[pygame.K_w] and keys[pygame.K_s]:
            y_change = 0
        if not keys[pygame.K_w] and not keys[pygame.K_s]:
            y_change = 0


        #print(event)
        # Reset x and y to new position
        x += x_change
        y += y_change

        gameDisplay.fill(blue)  #changes background surface
        bullets_hit(score)
        player_lives(lives)
        pygame.draw.line(gameDisplay, black, (601, display_height), (601, 0), 3)
        pygame.draw.line(gameDisplay, black, (401, display_height), (401, 0), 3)
        blob(pos_x, pos_y)
        blob(x, y)

        for bullet in bullets:
            gameDisplay.blit(bulletpicture, pygame.Rect(bullet[0], bullet[1], 0, 0))
            if bullet[0] > pos_x and bullet[0] < pos_x + blob_width:
                if bullet[1] > pos_y and bullet[1] < pos_y + blob_height or bullet[1] + bullet_height > pos_y and bullet[1] + bullet_height < pos_y + blob_height:
                    bullets.remove(bullet)
                    score+=1

        for bullet in bullets2:
            gameDisplay.blit(bulletpicture, pygame.Rect(bullet[0], bullet[1], 0, 0))
            if bullet[0] + bullet_width < x + blob_width and bullet[0] > x:
                if bullet[1] > y and bullet[1] < y + blob_height or bullet[1] + bullet_height > y and bullet[1] + bullet_height < y + blob_height:
                    bullets2.remove(bullet)
                    lives-=1

        if lives == 0:
            game_over()


        pygame.display.update() #update screen
        clock.tick(120)#moves frame on (fps in parameters)

Answer 1

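The best option is simply to implement your own custom environment. You can find some instructions on implementing custom environments in the gym repository on GitHub.

Some of those instructions are only relevant if you also intend to share your environment with others, not if you just want to use it yourself. I suspect the most important parts for you (assuming you only want to use it yourself rather than upload it as a package that others can use) are the following (copied from the link above):

gym-foo/gym_foo/envs/foo_env.py should look something like: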

import gym
from gym import error, spaces, utils
from gym.utils import seeding

class FooEnv(gym.Env):
  metadata = {'render.modes': ['human']}

  def __init__(self):
    ...
  def step(self, action):
    ...
  def reset(self):
    ...
  def render(self, mode='human', close=False):
    ...

gym-foo/gym_foo/__init__.py should have:

from gym.envs.registration import register

register(
    id='foo-v0',
    entry_point='gym_foo.envs:FooEnv',
)
register(
    id='foo-extrahard-v0',
    entry_point='gym_foo.envs:FooExtraHardEnv',
)

gym-foo/gym_foo/envs/__init__.py should have:

from gym_foo.envs.foo_env import FooEnv
from gym_foo.envs.foo_extrahard_env import FooExtraHardEnv

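The first block is the implementation of the environment itself. If you have already implemented the game, you hopefully won't have to implement much there. This subclass of gym.Env should simply be a "wrapper" around the already-existing game, forming a bridge between RL agents that expect the gym API (step(), reset(), etc.) and the game itself. You can take inspiration from the atari_env implementation in gym, which is in fact also just a wrapper around already-existing Atari games and does not directly contain the full game logic of those games.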
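As a rough illustration of the "wrapper" idea, here is a minimal sketch in which the game stays function-based and a thin class bridges it to step() and reset(). Everything in it is hypothetical: the state dictionary, the `new_state` and `advance` functions, the action encoding, and the toy hit rule are simplified stand-ins for the real game, and enemy bullets and lives are left out. So that it runs without gym or pygame installed, the class does not inherit from gym.Env, although a real environment would. It returns (observation, reward, done, info) from step(), following the classic gym convention.

```python
# Hypothetical, simplified stand-in for the game: plain functions acting on a
# state dict, mirroring a function-based game loop (no pygame needed here).
HEIGHT = 100                          # assumed playfield height
SPEED = 2                             # mirrors blob_speed in the question
NOOP, UP, DOWN, SHOOT = 0, 1, 2, 3    # assumed encoding for no key / w / s / space


def new_state():
    """Initial game state (what game_loop() sets up before its while loop)."""
    return {"y": 20, "enemy_y": 80, "enemy_vy": 2, "score": 0, "frame": 0}


def advance(state, action):
    """Advance the game by one frame and return the score gained this frame."""
    if action == UP:
        state["y"] = max(0, state["y"] - SPEED)
    elif action == DOWN:
        state["y"] = min(HEIGHT, state["y"] + SPEED)

    # The enemy bounces between the top and bottom edges.
    state["enemy_y"] += state["enemy_vy"]
    if state["enemy_y"] <= 0 or state["enemy_y"] >= HEIGHT:
        state["enemy_vy"] = -state["enemy_vy"]

    # Toy hit rule: a shot scores when the player is roughly level with the enemy.
    gained = 1 if action == SHOOT and abs(state["y"] - state["enemy_y"]) <= 5 else 0
    state["score"] += gained
    state["frame"] += 1
    return gained


class GameWrapper:
    """Thin bridge exposing the reset()/step() interface an RL agent expects."""

    def __init__(self, max_frames=200):
        self.max_frames = max_frames
        self.state = None

    def _observation(self):
        return (self.state["y"], self.state["enemy_y"])

    def reset(self):
        self.state = new_state()
        return self._observation()

    def step(self, action):
        reward = advance(self.state, action)
        done = self.state["frame"] >= self.max_frames
        return self._observation(), reward, done, {}
```

The point of the design is that the game functions do not need to be converted into classes; only the wrapper is a class, and it delegates each step() call to one frame of the existing game logic.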

The second and third blocks are needed to make sure you can create instances of your custom environment using the gym.make() function.

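You will indeed have to create a class with the gym.Env class as its base class, and make sure to implement all of its important functions (such as step and reset). That is, assuming you want to use already-implemented RL algorithms that expect those functions to exist. The alternative, of course, is to throw gym out of the window entirely and implement everything from scratch, but you would most likely end up doing more work and arriving at a similar API anyway.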
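One way to picture the "similar API" point: an agent only ever calls reset() and step() on its environment, so the Agent class from the question could be given a ready-made environment object instead of calling gym.make(env_name) itself. The sketch below assumes exactly that. `CountdownEnv`, the `n_actions` and `seed` parameters, and the random policy are all hypothetical placeholders, not part of the original RL algorithm.

```python
import random


class CountdownEnv:
    """Hypothetical stand-in environment: the episode ends after `length` steps."""

    def __init__(self, length=10):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0   # toy reward: action 1 is "good"
        done = self.t >= self.length
        return self.t, reward, done, {}


class Agent:
    def __init__(self, env, n_actions, seed=None):
        # Takes an environment object instead of calling gym.make(env_name).
        self.env = env
        self.n_actions = n_actions
        self.rng = random.Random(seed)

    def run_episode(self):
        """Play one episode with a random policy and return the total reward."""
        self.env.reset()
        total, done = 0.0, False
        while not done:
            action = self.rng.randrange(self.n_actions)
            _, reward, done, _ = self.env.step(action)
            total += reward
        return total


agent = Agent(CountdownEnv(length=10), n_actions=2, seed=0)
print(agent.run_episode())
```

Any object with compatible reset() and step() methods could be passed in the same way, whether or not it was built on gym.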

OpenAI / TensorFlow custom game environment instead of using 'gym.make()'
