Question

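I'm currently working on a project to control LED strips connected to a Teensy 3.2 board attached to a Windows PC. Technically, it is based on this project: https://www.pjrc.com/teensy/td_libs_OctoWS2811.html

There is also an implementation in vvvv: https://vvvv.org/contribution/realtime-led-control-with-teensy3.xoctows2811

So far, both work fine. What I want to do is port the movie2serial program (from the project on pjrc.com) to Python.

So I found this project: https://github.com/agwn/movie2serial_py

It didn't work out of the box, but with a few modifications I got it running. Here is the code of my class that receives an image, converts it to a byte array, and sends it to the serial port: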

import serial
import numpy as np

class Teensy:
  def __init__(self, port='COM3', baudrate=115200, stripes=4, leds=180):
    self.stripes = stripes
    self.leds = leds
    self.connected = True
    try:
      self.port = serial.Serial(port, baudrate)
    except:
      self.connected = False

  def close(self):
    if not self.connected:
      return
    self.black_out()
    self.port.close()

  def send(self, image):
    data = list(self.image2data(image))
    data.insert(0, 0x00)
    data.insert(0, 0x00)
    data.insert(0, ord('*'))
    if not self.connected:
      return
    self.port.write(bytes(data))  # raw bytes; str.encode() would expand values above 127 to two UTF-8 bytes

  def black_out(self):
    self.send(np.zeros((self.leds,self.stripes,3), np.uint8))

  def image2data(self, image):
    buffer = np.zeros((8*self.leds*3), np.uint8)
    byte_count = 0
    order = [1,2,0]
    for led in range(self.leds):
      for channel in range(3):
        for bit in range(8):
          bits_out = 0
          for pin in range(self.stripes):
            if 0x80 >> bit & image[led,pin,order[channel]]:
              bits_out |= 1 << pin
          buffer[byte_count] = bits_out
          byte_count += 1
    return buffer

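It works, but it is slow (about 13 FPS on my computer).

To explain the code: I create a simple animation with cv2 and pass the image (a NumPy ndarray of 4 x 180 pixels, since I have 4 LED strips with 180 LEDs each) to the send method of the Teensy instance. The send method passes the image to image2data to convert it to a byte array, prepends a few header bytes, and sends the whole thing to the Teensy.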
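A minimal sketch of the framing step described above, assuming the payload has the size that image2data produces for 180 LEDs (180 x 3 x 8 bytes). The helper name build_frame is hypothetical and not part of the original class; it only illustrates the three header bytes ('*', 0x00, 0x00) followed by the payload.

```python
import numpy as np

def build_frame(payload):
    """Prefix the payload with the 3-byte header used in send(): '*', 0x00, 0x00.

    build_frame is a hypothetical helper, not part of the original class.
    """
    header = bytes([ord('*'), 0x00, 0x00])
    # tobytes() copies the raw uint8 values without any text encoding step.
    return header + np.asarray(payload, dtype=np.uint8).tobytes()

# Stand-in for the output of image2data(): 180 LEDs * 3 channels * 8 bits.
payload = np.zeros(180 * 3 * 8, np.uint8)
frame = build_frame(payload)
print(len(frame))  # 4323
```

Building the frame as a bytes object once, instead of inserting into a Python list three times, also avoids shifting 4320 list elements on every frame.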

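There are two bottlenecks in this code:

Writing to the serial port (self.port.write in the send method). Maybe that can't be sped up, which would be acceptable.

But more importantly:

Accessing the image array (image[led,pin,order[channel]] in the image2data method). When I change the line to, for example:

if 0x80 >> bit & 255:

the code runs 6 to 7 times faster (~80 FPS). By the way, order[channel] is used to convert the colors from BGR to GRB.

Long story short: reading the colors from the image array is very slow. How can I speed up the conversion from image array to byte array in the image2data method?

If you've made it this far, thank you for your patience :-) Sorry for the long post, but this is a complex project and not easy for me to explain. I'd really appreciate your help, and maybe others can benefit from it too.

Thanks in advance, Al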

Answer 1

You can slightly improve the second hotspot by hoisting order[channel] out of that inner loop (by saving channel_index = order[channel] in the channel loop) and then writing

if 0x80 >> bit & image[led,pin,channel_index]:

That would be a small improvement. It also looks like hoisting 0x80 >> bit out of the pin loop would save 8 redundant computations. Save it as mask and you would have

if mask & image[led,pin,channel_index]:

Together these might be worth a few FPS.
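For illustration, here is how the two hoists could look when applied to the whole method. This is a sketch that assumes the same image layout as in the question, shape (leds, stripes, 3) with dtype uint8; the standalone function name image2data_hoisted and the module-level ORDER constant are made up for this example.

```python
import numpy as np

ORDER = [1, 2, 0]  # BGR -> GRB, as in the question

def image2data_hoisted(image, leds, stripes):
    """Same output as the question's image2data, with two lookups hoisted."""
    buffer = np.zeros(8 * leds * 3, np.uint8)
    byte_count = 0
    for led in range(leds):
        for channel in range(3):
            channel_index = ORDER[channel]   # computed once per channel
            for bit in range(8):
                mask = 0x80 >> bit           # computed once per bit
                bits_out = 0
                for pin in range(stripes):
                    if mask & image[led, pin, channel_index]:
                        bits_out |= 1 << pin
                buffer[byte_count] = bits_out
                byte_count += 1
    return buffer

# Tiny check: the top bit set on pins 0 and 2 of the first LED's green channel.
img = np.zeros((2, 4, 3), np.uint8)
img[0, 0, 1] = 0x80
img[0, 2, 1] = 0x80
out = image2data_hoisted(img, leds=2, stripes=4)
print(out[0])  # 5
```

The element access image[led, pin, channel_index] still runs once per pin and bit, so this only trims the overhead around it.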

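But looking at your code, the way those loops are nested looks off. For 180 x 4 RGB LEDs, I would expect you to send 180 x 4 x 3 bytes to the Teensy. But the code sends 3 x 180 x 8. Is it possible that the two inner loops need to be swapped?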

Answer 2

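Thanks for your answer and the improvements. I will implement them later, but I suspect they won't raise the frame rate to the desired 60 FPS.

The code sends 3 x 180 x 8 because of the Teensy board. The LEDs are connected to the board through an Ethernet cable, which has 8 pins, and all 8 pins need to be addressed, otherwise the strips show strange results. Also, in a later configuration I will need more than 4 strips, so for now I don't mind sending data for 8 strips instead of 4. And I don't think changing that would make the code run much faster.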
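To make the 3 x 180 x 8 layout concrete, the following sketch packs one color channel of one LED position across the pins, the same way the inner loops of image2data and the Processing sketch do: each output byte carries one bit position, and bit number pin of that byte belongs to that pin. The function name pack_channel and the sample values are made up for illustration.

```python
def pack_channel(values):
    """Pack one 8-bit color value per pin into 8 bytes, most significant bit first.

    values[pin] is the color value for that pin (at most 8 pins).
    pack_channel is a hypothetical helper used only for this illustration.
    """
    out = []
    for bit in range(8):
        mask = 0x80 >> bit
        byte = 0
        for pin, value in enumerate(values):
            if value & mask:
                byte |= 1 << pin
        out.append(byte)
    return out

# Four pins in use; pins 4..7 simply stay 0 in every byte.
print(pack_channel([0xFF, 0x00, 0x80, 0x01]))  # [5, 1, 1, 1, 1, 1, 1, 9]

# Bytes per frame: 180 LED positions * 3 channels * 8 bit positions.
bytes_per_frame = 180 * 3 * 8
print(bytes_per_frame)  # 4320
```

With this layout the frame size does not depend on how many of the 8 pins are actually used.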

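As I mentioned in my opening post, this piece of code seems to be very slow, and I don't understand why: image[led,pin,order[channel]]

Here is the code from the Processing sketch, which runs at least 10 times faster than the Python script: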

void image2data(PImage image, byte[] data, boolean layout) {
  int offset = 3;
  int x, y, xbegin, xend, xinc, mask;
  int linesPerPin = image.height / 8;
  int pixel[] = new int[8];
  for (y = 0; y < linesPerPin; y++) {
    if ((y & 1) == (layout ? 0 : 1)) {
      xbegin = 0;
      xend = image.width;
      xinc = 1;
    } else {
      xbegin = image.width - 1;
      xend = -1;
      xinc = -1;
    }
    for (x = xbegin; x != xend; x += xinc) {
      for (int i=0; i < 8; i++) {
        pixel[i] = image.pixels[x + (y + linesPerPin * i) * image.width];
        pixel[i] = colorWiring(pixel[i]);
      }
      for (mask = 0x800000; mask != 0; mask >>= 1) {
        byte b = 0;
        for (int i=0; i < 8; i++) {
          if ((pixel[i] & mask) != 0) b |= (1 << i);
        }
        data[offset++] = b;
      }
    }
  } 
}

I can't believe Python is that much slower than Java. I still hope someone knows what is wrong with accessing the pixels of the NumPy array.
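One likely reason for the slowdown is that every image[led,pin,...] access inside a Python loop creates a new NumPy scalar object, and with 180 x 3 x 8 x 4 accesses that adds up to 17280 per frame. A possible way around it, not taken from this thread, is to let NumPy do the bit packing on the whole array at once. The sketch below assumes an image of shape (leds, stripes, 3) with dtype uint8 and at most 8 strips; the function names are made up, and the loop version is included only to check that both produce the same bytes.

```python
import numpy as np

def image2data_vectorized(image, order=(1, 2, 0)):
    """Bit-pack a (leds, stripes, 3) uint8 image without Python-level loops."""
    image = np.asarray(image, dtype=np.uint8)
    stripes = image.shape[1]
    # Reorder channels (BGR -> GRB) and move the pin axis last: (leds, 3, stripes)
    grb = np.ascontiguousarray(image[:, :, list(order)].transpose(0, 2, 1))
    # Expand every value into its 8 bits, MSB first: (leds, 3, stripes, 8)
    bits = np.unpackbits(grb[..., np.newaxis], axis=-1)
    # Pin p contributes bit p of each output byte.
    weights = (1 << np.arange(stripes)).astype(np.uint8)
    # Combine the pins: (leds, 3, 8), then flatten in led, channel, bit order.
    packed = (bits * weights[:, np.newaxis]).sum(axis=2, dtype=np.uint8)
    return packed.reshape(-1)

def image2data_loops(image, order=(1, 2, 0)):
    """Reference implementation with the same loops as in the question."""
    leds, stripes, _ = image.shape
    buffer = np.zeros(8 * leds * 3, np.uint8)
    i = 0
    for led in range(leds):
        for channel in range(3):
            for bit in range(8):
                bits_out = 0
                for pin in range(stripes):
                    if 0x80 >> bit & image[led, pin, order[channel]]:
                        bits_out |= 1 << pin
                buffer[i] = bits_out
                i += 1
    return buffer

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(180, 4, 3), dtype=np.uint8)
same = np.array_equal(image2data_vectorized(image), image2data_loops(image))
print(same)  # True
```

How much this helps will depend on the machine, so it is worth timing both versions (for example with timeit) before relying on it.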

LED strips with Teensy and Python
