Question

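I'm trying to reproduce in Python two examples that I found in a book (originally written in Java).

Both functions check whether a string contains repeated characters. The first function uses an integer (checker) as a bit vector, while the second one simply uses a list of booleans. I expected the bit-vector function to perform better, but it actually performs worse.

Why is that? Did I get something wrong while "translating" from Java to Python?

Note: for simplicity, we only use lowercase letters (a to z), especially for the bit-vector function.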

import sys
import timeit

def is_unique_chars_bit(my_str):
    checker = 0
    for char in my_str:
        val = ord(char) - ord('a')
        if ((checker & (1 << val)) > 0):
            return False
        checker |= (1 << val)
    return True

def is_unique_chars_list(my_str):
    if len(my_str) > 128:
        # Supposing we use ASCII, which only has 128 chars
        return False
    char_list = [False] * 128
    for char in my_str:
        val = ord(char)
        if char_list[val]:
            return False
        char_list[val] = True
    return True

if __name__ == '__main__':
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    t_bit = timeit.Timer("is_unique_chars_bit('"+ alphabet +"')", "from __main__ import is_unique_chars_bit")
    t_list = timeit.Timer("is_unique_chars_list('"+ alphabet +"')", "from __main__ import is_unique_chars_list")
    print(t_bit.repeat(3, 200000))
    print(t_list.repeat(3, 200000))

Results:

[1.732477278999795, 1.7263494359995093, 1.7404333820004467]
[0.6785205180003686, 0.6759967380003218, 0.675434408000001]

The original Java functions are as follows:

boolean isUniqueCharsBoolArray(String str) {
    if (str.length() > 128) return false;

    boolean[] char_set = new boolean[128];
    for (int i = 0; i < str.length(); i++) {
        int val = str.charAt(i);
        if (char_set[val]) {
            return false;
        }
        char_set[val] = true;
    }
    return true;
}

boolean isUniqueCharsBits(String str) {
    int checker = 0;  // bit i is set once the letter 'a' + i has been seen
    for (int i = 0; i < str.length(); i++) {
        int val = str.charAt(i) -'a';
        if ((checker & (1 << val)) > 0) {
            return false;
        }
        checker |= (1 << val);
    }
    return true;
}

Answer 1

This is because integers are immutable objects in Python: an operation on an integer cannot update it in place and has to produce a new object instead. That makes integer operations comparatively slow in general. (This is true even for Python 2 ints.) Look at the following line:

checker |= (1 << val)

If we expand the augmented assignment, we get:

checker = checker | (1 << val)

This single line allocates two new integers in memory: one for 1 << val and one for the result of the bitwise OR.

Assigning a list element, on the other hand, does not need to allocate any object, which is why it is faster.
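A minimal sketch of the difference described above. The function names are hypothetical and exist only for illustration. It only shows that the integer name ends up bound to a different object while the list stays the same object; it does not measure the allocation cost itself.

```python
def int_update_rebinds(val=40):
    """Return True if `checker |= ...` left the name bound to a different object."""
    checker = 1 << 50        # arbitrary starting value (bit 50 set)
    before = checker         # keep a reference to the old int object
    checker |= (1 << val)    # builds 1 << val, then builds the OR result
    return checker is not before


def list_update_mutates(val=40):
    """Return True if the list is still the same object after an element store."""
    char_list = [False] * 128
    before = char_list
    char_list[val] = True    # stores a reference to the existing True singleton
    return char_list is before


print(int_update_rebinds())   # True
print(list_update_mutates())  # True
```

The integer case has to return True because the old and new values differ, and an immutable object can never change its value.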

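If you are looking for the fastest way to determine whether a string has duplicate characters, this function is shorter and faster than the previous two (taken from "check duplicates in list"):

def is_unique_chars_set(my_str):
    # A set drops duplicates, so equal lengths mean every character is unique.
    return len(my_str) == len(set(my_str))

timeit shows a 3x speedup for it.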

Note: the results may vary considerably if you use another Python runtime, such as IronPython.
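Since the numbers depend on the runtime and the machine, it may be easiest to measure all three variants yourself. The following is a sketch of such a harness, not the original benchmark script: the benchmark helper and its parameters are assumptions, the bit and list functions are copied from the question, and the set version assumes the function should return True when all characters are distinct.

```python
import timeit


def is_unique_chars_bit(my_str):
    checker = 0
    for char in my_str:
        val = ord(char) - ord('a')
        if (checker & (1 << val)) > 0:
            return False
        checker |= (1 << val)
    return True


def is_unique_chars_list(my_str):
    if len(my_str) > 128:
        return False
    char_list = [False] * 128
    for char in my_str:
        val = ord(char)
        if char_list[val]:
            return False
        char_list[val] = True
    return True


def is_unique_chars_set(my_str):
    return len(my_str) == len(set(my_str))


def benchmark(text="abcdefghijklmnopqrstuvwxyz", number=200000, repeat=3):
    """Return {function name: best time in seconds} for each variant."""
    results = {}
    for func in (is_unique_chars_bit, is_unique_chars_list, is_unique_chars_set):
        # The lambda adds the same call overhead to every variant.
        timer = timeit.Timer(lambda: func(text))
        results[func.__name__] = min(timer.repeat(repeat, number))
    return results


if __name__ == "__main__":
    # Fewer iterations than in the question so it finishes quickly;
    # raise `number` for steadier measurements.
    for name, seconds in benchmark(number=20000).items():
        print(f"{name}: {seconds:.3f}s")
```

Taking the minimum of the repeats is a common way to reduce noise from other processes; the absolute times will differ from the ones quoted in the question.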

Bit vector vs. list of boolean values performance
