Cost of calling str() on a string?

Time: 2017-06-08 15:02:30

Tags: python string

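What is the cost, if any, of calling the str function on an object that is already a string? The use case is normalizing an array of objects of different types by converting them all to strings, which can be implemented naively like this: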

def arr_2_strarr(arr):
    return [str(val) for val in arr]

But if str() causes too much overhead and my arr mostly contains strings, I could consider using:

def arr_2_strarr2(arr):
    return [str(val) if not isinstance(val, basestring) else val for val in arr]
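Note that basestring exists only in Python 2. As a sketch, assuming Python 3 (where str is the only text type) and using the hypothetical name arr_2_strarr3, the second version would look like this:

```python
def arr_2_strarr3(arr):
    # Python 3 has a single text type, str, so test against it directly.
    # Strings are passed through unchanged; anything else goes through str().
    return [str(val) if not isinstance(val, str) else val for val in arr]


print(arr_2_strarr3(['a', 1, 2.5, None]))  # ['a', '1', '2.5', 'None']
```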

Any suggestions?

2 answers:

Answer 0 (score: 16)

Calling str on a string object is very cheap: it just returns the original string object. Doing an explicit isinstance test is likely to be slower.
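You can see this with an identity check: for an object whose type is exactly str, str() hands back the very same object instead of building a copy. This is a CPython implementation detail rather than a language guarantee, and the sketch assumes CPython 3:

```python
s = 'hello'
t = str(s)

# CPython returns the argument itself when its type is exactly str,
# so no new string object is created.
print(t is s)  # True
```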

If you want to test with your actual data, take a look at the timeit module.
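A minimal sketch of such a test, assuming Python 3. The sample data (mostly strings with some integers) and the function name check_first are hypothetical; substitute a representative slice of your real data:

```python
import timeit

# Hypothetical sample: 900 strings followed by 100 integers.
data = ['abc'] * 900 + list(range(100))

def plain(arr):
    return [str(val) for val in arr]

def check_first(arr):
    # Skip the str() call for values that are already strings
    return [val if isinstance(val, str) else str(val) for val in arr]

for func in (plain, check_first):
    # repeat() returns one total time per repetition; the minimum is the
    # run least disturbed by other activity on the machine.
    times = timeit.repeat(lambda: func(data), repeat=5, number=200)
    print('{:12} {:.4f}'.format(func.__name__, min(times)))
```

The lambda is called only inside the same loop iteration, so it always times the current func.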

BTW, you should drop the not from your second version and swap the two branches:

[val if isinstance(val, basestring) else str(val) for val in arr]

You can make it a little faster by caching str:

def arr_2_strarr(arr, str=str):
    return [str(val) for val in arr]

Happy micro-optimizing. :)

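Why cache str? Well, every time a name is used, Python has to look it up. Inside a function, local names (parameters and assigned variables) are found quickly; any other name has to be looked up in the module's global namespace and then, if it isn't there, in the built-ins. So even though str is a built-in, finding it still costs a global lookup: it would be inefficient to "import" all the built-ins into every function. By doing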

def arr_2_strarr(arr, str=str)

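we create a local name str bound to the built-in str type, because it's a default argument. The lookup and binding happen once, when the function definition is executed, not every time the function is called.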

So whenever we call arr_2_strarr, the interpreter finds the local str immediately, which saves a small amount of time.
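The difference is visible in the compiled code objects. In this sketch (assuming Python 3; lookup_global and lookup_local are hypothetical names), names resolved through globals/built-ins are recorded in co_names, while parameters and other locals are recorded in co_varnames:

```python
def lookup_global(val):
    # 'str' is not assigned here, so at call time it is looked up
    # in the module globals and then in the built-ins
    return str(val)

def lookup_local(val, str=str):
    # 'str' is a parameter, i.e. a local, bound once when 'def' runs
    return str(val)

print('str' in lookup_global.__code__.co_names)     # True
print('str' in lookup_local.__code__.co_varnames)   # True
print('str' in lookup_local.__code__.co_names)      # False
# The default value was captured when the function was defined
print(lookup_local.__defaults__[0] is str)          # True
```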

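Here's some timeit code that compares the various strategies. It runs on both Python 2 and Python 3, although on Python 3 it binds the name basestring to str, since basestring doesn't exist in Python 3.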

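This code first runs the functions on lists of various sizes containing integer data, and then on string data created by converting the integer data to strings.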

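Each output line gives the times, in seconds, to execute the given number of loops in each of 3 repetitions, sorted from fastest to slowest. As the timeit.repeat docs note, the main number to look at in each run is the smallest one.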

The results for all functions at a given list size and data type are also sorted from fastest to slowest.



''' Compare the speeds of direct string conversion
    with testing first via isinstance

    See https://stackoverflow.com/q/44439323/4014959

    Written by PM 2Ring 2017.06.09

    Python 2 / 3 compatible
'''

from __future__ import print_function, division
from timeit import Timer
import sys

# Python 3 doesn't have basestring
if sys.version_info[0] > 2:
    basestring = str

# The functions to test
def plain(arr):
    return [str(val) for val in arr]

def cached(arr, str=str):
    return [str(val) for val in arr]

def teststr(arr):
    return [val if isinstance(val, str) else str(val) for val in arr]

def testbase(arr):
    return [val if isinstance(val, basestring) else str(val) for val in arr]

def testbasenot(arr):
    return [str(val) if not isinstance(val, basestring) else val for val in arr]

funcs = (
    plain,
    cached,
    teststr,
    testbase,
    testbasenot,
)

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

def verify(arr):
    results = [func(arr) for func in funcs]
    first, results = results[0], results[1:]
    return all(first == u for u in results)

def time_test(loops, reps):
    ''' Print timing stats for all the functions '''
    timings = []
    for func in funcs:
        fname = func.__name__
        setup = 'from __main__ import arr, ' + fname
        cmd = fname + '(arr)'
        t = Timer(cmd, setup)
        result = t.repeat(reps, loops)
        result.sort()
        timings.append((result, fname))

    timings.sort()
    for result, fname in timings:
        print('{0:12} {1}'.format(fname, result))

# Check that all functions return the same results
if 0:
    print('Testing all functions')
    arr = list(range(10))
    print(arr, verify(arr))
    arr = list('abcdefghij')
    print(arr, verify(arr))

# Do the timing tests
reps = 3
loops = 1 << 16
for i in range(1, 11):
    n = 1 << i
    # Build a data array of integers
    arr = range(n)
    print('\n{0}: Size={1}, Loops={2}'.format(i, n, loops))
    print('* Integer')
    time_test(loops, reps)

    # Convert the data array contents to strings
    arr = cached(arr)
    print('\n* String')
    time_test(loops, reps)
    loops >>= 1    

Typical Python 2 output

1: Size=2, Loops=65536
* Integer
cached       [0.17268610000610352, 0.19634914398193359, 0.2058720588684082]
plain        [0.17906594276428223, 0.18797492980957031, 0.24009895324707031]
teststr      [0.32513308525085449, 0.33270597457885742, 0.35080599784851074]
testbasenot  [0.32793092727661133, 0.33176803588867188, 0.33498501777648926]
testbase     [0.32964491844177246, 0.33154511451721191, 0.33760714530944824]

* String
cached       [0.1619560718536377, 0.1628870964050293, 0.16448402404785156]
teststr      [0.16335082054138184, 0.16484308242797852, 0.17012500762939453]
plain        [0.16956901550292969, 0.1711430549621582, 0.18457293510437012]
testbase     [0.22378706932067871, 0.2255101203918457, 0.22593879699707031]
testbasenot  [0.22855901718139648, 0.22941207885742188, 0.23271608352661133]

2: Size=4, Loops=32768
* Integer
cached       [0.12796807289123535, 0.12807202339172363, 0.12817001342773438]
plain        [0.13622713088989258, 0.14297294616699219, 0.14868402481079102]
teststr      [0.27701020240783691, 0.27812099456787109, 0.2795259952545166]
testbasenot  [0.27815794944763184, 0.28220701217651367, 0.29373884201049805]
testbase     [0.2804868221282959, 0.28186416625976562, 0.31699705123901367]

* String
cached       [0.12131500244140625, 0.12241697311401367, 0.13379192352294922]
teststr      [0.12839889526367188, 0.1314079761505127, 0.14053797721862793]
plain        [0.13051795959472656, 0.14696002006530762, 0.18504786491394043]
testbase     [0.18404412269592285, 0.1844489574432373, 0.19633579254150391]
testbasenot  [0.18416285514831543, 0.18494606018066406, 0.18553614616394043]

3: Size=8, Loops=16384
* Integer
cached       [0.10957002639770508, 0.11252093315124512, 0.11768913269042969]
plain        [0.11848998069763184, 0.11958003044128418, 0.1292269229888916]
testbase     [0.26231694221496582, 0.26471304893493652, 0.26625895500183105]
teststr      [0.26410102844238281, 0.2641758918762207, 0.26569199562072754]
testbasenot  [0.26910495758056641, 0.26967120170593262, 0.2741539478302002]

* String
cached       [0.102294921875, 0.10357999801635742, 0.1050269603729248]
teststr      [0.10852217674255371, 0.10861611366271973, 0.1127161979675293]
plain        [0.11173510551452637, 0.11183404922485352, 0.12115597724914551]
testbasenot  [0.16488981246948242, 0.16509699821472168, 0.16648602485656738]
testbase     [0.16622614860534668, 0.16688108444213867, 0.16962814331054688]

4: Size=16, Loops=8192
* Integer
cached       [0.10548806190490723, 0.10568594932556152, 0.10611891746520996]
plain        [0.11526799201965332, 0.1160120964050293, 0.12486004829406738]
teststr      [0.25309896469116211, 0.25549888610839844, 0.25838899612426758]
testbasenot  [0.25410699844360352, 0.27252411842346191, 0.32510590553283691]
testbase     [0.25414609909057617, 0.26968812942504883, 0.27393984794616699]

* String
cached       [0.092885017395019531, 0.096045970916748047, 0.10643196105957031]
teststr      [0.098433017730712891, 0.098783016204833984, 0.10051798820495605]
plain        [0.10081005096435547, 0.10222005844116211, 0.12018895149230957]
testbasenot  [0.15373396873474121, 0.15472292900085449, 0.15676999092102051]
testbase     [0.15490198135375977, 0.15572404861450195, 0.15599799156188965]

5: Size=32, Loops=4096
* Integer
cached       [0.10568094253540039, 0.10743498802185059, 0.1115870475769043]
plain        [0.1163330078125, 0.11633419990539551, 0.12796401977539062]
teststr      [0.25122308731079102, 0.26527810096740723, 0.26579189300537109]
testbase     [0.25309586524963379, 0.25563716888427734, 0.25917816162109375]
testbasenot  [0.25465011596679688, 0.25907588005065918, 0.26110982894897461]

* String
cached       [0.085406064987182617, 0.086378097534179688, 0.08651280403137207]
teststr      [0.092473983764648438, 0.09324193000793457, 0.093439817428588867]
plain        [0.096549034118652344, 0.097501993179321289, 0.10462403297424316]
testbase     [0.14794015884399414, 0.14966106414794922, 0.15016818046569824]
testbasenot  [0.14796280860900879, 0.14940309524536133, 0.15308189392089844]

6: Size=64, Loops=2048
* Integer
cached       [0.10838603973388672, 0.1089630126953125, 0.11129999160766602]
plain        [0.11764693260192871, 0.11851096153259277, 0.12583494186401367]
teststr      [0.2550208568572998, 0.25540995597839355, 0.26316595077514648]
testbase     [0.25723910331726074, 0.25930881500244141, 0.26207089424133301]
testbasenot  [0.25864100456237793, 0.25901007652282715, 0.26875495910644531]

* String
cached       [0.086635112762451172, 0.087384939193725586, 0.099885940551757812]
plain        [0.096493959426879883, 0.12469196319580078, 0.13684391975402832]
teststr      [0.096681118011474609, 0.098448991775512695, 0.10569310188293457]
testbase     [0.14573216438293457, 0.14696693420410156, 0.14700508117675781]
testbasenot  [0.14776277542114258, 0.14852094650268555, 0.15462112426757812]

7: Size=128, Loops=1024
* Integer
cached       [0.10915207862854004, 0.11011981964111328, 0.1127631664276123]
plain        [0.11721491813659668, 0.11830401420593262, 0.1254270076751709]
testbase     [0.25789499282836914, 0.26130795478820801, 0.26179313659667969]
teststr      [0.25840306282043457, 0.25889492034912109, 0.26300287246704102]
testbasenot  [0.26443600654602051, 0.26498103141784668, 0.26691412925720215]

* String
cached       [0.083537101745605469, 0.084954023361206055, 0.086431980133056641]
teststr      [0.091158866882324219, 0.09123992919921875, 0.091590166091918945]
plain        [0.091225862503051758, 0.092115163803100586, 0.099261045455932617]
testbase     [0.14569401741027832, 0.14622306823730469, 0.14650607109069824]
testbasenot  [0.14774990081787109, 0.14930200576782227, 0.15020990371704102]

8: Size=256, Loops=512
* Integer
cached       [0.10824894905090332, 0.10865211486816406, 0.10895800590515137]
plain        [0.11750102043151855, 0.12690877914428711, 0.12890195846557617]
teststr      [0.25457501411437988, 0.25542402267456055, 0.25692200660705566]
testbasenot  [0.25513482093811035, 0.25664496421813965, 0.25999689102172852]
testbase     [0.25680398941040039, 0.25924396514892578, 0.26179695129394531]

* String
cached       [0.080662012100219727, 0.081827878952026367, 0.081900119781494141]
teststr      [0.089673995971679688, 0.097939014434814453, 0.15471792221069336]
plain        [0.094327926635742188, 0.095342159271240234, 0.097375154495239258]
testbasenot  [0.14262199401855469, 0.14278602600097656, 0.14302182197570801]
testbase     [0.14464497566223145, 0.14674210548400879, 0.16207790374755859]

9: Size=512, Loops=256
* Integer
cached       [0.10789299011230469, 0.1092069149017334, 0.110015869140625]
plain        [0.11702799797058105, 0.1181950569152832, 0.12698101997375488]
testbase     [0.25504207611083984, 0.25520896911621094, 0.25734806060791016]
testbasenot  [0.25715017318725586, 0.25747489929199219, 0.25850796699523926]
teststr      [0.25783085823059082, 0.25882315635681152, 0.26154208183288574]

* String
cached       [0.078849077224731445, 0.079813003540039062, 0.084489107131958008]
teststr      [0.086745977401733398, 0.087059974670410156, 0.087485074996948242]
plain        [0.088322877883911133, 0.088804960250854492, 0.097378969192504883]
testbasenot  [0.14128994941711426, 0.14266705513000488, 0.1427910327911377]
testbase     [0.14152097702026367, 0.14231991767883301, 0.14392399787902832]

10: Size=1024, Loops=128
* Integer
cached       [0.10892415046691895, 0.11003899574279785, 0.11008000373840332]
plain        [0.1192779541015625, 0.12048506736755371, 0.12956619262695312]
teststr      [0.25335502624511719, 0.25642204284667969, 0.25892996788024902]
testbase     [0.25525593757629395, 0.25550699234008789, 0.25794696807861328]
testbasenot  [0.25932693481445312, 0.25960803031921387, 0.26134610176086426]

* String
cached       [0.078451156616210938, 0.080369949340820312, 0.080511093139648438]
teststr      [0.084844112396240234, 0.085949897766113281, 0.096578836441040039]
plain        [0.086302042007446289, 0.087638139724731445, 0.096364974975585938]
testbase     [0.14068913459777832, 0.14274501800537109, 0.15559101104736328]
testbasenot  [0.14075493812561035, 0.15553092956542969, 0.19578790664672852]    

Typical Python 3 output

1: Size=2, Loops=65536
* Integer
plain        [0.2957206170030986, 0.2959696320031071, 0.2991539639988332]
cached       [0.3058611470005417, 0.30598287599787, 0.3073535650000849]
testbase     [0.38803433800057974, 0.39307209699836676, 0.393392562000372]
testbasenot  [0.3888578799997049, 0.3951267439988442, 0.42909636100011994]
teststr      [0.41290506400036975, 0.41541150199918775, 0.4488242949992127]

* String
testbase     [0.23906823500146857, 0.23946705200069118, 0.24624350399972172]
testbasenot  [0.24037985899849446, 0.24200722000023234, 0.2462738950016501]
plain        [0.25742501500280923, 0.2644229819998145, 0.26711930600140477]
teststr      [0.2635171010006161, 0.3559218000009423, 0.3784064870014845]
cached       [0.2687887559986848, 0.2711959320004098, 0.38138879500183975]

2: Size=4, Loops=32768
* Integer
cached       [0.21332427200104576, 0.21363574399947538, 0.21528891600246425]
plain        [0.22395663199858973, 0.22762144099760917, 0.23422862100051134]
testbasenot  [0.31939790100295795, 0.32413787499899627, 0.32422161499926005]
testbase     [0.3209382370005187, 0.3213516770010756, 0.3215230670029996]
teststr      [0.3372085839982901, 0.33786465500088525, 0.33847540900023887]

* String
testbasenot  [0.17031173299983493, 0.17143720199965173, 0.17724975699820789]
testbase     [0.170390128998406, 0.17118954800025676, 0.18865150499914307]
cached       [0.18190538799899514, 0.18262020299880533, 0.183105569001782]
plain        [0.18666503399799694, 0.18781541300268145, 0.1955128590016102]
teststr      [0.18973677000030875, 0.19112570400102413, 0.19168143299975782]

3: Size=8, Loops=16384
* Integer
cached       [0.17012267099926248, 0.18160372200145503, 0.2275817529989581]
plain        [0.1890079689983395, 0.1963043950017891, 0.2016476179996971]
testbasenot  [0.28168991999700665, 0.2821743839995179, 0.286649605997809]
testbase     [0.28295213199817226, 0.28760008400058723, 0.2906435440017958]
teststr      [0.2958552290001535, 0.2989299110013235, 0.31747390199961956]

* String
testbase     [0.13354753000021446, 0.13377505199969164, 0.14039257600234123]
cached       [0.1352838150014577, 0.1353432000032626, 0.13798289999976987]
testbasenot  [0.14252334699995117, 0.14301740500013693, 0.1445914210016781]
plain        [0.15130633899752866, 0.15166569000211894, 0.1616801599993778]
teststr      [0.15267008800219628, 0.1545946529986395, 0.15590016200076207]

4: Size=16, Loops=8192
* Integer
cached       [0.144755126999371, 0.14782401300180936, 0.1484048439997423]
plain        [0.1726092749995587, 0.1740606339990336, 0.1815100200001325]
testbase     [0.26685525399807375, 0.27029573199979495, 0.2716258750006091]
testbasenot  [0.2702714350016322, 0.2723204169997189, 0.27288546099953237]
teststr      [0.28515160999813816, 0.28523068700087606, 0.2878553769987775]

* String
cached       [0.11515368599793874, 0.11579233700103941, 0.11688366999806021]
testbase     [0.12178990400207113, 0.13090817400006927, 0.13304468899877975]
testbasenot  [0.13121789299839293, 0.14976675499929115, 0.1521548589989834]
teststr      [0.13410512400150765, 0.1354981399999815, 0.147247362001508]
plain        [0.13691626099898713, 0.1384456069972657, 0.1426525679999031]

5: Size=32, Loops=4096
* Integer
cached       [0.13246865899782279, 0.13320018100057496, 0.134628559997509]
plain        [0.1636957459995756, 0.16763203899972723, 0.1752369269997871]
testbase     [0.26010187700012466, 0.2606812570011243, 0.2647345440018398]
testbasenot  [0.2620696090016281, 0.26230394700178294, 0.26258907899682526]
teststr      [0.27685887300322065, 0.2787095199964824, 0.28293989099984174]

* String
cached       [0.10246079200078384, 0.10416977099885116, 0.10755630499988911]
testbasenot  [0.10829716499938513, 0.10918466699877172, 0.10935586699997657]
testbase     [0.11739019699962228, 0.11808202800239087, 0.11899654000080773]
plain        [0.12601002500014147, 0.12718953500007046, 0.13454839599944535]
teststr      [0.13366336599938222, 0.13407608800116577, 0.13510101700012456]

6: Size=64, Loops=2048
* Integer
cached       [0.12591946799875586, 0.127094235002005, 0.13223557899982552]
plain        [0.160616523000499, 0.16232994500023779, 0.1691623620026803]
testbase     [0.2534341589998803, 0.2556092949998856, 0.2571690379991196]
testbasenot  [0.2560774869998568, 0.2574564010028553, 0.2606996459981019]
teststr      [0.268248238000524, 0.2702014210008201, 0.27107579600124154]

* String
cached       [0.09791737100022146, 0.09819723300097394, 0.10752435399990645]
testbasenot  [0.1057888709983672, 0.10588572099732119, 0.16173565400094958]
testbase     [0.10636284599968349, 0.1179599219976808, 0.12130766799964476]
plain        [0.12285572399923694, 0.12589510299949325, 0.13114397300159908]
teststr      [0.13122114399811835, 0.13273253399893292, 0.14575592999972287]

7: Size=128, Loops=1024
* Integer
cached       [0.12404713899741182, 0.12496110600113752, 0.12496385000122245]
plain        [0.15980284800025402, 0.16046370399999432, 0.16711239899814245]
testbasenot  [0.25531527800194453, 0.25563639699976193, 0.2586420219995489]
testbase     [0.25544935799916857, 0.2558138679996773, 0.257172014000389]
teststr      [0.2699256220003008, 0.2712909309993847, 0.27702098800000385]

* String
cached       [0.09376715399776003, 0.09393715400074143, 0.09975314399707713]
testbasenot  [0.10510071799944853, 0.10511873200084665, 0.10523289399861824]
testbase     [0.11240010600158712, 0.11325187799957348, 0.11632439300228725]
plain        [0.12139380200096639, 0.12202585699924384, 0.1315958569975919]
teststr      [0.12834531499902369, 0.12949470400053542, 0.12955383699954837]

8: Size=256, Loops=512
* Integer
cached       [0.12225364700134378, 0.12283446399669629, 0.1285843859986926]
plain        [0.15971405900199898, 0.16198832800000673, 0.16777605400056927]
testbase     [0.2507534860014857, 0.2527904779999517, 0.25378678199922433]
testbasenot  [0.25323686200135853, 0.2547167230004561, 0.25919888999851537]
teststr      [0.2652072370001406, 0.2658402630004275, 0.2674206650008273]

* String
cached       [0.0906629850032914, 0.0985801380011253, 0.09929232800277532]
testbase     [0.10155730300175492, 0.1042869699995208, 0.11276149599871133]
testbasenot  [0.10197166099897004, 0.11451221999959671, 0.15595895300066331]
plain        [0.11898361400017166, 0.12018223199993372, 0.12760113599870238]
teststr      [0.12645652200080804, 0.12671815700014122, 0.14095144699967932]

9: Size=512, Loops=256
* Integer
cached       [0.12672984500022721, 0.1462409830019169, 0.2653043659993273]
plain        [0.161721200998727, 0.17296033000093303, 0.19699998799842433]
testbase     [0.25432757399903494, 0.25851125400004094, 0.258548003002943]
testbasenot  [0.25619441399976495, 0.25656893900304567, 0.25998359599907417]
teststr      [0.2719232039999042, 0.2744571339972026, 0.2751794379983039]

* String
cached       [0.08841608199873008, 0.08848714099804056, 0.09124958899701596]
testbasenot  [0.09962382599769626, 0.10016373899998143, 0.10028601600060938]
testbase     [0.10713129000214394, 0.10752918499929365, 0.10952026399900205]
plain        [0.1163020489984774, 0.12190789400119684, 0.1264930679972167]
teststr      [0.1242994140011433, 0.12458201900153654, 0.12523995000083232]

10: Size=1024, Loops=128
* Integer
cached       [0.12827690600170172, 0.1294701549995807, 0.13387694999983069]
plain        [0.16636216699771467, 0.16866590399877168, 0.17549873600000865]
testbasenot  [0.25435296399882645, 0.25515673799964134, 0.2605281959986314]
testbase     [0.26351416900070035, 0.26398584699927596, 0.2651360300005763]
teststr      [0.26816077799958293, 0.26908816800278146, 0.2715630999991845]

* String
cached       [0.08827024300262565, 0.09090095799911069, 0.09729095900183893]
testbase     [0.10063145499952952, 0.1010660120009561, 0.10904535399822635]
testbasenot  [0.10313185999984853, 0.11444468399713514, 0.14796407999892836]
plain        [0.11569941500056302, 0.11579339799936861, 0.12615105800068704]
teststr      [0.12353994099976262, 0.12515813500067452, 0.13752399999793852]

These timings were performed on a fairly old 32-bit single-core 2GHz machine with 2GB of RAM, running a Debian-derived Linux. I used Python 2.6.6 and Python 3.6.0. Your results may vary ;) In any case, these results should only be used as a rough guide. timeit does a good job of timing what we ask it to, but it can't control other processes that want to use the CPU.

Answer 1 (score: 1)

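    import time

    try:
        basestring            # Python 2
    except NameError:
        basestring = str      # Python 3 has no basestring

    string = 'string'

    # 100,000 isinstance checks on a string
    start_time = time.time()
    for i in range(100000):
        if isinstance(string, basestring):
            continue
    end_time = time.time()
    print(end_time - start_time)

    # 100,000 calls of str() on a string
    start_time = time.time()
    for i in range(100000):
        str(string)
    end_time = time.time()
    print(end_time - start_time)

    # 1,000,000 calls of str() on an int (note: 10x as many iterations)
    start_time = time.time()
    num = 9                   # avoid shadowing the built-in int
    for i in range(1000000):
        str(num)
    end_time = time.time()
    print(end_time - start_time)
    #0.031
    #0.016
    #0.27999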

In these test cases, simply calling str(string) is about twice as fast as the isinstance check (0.016 s versus 0.031 s).
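The third loop above runs 1,000,000 iterations versus 100,000 for the first two, so its time isn't directly comparable, and time.time() can have coarse resolution on some platforms. Here is a sketch of the same three measurements with equal iteration counts, assuming Python 3.5+ (needed for the globals argument of timeit.timeit); the env dictionary is just an illustrative namespace:

```python
import timeit

N = 100000  # identical iteration count for all three statements
env = {'string': 'string', 'num': 9}

# timeit uses a high-resolution clock (time.perf_counter on Python 3)
# and executes each statement N times in the env namespace.
t_isinstance = timeit.timeit('isinstance(string, str)', globals=env, number=N)
t_str_str = timeit.timeit('str(string)', globals=env, number=N)
t_str_int = timeit.timeit('str(num)', globals=env, number=N)

print('isinstance(string, str): {:.4f} s'.format(t_isinstance))
print('str(string):             {:.4f} s'.format(t_str_str))
print('str(num):                {:.4f} s'.format(t_str_int))
```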