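Running this code on a 4.2 GB input file takes 48 seconds on my laptop. The input file is tab-delimited, and every value appears in quotes. Each record ends with a newline, e.g. '"val1"\t"val2"\t"val3"\t..."valn"\n'
I tried multiprocessing with 10 threads: one to queue up the lines, eight to parse the individual lines and fill an output queue, and one to reduce the output queue into the defaultdict shown below. That code took 300 seconds to run, more than 6 times longer than the following: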
from collections import defaultdict

def get_users(log):
    users = defaultdict(int)
    f = open(log)
    # Read header line
    h = f.readline().strip().replace('"', '').split('\t')
    ix_profile = h.index('profile.type')
    ix_user = h.index('profile.id')
    # If either ix_* is the last field in h, it will include a newline.
    # That's fine for now.
    for (i, line) in enumerate(f):
        if i % 1000000 == 0: print "Line %d" % i # progress notification
        l = line.split('\t')
        if l[ix_profile] != '"7"': # "7" indicates a bad value
            # use list slicing to remove quotes
            users[l[ix_user][1:-1]] += 1
    f.close()
    return users
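I have checked that I am not I/O-bound by removing everything from the for loop except the print statement. That version ran in 9 seconds, which I will treat as a lower bound on how fast this code can run.
I have a lot of these 5 GB files to process, so even a fairly small improvement in runtime (I know, I can take out the print!) will help. The machine I am running on has 4 cores, so I can't help but wonder whether there is a way to get multithreaded/multiprocess code to run faster than the code above.
UPDATE:
I rewrote the multiprocessing code as follows: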
from multiprocessing import Pool, cpu_count
from collections import defaultdict

def parse(line, ix_profile=10, ix_user=9):
    """ix_profile and ix_user predetermined; hard-coding for expedience."""
    l = line.split('\t')
    if l[ix_profile] != '"7"':
        return l[ix_user][1:-1]

def get_users_mp():
    f = open('20110201.txt')
    h = f.readline() # remove header line
    pool = Pool(processes=cpu_count())
    result_iter = pool.imap_unordered(parse, f, 100)
    users = defaultdict(int)
    for r in result_iter:
        if r is not None:
            users[r] += 1
    return users
It runs in 26 seconds, a 1.85x speedup. Not bad, but with 4 cores it is not as much as I had hoped for.
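Answer 0 (score: 4)
Use regular expressions.
A test shows that the expensive part of the process is the call to str.split(). Having to construct a list and a bunch of string objects for every line is probably costly.
First, you need to construct a regular expression to match against the line. Something like: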
expression = re.compile(r'("[^"]*")\t("[^"]*")\t')
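If you call expression.match(line).groups(), you get the first two columns extracted as two string objects, and you can do the logic with those directly.
Now, this assumes that the two columns of interest are the first two columns. If not, you just have to tweak the regular expression to match the correct columns. Your code checks the header to see where the columns are located. You could generate the regular expression from that, but my guess is that the columns are really always in the same place. Just verify that they are still there and use a regular expression on the lines.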
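As a quick illustration of what `match(...).groups()` returns, here is a minimal Python 3 sketch. The sample line is made up for the example, and the pattern uses `*` inside each quoted field so that a field may hold any number of characters.

```python
import re

# Matches the first two quoted, tab-separated columns and captures both,
# quotes included.
expression = re.compile(r'("[^"]*")\t("[^"]*")\t')

# Hypothetical sample record in the format described in the question.
line = '"u123"\t"3"\t"other"\t"more"\n'

first, second = expression.match(line).groups()
print(first, second)
```

Only the two captured strings are created per line, instead of one string per column as with a full `split('\t')`.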
Edit

from collections import defaultdict
import re

def get_users(log):
    f = open(log)
    # Read header line
    h = f.readline().strip().replace('"', '').split('\t')
    ix_profile = h.index('profile.type')
    ix_user = h.index('profile.id')

    # This code assumes that the user column comes before the profile column
    assert ix_user < ix_profile

    # This regex captures a single column
    keep_field = r'"([^"]*)"'
    # This regex matches a column but does not capture the result
    # (note the lack of parentheses)
    skip_field = r'"[^"]*"'

    # Create a list with one pattern per field, capturing only the ones we care about
    fields = [skip_field] * len(h)
    fields[ix_profile] = keep_field
    fields[ix_user] = keep_field

    # Drop every field after the ones we care about
    # (they take time to match, and we don't need them)
    del fields[max(ix_profile, ix_user)+1:]

    # Actually build the regex
    regex = re.compile(r"\t".join(fields))

    users = defaultdict(int)
    for line in f:
        # Pull out the two values and do the logic
        user, profile = regex.match(line).groups()
        if profile != "7": # "7" indicates a bad value
            users[user] += 1

    f.close()
    return users
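Answer 1 (score: 2)
If you are running Unix or Cygwin, the following little script will produce the frequency of user ids where profile != 7. It should be quick.
Updated with awk to count the user ids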
#!/bin/bash
FILENAME="test.txt"
IX_PROFILE=`head -1 ${FILENAME} | sed -e 's/\t/\n/g' | nl -w 1 | grep profile.type | cut -f1`
IX_USER=`head -1 ${FILENAME} | sed -e 's/\t/\n/g' | nl -w 1 | grep profile.id | cut -f1`
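# Note: cut emits fields in file order regardless of the order given to -f, so the
# trailing `cut -f2` yields the user id only when profile.type precedes profile.id.
# Just the userids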
# sed 1d ${FILENAME} | cut -f${IX_PROFILE},${IX_USER} | grep -v \"7\" | cut -f2
# userids counted:
# sed 1d ${FILENAME} | cut -f${IX_PROFILE},${IX_USER} | grep -v \"7\" | cut -f2 | sort | uniq -c
# Count using awk..?
sed 1d ${FILENAME} | cut -f${IX_PROFILE},${IX_USER} | grep -v \"7\" | cut -f2 | awk '{ count[$1]++; } END { for (x in count) { print x "\t" count[x] } }'
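Answer 2 (score: 1)
Seeing that your log file is tab-delimited, you can use the csv module, with the dialect='excel-tab' argument, for a nice boost in performance and readability. That is, of course, if you have to use Python rather than the much faster console commands.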
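A minimal Python 3 sketch of the csv approach, assuming the header names from the question; the function name `count_users` and the in-memory sample are made up for illustration. Whether it is actually faster than `str.split()` on the real files would have to be timed.

```python
import csv
import io
from collections import Counter

def count_users(lines):
    """Count profile.id values on rows whose profile.type is not "7"."""
    reader = csv.reader(lines, dialect='excel-tab')
    header = next(reader)
    ix_profile = header.index('profile.type')
    ix_user = header.index('profile.id')
    users = Counter()
    for row in reader:
        if row[ix_profile] != '7':   # csv has already stripped the quotes
            users[row[ix_user]] += 1
    return users

# Hypothetical three-record file in the format described in the question.
sample = io.StringIO(
    '"profile.id"\t"profile.type"\t"x"\n'
    '"alice"\t"1"\t"a"\n'
    '"bob"\t"7"\t"b"\n'
    '"alice"\t"2"\t"c"\n',
    newline='')
users = count_users(sample)
print(users)
```

With a real file, the same function can be fed `open(log, newline='')` in place of the `StringIO` object.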
Answer 3 (score: 1)
If using a regex can speed things up by ignoring the tail of the line that does not need to be split, then a simpler approach might help:
[snip]
    ix_profile = h.index('profile.type')
    ix_user = h.index('profile.id')
    maxsplits = max(ix_profile, ix_user) + 1 #### new statement ####
    # If either ix_* is the last field in h, it will include a newline.
    # That's fine for now.
    for (i, line) in enumerate(f):
        if i % 1000000 == 0: print "Line %d" % i # progress notification
        l = line.split('\t', maxsplits) #### changed line ####
[snip]
Please time this on your own data.
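To see why the limit is `max(ix_profile, ix_user) + 1`, here is a small Python 3 sketch with made-up column positions. The extra split is what separates the last field of interest from the rest of the line.

```python
# Hypothetical five-column record and column positions.
line = '"a"\t"b"\t"c"\t"d"\t"e"\n'
ix_profile, ix_user = 2, 1
maxsplits = max(ix_profile, ix_user) + 1

parts = line.split('\t', maxsplits)
# parts[:maxsplits] are the individually split fields;
# the last element is the unsplit remainder of the line.
print(parts)
```

The list holds `maxsplits + 1` items however many columns the line has, so the columns after the last one of interest are never turned into separate strings.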
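Answer 4 (score: 0)
Maybe you can do
users[l[ix_user]] += 1
instead of
users[l[ix_user][1:-1]] += 1
and strip the quotes from the dict keys at the end. That should save some time.
As for the multithreaded approach: try reading a few thousand lines from the file at a time and passing those few thousand lines to a thread to be processed. Doing it line by line seems to carry too much overhead.
Or read the solution in this article, as he seems to be doing something quite similar to what you are trying to do.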
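The batching idea can be sketched as follows in Python 3. The helper names (`chunks`, `count_chunk`, `count_all`), the default column positions and the chunk size are all assumptions for illustration. Each worker call receives a list of lines and returns one `Counter`, so only one result per chunk has to travel back to the parent.

```python
from collections import Counter
from itertools import islice

def chunks(lines, size):
    """Yield lists of up to `size` lines from an iterable of lines."""
    lines = iter(lines)
    while True:
        chunk = list(islice(lines, size))
        if not chunk:
            return
        yield chunk

def count_chunk(chunk, ix_profile=1, ix_user=0):
    """Count user ids in one chunk; the column positions are hypothetical."""
    users = Counter()
    for line in chunk:
        fields = line.split('\t', max(ix_profile, ix_user) + 1)
        if fields[ix_profile] != '"7"':      # "7" indicates a bad value
            users[fields[ix_user][1:-1]] += 1
    return users

def count_all(lines, size=5000, map_func=map):
    """Merge the per-chunk counts into one Counter."""
    total = Counter()
    for partial in map_func(count_chunk, chunks(lines, size)):
        total.update(partial)
    return total

sample = ['"alice"\t"1"\t"x"\n', '"bob"\t"7"\t"y"\n', '"alice"\t"2"\t"z"\n']
totals = count_all(sample, size=2)
print(totals)
```

With the default `map_func=map` everything runs in one process. Passing `pool.imap_unordered` from a `multiprocessing.Pool` as `map_func` should spread the chunks over worker processes; the speedup, if any, would need to be measured.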
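Answer 5 (score: 0)
This may be slightly beside the point, but Python shows some really odd behaviour when dealing with multiple threads (particularly bad when the threads are not IO-bound). More specifically, it sometimes runs much slower than when single-threaded. This is due to the way the Global Interpreter Lock (GIL) in Python is used to ensure that no more than one thread can execute in the Python interpreter at any given time.
Because of the constraint that only one thread can actually use the interpreter at any given time, the fact that you have multiple cores will not help you. In fact, it may make things worse, due to some pathological interactions between two threads trying to acquire the GIL. If you want to stick with Python, you have one of two options:
[...] print statement. If you want to learn more about this arcane corner of Python, check out the GIL-related talks on this page: http://www.dabeaz.com/talks.html.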
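A rough way to observe this on your own machine is the Python 3 sketch below. The workload `busy` and its sizes are arbitrary stand-ins, not the log-parsing code. On a CPython build with the GIL, the threaded run of this CPU-bound loop is typically no faster than the serial one; the exact numbers will vary.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def busy(n):
    """CPU-bound toy workload: sum of the squares below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed(func):
    """Run func() and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start

work = [200_000] * 4   # arbitrary sizes, for illustration only

serial, t_serial = timed(lambda: [busy(n) for n in work])
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded, t_threaded = timed(lambda: list(pool.map(busy, work)))

print("serial  : %.3f s" % t_serial)
print("threaded: %.3f s" % t_threaded)
```

Both runs compute the same results; only the elapsed times differ. Separate processes, as in the question's multiprocessing version, are not subject to a shared GIL.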
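Answer 6 (score: 0)
I realize that I had nearly the same idea as Winston Ewert: building a regex.
But my regex:
handles the case ix_profile < ix_user as well as ix_profile > ix_user;
captures only the user column: the profile column is matched with the subpattern '"(?!7")[^\t\r\n"]*"', which does not match if "7" is present in that column, so only the valid users are obtained, through the single group that is defined.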
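The effect of the negative lookahead can be checked with a small Python 3 sketch. The helper name `build_regex`, the sample lines and the column positions are made up for illustration; the subpatterns are the ones quoted in this answer.

```python
import re

def build_regex(ix_profile, ix_user):
    """Pattern whose only group is the user id; it fails when profile is "7"."""
    fields = (max(ix_profile, ix_user) + 1) * ['[^\t]*']
    fields[ix_profile] = '"(?!7")[^\t\r\n"]*"'   # rejects exactly "7"
    fields[ix_user] = '"([^\t\r\n"]*)"'          # the single capturing group
    return re.compile('\t'.join(fields))

good = '"x"\t"alice"\t"3"\t"y"\n'
bad = '"x"\t"bob"\t"7"\t"y"\n'
seventy = '"x"\t"carol"\t"70"\t"y"\n'

regx = build_regex(ix_profile=2, ix_user=1)
print(regx.match(good).group(1))   # user id of a valid line
print(regx.match(bad))             # no match at all for a "7" profile

# The same builder works when the profile column comes first.
swapped = build_regex(ix_profile=1, ix_user=2)
print(swapped.match('"x"\t"3"\t"dave"\t"y"\n').group(1))
```

Because a line with profile "7" simply fails to match, the loop needs only an `if match:` test and no comparison of a second group.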
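In addition, I tested several matching and extraction algorithms:
1) with re.finditer()
2) with re.match() and a regex matching 40 fields
3) with re.match() and a regex matching only max(ix_profile,ix_user) + 1 fields
4) like 3, but with a plain dictionary instead of a defaultdict instance
To measure the times, my code creates a file based on the information you gave about its contents.
I tested the following 4 functions in 4 codes: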
def get_users_short_1(log):
    users_short = defaultdict(int)
    f = open(log)
    # Read header line
    h = f.readline().strip().replace('"', '').split('\t')
    ix_profile = h.index('profile.type')
    ix_user = h.index('profile.id')
    # If either ix_* is the last field in h, it will include a newline.
    # That's fine for now.
    glo = 40*['[^\t]*']
    glo[ix_profile] = '"(?!7")[^\t"]+"'
    glo[ix_user] = '"([^\t"]*)"'
    glo[39] = '"[^\t\r\n]*"'
    regx = re.compile('^'+'\t'.join(glo),re.MULTILINE)
    content = f.read()
    for mat in regx.finditer(content):
        users_short[mat.group(1)] += 1
    f.close()
    return users_short

def get_users_short_2(log):
    users_short = defaultdict(int)
    f = open(log)
    # Read header line
    h = f.readline().strip().replace('"', '').split('\t')
    ix_profile = h.index('profile.type')
    ix_user = h.index('profile.id')
    # If either ix_* is the last field in h, it will include a newline.
    # That's fine for now.
    glo = 40*['[^\t]*']
    glo[ix_profile] = '"(?!7")[^\t"]*"'
    glo[ix_user] = '"([^\t"]*)"'
    regx = re.compile('\t'.join(glo))
    for line in f:
        gugu = regx.match(line)
        if gugu:
            users_short[gugu.group(1)] += 1
    f.close()
    return users_short

def get_users_short_3(log):
    users_short = defaultdict(int)
    f = open(log)
    # Read header line
    h = f.readline().strip().replace('"', '').split('\t')
    ix_profile = h.index('profile.type')
    ix_user = h.index('profile.id')
    # If either ix_* is the last field in h, it will include a newline.
    # That's fine for now.
    glo = (max(ix_profile,ix_user) + 1) * ['[^\t]*']
    glo[ix_profile] = '"(?!7")[^\t"]*"'
    glo[ix_user] = '"([^\t"]*)"'
    regx = re.compile('\t'.join(glo))
    for line in f:
        gugu = regx.match(line)
        if gugu:
            users_short[gugu.group(1)] += 1
    f.close()
    return users_short
The complete code 4, which seems to be the fastest:
import re
from random import choice,randint,sample
import csv
import random
from time import clock

choi = 1
if choi:
    ntot = 1000
    chars = 'abcdefghijklmnopqrstuvwxyz0123456789'
    def ry(a=30,b=80,chars=chars,nom='abcdefghijklmnopqrstuvwxyz'):
        if a==30:
            return ''.join(choice(chars) for i in xrange(randint(30,80)))
        else:
            return ''.join(choice(nom) for i in xrange(randint(8,12)))

    num = sample(xrange(1000),200)
    num.sort()
    print 'num==',num
    several = [e//3 for e in xrange(0,800,7) if e//3 not in num]
    print
    print 'several==',several

    with open('biggy.txt','w') as f:
        head = ('aaa','bbb','ccc','ddd','profile.id','fff','ggg','hhhh','profile.type','iiii',
                'jjj','kkkk','lll','mmm','nnn','ooo','ppp','qq','rr','ss',
                'tt','uu','vv','ww','xx','yy','zz','razr','fgh','ty',
                'kfgh','zer','sdfs','fghf','dfdf','zerzre','jkljkl','vbcvb','kljlk','dhhdh')
        f.write('\t'.join(head)+'\n')
        for i in xrange(1000):
            li = [ ry(a=8).join('""') if n==4 else ry().join('""')
                   for n in xrange(40) ]
            if i in num:
                li[4] = '@#~&=*;'
                li[8] = '"7"'
            if i in several:
                li[4] = '"BRAD"'
            f.write('\t'.join(li)+'\n')

from collections import defaultdict

def get_users(log):
    users = defaultdict(int)
    f = open(log)
    # Read header line
    h = f.readline().strip().replace('"', '').split('\t')
    ix_profile = h.index('profile.type')
    ix_user = h.index('profile.id')
    # If either ix_* is the last field in h, it will include a newline.
    # That's fine for now.
    for (i, line) in enumerate(f):
        #if i % 1000000 == 0: print "Line %d" % i # progress notification
        l = line.split('\t')
        if l[ix_profile] != '"7"': # "7" indicates a bad value
            # use list slicing to remove quotes
            users[l[ix_user][1:-1]] += 1
    f.close()
    return users

def get_users_short_4(log):
    users_short = {}
    f = open(log)
    # Read header line
    h = f.readline().strip().replace('"', '').split('\t')
    ix_profile = h.index('profile.type')
    ix_user = h.index('profile.id')
    # If either ix_* is the last field in h, it will include a newline.
    # That's fine for now.
    glo = (max(ix_profile,ix_user) + 1) * ['[^\t]*']
    glo[ix_profile] = '"(?!7")[^\t"]*"'
    glo[ix_user] = '"([^\t"]*)"'
    regx = re.compile('\t'.join(glo))
    for line in f:
        gugu = regx.match(line)
        if gugu:
            gugugroup = gugu.group(1)
            if gugugroup in users_short:
                users_short[gugugroup] += 1
            else:
                users_short[gugugroup] = 1
    f.close()
    return users_short

print '\n\n'
te = clock()
USERS = get_users('biggy.txt')
t1 = clock()-te
te = clock()
USERS_short_4 = get_users_short_4('biggy.txt')
t2 = clock()-te

if choi:
    print '\nlen(num)==',len(num),' : number of lines with ix_profile==\'"7"\''
    print "USERS['BRAD']==",USERS['BRAD']
    print 'then :'
    print str(ntot)+' lines - '+str(len(num))+' incorrect - '+str(len(several))+\
          ' identical + 1 user BRAD = '+str(ntot - len(num)-len(several)+1)

print '\nlen(USERS)==',len(USERS)
print 'len(USERS_short_4)==',len(USERS_short_4)
print 'USERS == USERS_short_4 is',USERS == USERS_short_4
print '\n----------------------------------------'
print 'time of get_users() :\n', t1,'\n----------------------------------------'
print 'time of get_users_short_4 :\n', t2,'\n----------------------------------------'
print 'get_users_short_4() / get_users() = '+str(100*t2/t1)+ ' %'
print '----------------------------------------'
One result of this code 4 is, for example:
num== [2, 12, 16, 23, 26, 33, 38, 40, 43, 45, 51, 53, 84, 89, 93, 106, 116, 117, 123, 131, 132, 135, 136, 138, 146, 148, 152, 157, 164, 168, 173, 176, 179, 189, 191, 193, 195, 199, 200, 208, 216, 222, 224, 227, 233, 242, 244, 245, 247, 248, 251, 255, 256, 261, 262, 266, 276, 278, 291, 296, 298, 305, 307, 308, 310, 312, 314, 320, 324, 327, 335, 337, 340, 343, 350, 356, 362, 370, 375, 379, 382, 385, 387, 409, 413, 415, 419, 433, 441, 443, 444, 446, 459, 462, 474, 489, 492, 496, 505, 509, 511, 512, 518, 523, 541, 546, 548, 550, 552, 558, 565, 566, 572, 585, 586, 593, 595, 601, 609, 610, 615, 628, 632, 634, 638, 642, 645, 646, 651, 654, 657, 660, 662, 665, 670, 671, 680, 682, 687, 688, 690, 692, 695, 703, 708, 716, 717, 728, 729, 735, 739, 741, 742, 765, 769, 772, 778, 790, 792, 797, 801, 808, 815, 825, 828, 831, 839, 849, 858, 859, 862, 864, 872, 874, 890, 899, 904, 906, 913, 916, 920, 923, 928, 941, 946, 947, 953, 955, 958, 959, 961, 971, 975, 976, 979, 981, 985, 989, 990, 999]
several== [0, 4, 7, 9, 11, 14, 18, 21, 25, 28, 30, 32, 35, 37, 39, 42, 44, 46, 49, 56, 58, 60, 63, 65, 67, 70, 72, 74, 77, 79, 81, 86, 88, 91, 95, 98, 100, 102, 105, 107, 109, 112, 114, 119, 121, 126, 128, 130, 133, 137, 140, 142, 144, 147, 149, 151, 154, 156, 158, 161, 163, 165, 170, 172, 175, 177, 182, 184, 186, 196, 198, 203, 205, 207, 210, 212, 214, 217, 219, 221, 226, 228, 231, 235, 238, 240, 249, 252, 254, 259, 263]
len(num)== 200 : number of lines with ix_profile=='"7"'
USERS['BRAD']== 91
then :
1000 lines - 200 incorrect - 91 identical + 1 user BRAD = 710
len(USERS)== 710
len(USERS_short_4)== 710
USERS == USERS_short_4 is True
----------------------------------------
time of get_users() :
0.0788686830309
----------------------------------------
time of get_users_short_4 :
0.0462885646081
----------------------------------------
get_users_short_4() / get_users() = 58.690677756 %
----------------------------------------
But the results vary quite a bit from run to run. I obtained:
get_users_short_1() / get_users() = 82.957476637 %
get_users_short_1() / get_users() = 82.3987686867 %
get_users_short_1() / get_users() = 90.2949842932 %
get_users_short_1() / get_users() = 78.8063007461 %
get_users_short_1() / get_users() = 90.4743181768 %
get_users_short_1() / get_users() = 81.9635560003 %
get_users_short_1() / get_users() = 83.9418269406 %
get_users_short_1() / get_users() = 89.4344442255 %
get_users_short_2() / get_users() = 80.4891442088 %
get_users_short_2() / get_users() = 69.921943776 %
get_users_short_2() / get_users() = 81.8006709304 %
get_users_short_2() / get_users() = 83.6270772928 %
get_users_short_2() / get_users() = 97.9821084403 %
get_users_short_2() / get_users() = 84.9307558629 %
get_users_short_2() / get_users() = 75.9384820018 %
get_users_short_2() / get_users() = 86.2964748485 %
get_users_short_3() / get_users() = 69.4332754744 %
get_users_short_3() / get_users() = 58.5814726668 %
get_users_short_3() / get_users() = 61.8011476831 %
get_users_short_3() / get_users() = 67.6925083362 %
get_users_short_3() / get_users() = 65.1208124156 %
get_users_short_3() / get_users() = 72.2621727569 %
get_users_short_3() / get_users() = 70.6957501222 %
get_users_short_3() / get_users() = 68.5310031226 %
get_users_short_3() / get_users() = 71.6529128259 %
get_users_short_3() / get_users() = 71.6153554073 %
get_users_short_3() / get_users() = 64.7899044975 %
get_users_short_3() / get_users() = 72.947531363 %
get_users_short_3() / get_users() = 65.6691965629 %
get_users_short_3() / get_users() = 61.5194374401 %
get_users_short_3() / get_users() = 61.8396133666 %
get_users_short_3() / get_users() = 71.5447862466 %
get_users_short_3() / get_users() = 74.6710538858 %
get_users_short_3() / get_users() = 72.9651233485 %
get_users_short_4() / get_users() = 65.5224210767 %
get_users_short_4() / get_users() = 65.9023813161 %
get_users_short_4() / get_users() = 62.8055210129 %
get_users_short_4() / get_users() = 64.9690049062 %
get_users_short_4() / get_users() = 61.9050866134 %
get_users_short_4() / get_users() = 65.8127125992 %
get_users_short_4() / get_users() = 66.8112344201 %
get_users_short_4() / get_users() = 57.865635278 %
get_users_short_4() / get_users() = 62.7937713964 %
get_users_short_4() / get_users() = 66.3440149528 %
get_users_short_4() / get_users() = 66.4429530201 %
get_users_short_4() / get_users() = 66.8692388625 %
get_users_short_4() / get_users() = 66.5949137537 %
get_users_short_4() / get_users() = 69.1708488794 %
get_users_short_4() / get_users() = 59.7129743801 %
get_users_short_4() / get_users() = 59.755297387 %
get_users_short_4() / get_users() = 60.6436352185 %
get_users_short_4() / get_users() = 64.5023727945 %
get_users_short_4() / get_users() = 64.0153937511 %
I would like to know what results my code would obtain on your real file, with a computer more powerful than mine. Please let me know.
With the following function, which limits the number of splits:
def get_users_short_Machin(log):
    users_short = defaultdict(int)
    f = open(log)
    # Read header line
    h = f.readline().strip().replace('"', '').split('\t')
    ix_profile = h.index('profile.type')
    ix_user = h.index('profile.id')
    maxsplits = max(ix_profile, ix_user) + 1
    # If either ix_* is the last field in h, it will include a newline.
    # That's fine for now.
    for line in f:
        #if i % 1000000 == 0: print "Line %d" % i # progress notification
        l = line.split('\t', maxsplits)
        if l[ix_profile] != '"7"': # "7" indicates a bad value
            # use list slicing to remove quotes
            users_short[l[ix_user][1:-1]] += 1
    f.close()
    return users_short
I get
get_users_short_Machin() / get_users() = 60.6771821308 %
get_users_short_Machin() / get_users() = 71.9300992989 %
get_users_short_Machin() / get_users() = 85.1695214715 %
get_users_short_Machin() / get_users() = 72.7722233685 %
get_users_short_Machin() / get_users() = 73.6311173237 %
get_users_short_Machin() / get_users() = 86.0848484053 %
get_users_short_Machin() / get_users() = 75.1661981729 %
get_users_short_Machin() / get_users() = 72.8888452474 %
get_users_short_Machin() / get_users() = 76.7185685993 %
get_users_short_Machin() / get_users() = 82.7007096958 %
get_users_short_Machin() / get_users() = 71.1678957888 %
get_users_short_Machin() / get_users() = 71.9845835126 %
Using a plain dictionary:
    users_short = {}
    .......
    for line in f:
        #if i % 1000000 == 0: print "Line %d" % i # progress notification
        l = line.split('\t', maxsplits)
        if l[ix_profile] != '"7"': # "7" indicates a bad value
            # use list slicing to remove quotes
            us = l[ix_user][1:-1]
            if us not in users_short:
                users_short[us] = 1
            else:
                users_short[us] += 1
slightly improves the execution time, but it remains higher than that of my last code 4
get_users_short_Machin2() / get_users() = 71.5959919389 %
get_users_short_Machin2() / get_users() = 71.6118864535 %
get_users_short_Machin2() / get_users() = 66.3832514274 %
get_users_short_Machin2() / get_users() = 68.0026407277 %
get_users_short_Machin2() / get_users() = 67.9853921552 %
get_users_short_Machin2() / get_users() = 69.8946203037 %
get_users_short_Machin2() / get_users() = 71.8260030248 %
get_users_short_Machin2() / get_users() = 78.4243267003 %
get_users_short_Machin2() / get_users() = 65.7223734428 %
get_users_short_Machin2() / get_users() = 69.5903935612 %
The fastest:
def get_users_short_CSV(log):
    users_short = {}
    f = open(log,'rb')
    # csv is used only to parse the header; the data lines are matched with the regex
    rid = csv.reader(f,delimiter='\t')
    # Read header line
    h = rid.next()
    ix_profile = h.index('profile.type')
    ix_user = h.index('profile.id')
    # If either ix_* is the last field in h, it will include a newline.
    # That's fine for now.
    glo = (max(ix_profile,ix_user) + 1) * ['[^\t]*']
    glo[ix_profile] = '"(?!7")[^\t\r\n"]*"'
    glo[ix_user] = '"([^\t\r\n"]*)"'
    regx = re.compile('\t'.join(glo))
    for line in f:
        gugu = regx.match(line)
        if gugu:
            gugugroup = gugu.group(1)
            if gugugroup in users_short:
                users_short[gugugroup] += 1
            else:
                users_short[gugugroup] = 1
    f.close()
    return users_short
Results
get_users_short_CSV() / get_users() = 31.6443901114 %
get_users_short_CSV() / get_users() = 44.3536176134 %
get_users_short_CSV() / get_users() = 47.2295100511 %
get_users_short_CSV() / get_users() = 45.4912200716 %
get_users_short_CSV() / get_users() = 63.7997241038 %
get_users_short_CSV() / get_users() = 43.5020255488 %
get_users_short_CSV() / get_users() = 40.9188320386 %
get_users_short_CSV() / get_users() = 43.3105062139 %
get_users_short_CSV() / get_users() = 59.9184895288 %
get_users_short_CSV() / get_users() = 40.22047881 %
get_users_short_CSV() / get_users() = 48.3615872543 %
get_users_short_CSV() / get_users() = 47.0374831251 %
get_users_short_CSV() / get_users() = 44.5268626789 %
get_users_short_CSV() / get_users() = 53.1690205938 %
get_users_short_CSV() / get_users() = 43.4022458372 %
I tested get_users_short_CSV() with a file of 10000 lines instead of 1000:
len(num)== 2000 : number of lines with ix_profile=='"7"'
USERS['BRAD']== 95
then :
10000 lines - 2000 incorrect - 95 identical + 1 user BRAD = 7906
len(USERS)== 7906
len(USERS_short_CSV)== 7906
USERS == USERS_short_CSV is True
----------------------------------------
time of get_users() :
0.794919186656
----------------------------------------
time of get_users_short_CSV :
0.358942826532
----------------------------------------
get_users_short_CSV() / get_users() = 41.5618307521 %
get_users_short_CSV() / get_users() = 42.2769300584 %
get_users_short_CSV() / get_users() = 45.154631132 %
get_users_short_CSV() / get_users() = 44.1596819482 %
get_users_short_CSV() / get_users() = 30.3192350266 %
get_users_short_CSV() / get_users() = 34.4856637748 %
get_users_short_CSV() / get_users() = 43.7461535628 %
get_users_short_CSV() / get_users() = 41.7577246935 %
get_users_short_CSV() / get_users() = 41.9092878608 %
get_users_short_CSV() / get_users() = 44.6772360665 %
get_users_short_CSV() / get_users() = 42.6770989413 %
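Since single-run ratios like the ones above fluctuate, a steadier comparison might repeat each measurement and keep the minimum, for instance with `timeit.repeat`. The following Python 3 sketch is only an illustration: `relative_time`, `split_all` and `split_some` are made-up names, and the two toy functions stand in for the real parsing functions.

```python
import timeit

def relative_time(candidate, baseline, repeat=5, number=3):
    """Return min(candidate time) / min(baseline time) as a percentage."""
    t_cand = min(timeit.repeat(candidate, repeat=repeat, number=number))
    t_base = min(timeit.repeat(baseline, repeat=repeat, number=number))
    return 100.0 * t_cand / t_base

# Hypothetical 40-column record, as in the generated test file.
line = '\t'.join('"v%d"' % i for i in range(40)) + '\n'

def split_all():
    return line.split('\t')

def split_some():
    return line.split('\t', 5)

pct = relative_time(split_some, split_all, number=10000)
print('split_some() / split_all() = %.1f %%' % pct)
```

Taking the minimum over several repeats filters out most of the noise from other activity on the machine, so the percentages should move around less than those of single runs.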