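I have a list of strings and want to know the length of the longest string in it. Is there a simple way to get that?
More generally, I often want to know the length of the longest string in a DataFrame column. I only need a feel for what the data looks like, so I am hoping for a convenient method, something like df['column'].maxlength, rather than writing a for loop to get the number.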
Answer 0 (score: 5)
Here is a comparison:
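#!/usr/bin/python
# Python 2 script (note the print statement on the last line).
import cProfile
from timeit import Timer
from faker import Faker

def longest1(lists):
    # Generate every length, then take the largest.
    return max(len(s) for s in lists)

def longest2(lists):
    # Find the longest string using len as the key, then take its length.
    return len(max(lists, key=len))

def longest3(lists):
    # Sort by length and take the length of the last (longest) element.
    return len(sorted(lists, key=len)[-1])

s = Faker()
seq = [s.word() for x in range(100)]
func = [longest1, longest2, longest3]
for f in func:
    t = Timer(lambda: f(seq))
    # cProfile.run prints its report and returns None, hence the "None" lines in the output.
    print f.__name__, cProfile.run('t.timeit(number=1000)')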
Result: longest2 is the fastest.
Output:
longest1 204011 function calls in 0.046 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.046 0.046 <string>:1(<module>)
1000 0.000 0.000 0.045 0.000 long.py:22(<lambda>)
1000 0.001 0.000 0.045 0.000 long.py:7(longest1)
101000 0.025 0.000 0.031 0.000 long.py:8(<genexpr>)
1 0.000 0.000 0.000 0.000 timeit.py:143(setup)
1 0.000 0.000 0.046 0.046 timeit.py:178(timeit)
1 0.000 0.000 0.046 0.046 timeit.py:96(inner)
1 0.000 0.000 0.000 0.000 {gc.disable}
1 0.000 0.000 0.000 0.000 {gc.enable}
1 0.000 0.000 0.000 0.000 {gc.isenabled}
1 0.000 0.000 0.000 0.000 {globals}
100000 0.007 0.000 0.007 0.000 {len}
1000 0.013 0.000 0.044 0.000 {max}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
2 0.000 0.000 0.000 0.000 {time.time}
None
longest2 4011 function calls in 0.011 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.011 0.011 <string>:1(<module>)
1000 0.001 0.000 0.010 0.000 long.py:10(longest2)
1000 0.000 0.000 0.011 0.000 long.py:22(<lambda>)
1 0.000 0.000 0.000 0.000 timeit.py:143(setup)
1 0.000 0.000 0.011 0.011 timeit.py:178(timeit)
1 0.000 0.000 0.011 0.011 timeit.py:96(inner)
1 0.000 0.000 0.000 0.000 {gc.disable}
1 0.000 0.000 0.000 0.000 {gc.enable}
1 0.000 0.000 0.000 0.000 {gc.isenabled}
1 0.000 0.000 0.000 0.000 {globals}
1000 0.000 0.000 0.000 0.000 {len}
1000 0.010 0.000 0.010 0.000 {max}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
2 0.000 0.000 0.000 0.000 {time.time}
None
longest3 4011 function calls in 0.031 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.031 0.031 <string>:1(<module>)
1000 0.001 0.000 0.031 0.000 long.py:13(longest3)
1000 0.000 0.000 0.031 0.000 long.py:22(<lambda>)
1 0.000 0.000 0.000 0.000 timeit.py:143(setup)
1 0.000 0.000 0.031 0.031 timeit.py:178(timeit)
1 0.000 0.000 0.031 0.031 timeit.py:96(inner)
1 0.000 0.000 0.000 0.000 {gc.disable}
1 0.000 0.000 0.000 0.000 {gc.enable}
1 0.000 0.000 0.000 0.000 {gc.isenabled}
1 0.000 0.000 0.000 0.000 {globals}
1000 0.000 0.000 0.000 0.000 {len}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1000 0.029 0.000 0.029 0.000 {sorted}
2 0.000 0.000 0.000 0.000 {time.time}
None
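For readers on Python 3, the comparison above can be approximated with timeit alone. This is a minimal sketch rather than the original script: it swaps Faker for randomly generated lowercase words (the make_words helper is an assumption introduced here) and reports wall-clock time instead of a cProfile report. Timings depend on the machine, so no figures are shown.

```python
import random
import string
from timeit import timeit

def longest1(strings):
    return max(len(s) for s in strings)

def longest2(strings):
    return len(max(strings, key=len))

def longest3(strings):
    return len(sorted(strings, key=len)[-1])

def make_words(count, seed=0):
    # Hypothetical stand-in for Faker().word(): random lowercase words of 1 to 12 letters.
    rng = random.Random(seed)
    return [
        "".join(rng.choice(string.ascii_lowercase) for _ in range(rng.randint(1, 12)))
        for _ in range(count)
    ]

seq = make_words(100)
results = {}
for f in (longest1, longest2, longest3):
    results[f.__name__] = f(seq)  # all three should agree on the answer
    elapsed = timeit(lambda: f(seq), number=1000)
    print(f"{f.__name__}: {elapsed:.4f} s for 1000 calls")
```

All three functions return the same length; only the amount of work they do differs.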
Answer 1 (score: 2)
def longest(strings): return max(strings, key=len)
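You can find the documentation for the built-in max function here. The key argument is a function that extracts a comparison value from each string; with key=len, the strings are compared by length. Note that this returns the longest string itself, so wrap the call in len() to get its length.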
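To make the role of key concrete, here is a short sketch. The sample list is made up for illustration. If several strings share the maximum length, max returns the first one it encounters.

```python
words = ["abc", "de", "longest"]  # illustrative data, not from the original answer

longest_word = max(words, key=len)  # items are compared by their length
longest_len = len(longest_word)     # the number the question asks for

print(longest_word)  # longest
print(longest_len)   # 7
```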
Answer 2 (score: 1)
def longest_string(string_list):
    return max(len(s) for s in string_list)
Example run:
l = ['abc', 'de', 'longest']
longest_string(l)
# 7
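One edge case worth noting: max raises ValueError when given an empty sequence. A possible variant, assuming Python 3.4 or later (where max accepts a default argument), returns 0 instead. The name longest_string_or_zero is hypothetical.

```python
def longest_string_or_zero(string_list):
    # default=0 is returned when the iterable is empty (Python 3.4+).
    return max((len(s) for s in string_list), default=0)

print(longest_string_or_zero(['abc', 'de', 'longest']))  # 7
print(longest_string_or_zero([]))                        # 0
```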
Answer 3 (score: 0)
Use the pandas vectorised str methods:
In [6]:
import pandas as pd
df = pd.DataFrame({'a':['asdsadasasdss','asdasdasdasasasdasdasd','asdsasdas']})
df
Out[6]:
a
0 asdsadasasdss
1 asdasdasdasasasdasdasd
2 asdsasdas
[3 rows x 1 columns]
In [8]:
df.a.str.len()
Out[8]:
0 13
1 22
2 9
Name: a, dtype: int64
In [9]:
df[df.a.str.len() == max(df.a.str.len())]
Out[9]:
a
1 asdasdasdasasasdasdasd
[1 rows x 1 columns]
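To get the single number the question asks for, the per-row lengths can be reduced with Series.max. A small sketch reusing the frame from the session above; it assumes the column holds only strings (missing values would make str.len return NaN for those rows).

```python
import pandas as pd

df = pd.DataFrame({'a': ['asdsadasasdss', 'asdasdasdasasasdasdasd', 'asdsasdas']})

lengths = df['a'].str.len()            # length of each string in the column
max_len = lengths.max()                # longest length in the column
longest_rows = df[lengths == max_len]  # rows holding a string of that length

print(max_len)  # 22
print(longest_rows)
```

Computing the lengths once and reusing them avoids calling str.len twice, as the boolean filter in the session does.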