I'm working with twitter ids which are strings because they are so huge.
Twitter's api has a "Since_id" and I want to search tweets since the earliest tweet in a list.
For example:
tweet_ids = [u'1003659997241401843', u'1003659997241401234234', u'100365999724140136236'] # etc
since_id = min(tweet_ids)
So far min(tweet_ids)
works but I want to understand why it works because I want to know if it is just by chance that it worked on the few samples I gave it, or if it is guaranteed to always work.
Edit: To clarify I need to get the lowest tweet id. How do I get the lowest tweet id if they are strings that are > 2^32-1 and therefore can't be represented as integers in python 2.7 on a 32 bit machine.
I am using python 2.7 if that matters
答案 0 :(得分:0)
Python will compare these strings exactly as it compares any other strings; that is, it will compare them lexicographically.
Thus, it will put 12
before 2
, which may be undesirable for you.
Here's a function that will compute the numerical minimum of strings representing integers for you.
# A is an iterable of strings representing integers.
def numerical_min(A):
cur_min = A[0]
for x in A[1:]:
if len(x) < len(cur_min):
cur_min = x
continue
if len(x) > len(cur_min):
continue
for m,n in zip(x, cur_min):
if int(m) < int(n):
cur_min = x
break
return cur_min
答案 1 :(得分:0)
From the Python Documentation, it implies that all Strings, including your case where the strings are large sequences of digits, are compared lexicographically.
2
is less than then "greater integer" string 100
in this case."-1"
is greater than "99"
when compared this way because the minus hyphen is lexicographically greater than all digits."2"
and "02"
aren't necessarily equal in terms of string comparison. "02"
is less than "2"
string-wise because of the leading zero.It is better to convert the str into a long int, and then compare it. As in
tweet_ids = [long('1003659997241401843'), long('1003659997241401234234'), long('100365999724140136236')]
since_id = min(tweet_ids)
Since JSON does not allow 70-bit long ints, convert the smallest int back into a str. Replace the since_id
line with
since_id = min(tweet_ids, key=int)