Question

I have a set of numbers: a txt file containing 2,375,013 unique numbers. The data is structured as follows:

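I want to match a number from each line of another data set against these numbers so that I can extract the data I need. So I wrote the following code: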

    def get_US_users_IDs(filepath, mode):
        IDs = []
        with open(filepath, mode) as f:
            for line in f:
                sp = line.strip()
                for id in sp:
                    IDs.append(id.lower())
            return IDs


    IDs = "|".join(get_US_users_IDs('/nas/USAuserlist.txt', 'r'))
    matcher = re.compile(IDs)
    if matcher.match(user_id):
        number_of_US_user += 1
        text = tweet.split('\t')[3]

But it takes a long time to run. Any ideas on how to reduce the running time?

Answer 1

As I understand it, the file contains a large number of IDs, and you want to know whether a particular user_id is in that file.

You can use a Python set.

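    # Build the set once, before processing any tweets (one ID per line).
    with open(filepath, mode) as fd:
        IDs = set(int(line) for line in fd)
    ...
    # user_id must be an int here, since the set holds ints.
    if user_id in IDs:
        number_of_US_user += 1
        ...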
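For completeness, here is a minimal, self-contained sketch of the same idea. It assumes the file holds one ID per line and that the IDs you compare against are strings, so it keeps them as strings instead of converting to int. The helper names `load_ids` and `count_matches` and the demo IDs are made up for illustration and are not from your code. Membership tests on a set are hash lookups (constant time on average), so each check avoids scanning a huge regex alternation.

```python
import os
import tempfile


def load_ids(filepath):
    """Read one ID per line into a set of strings, skipping blank lines."""
    with open(filepath, "r") as f:
        return {line.strip().lower() for line in f if line.strip()}


def count_matches(user_ids, known_ids):
    """Count how many of user_ids appear in the known_ids set."""
    return sum(1 for uid in user_ids if uid in known_ids)


# Tiny demo with made-up IDs written to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1001\n1002\n\n1003\n")
    demo_path = tmp.name

known_ids = load_ids(demo_path)
os.remove(demo_path)

matched = count_matches(["1002", "9999", "1003"], known_ids)
print(matched)  # 2
```

The important part is that the set is built once, outside the loop over tweets. If your user IDs are ints rather than strings, convert both sides the same way before comparing.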

How can I efficiently match a specific number against a set of numbers?
