Update: I managed to fix this with the help of Jeremy's function, which divides my data set into chunks of 50. I have posted the final answer below.
I have the following code. The reason I want to divide the array into chunks is that I am trying to use an API that only allows 50 requests at a time. I am also a Java developer trying to move to Python. I have a text file containing a very long list of IDs, and I build URLs based on the IDs I read. What I want to do is divide the array into chunks of 50 and feed them to the API.
import simplejson as json
import sys
import urllib
import traceback, csv, string

# "base" API URL
URL_BASE = 'Some URL'

# set user agent string
urllib.version = "Data Collection Fix it"

page_ids = []

def divide_list(list_, n):
    for i in range(0, len(list_), n):
        yield list_[i:i + n]

def issue_query():
    iFile = open('ReadFromThisFile.txt', "r")
    lines = iFile.readlines()
    #print len(lines)
    for line in lines:
        ids = string.split(line)
        ids = ids[0]
        page_ids.append(ids)
    url = URL_BASE
    indicies = range(len(page_ids))
    File = open("WriteToThisFile.csv", "w")
    for indicies in divide_list(page_ids, 50):
        count = 0
        fiftyIds = []
        url = URL_BASE
        for id in indicies:
            str(id).strip
            url += str(id) + '|'
            print url
            fiftyIds.append(str(id))
            count += 1
        print count
        rv = urllib.urlopen(url)
        j = rv.read().decode("utf-8")
        #sys.stderr.write(j + "\n")
        data = json.loads(j)
        for id in fiftyIds:
            try:
                s = int(data["query"]["pages"][id]["revisions"][0]["size"])
                sys.stderr.write("%d\t%d\n" % (int(id), s))
                File.write("%d\t%d\n" % (int(id), s))
                #print ("%d\t%d\n" % (int(id), s))
                # do something interesting with id and s
            except Exception, e:
                traceback.print_exc()
    File.close()
    iFile.close()

issue_query()
I know many experienced Python developers may give me negative points for asking such a simple question, but I couldn't find any good examples on Google or here. If I've duplicated a question, I apologize for the trouble.
Thanks,
Answer 0 (score: 3):
There may be a built-in function that does this, but I can't think of one.
#!/usr/bin/env python2.7

def divide_list(list_, n):
    """Produces an iterator over subsections of maximum length n of the list."""
    for i in range(0, len(list_), n):
        yield list_[i:i + n]
Example usage:
print(list(divide_list([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], 3)))
# prints: [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11]]
Use it to generate URLs, for example:
URL_BASE = "http://example.com/blah?ids="
page_ids = range(0, 123)

for indices in divide_list(page_ids, 50):
    url = URL_BASE + "|".join(str(i).strip() for i in indices)
    # then do something with url...
    print(url)
# prints:
# http://example.com/blah?ids=0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35|36|37|38|39|40|41|42|43|44|45|46|47|48|49
# http://example.com/blah?ids=50|51|52|53|54|55|56|57|58|59|60|61|62|63|64|65|66|67|68|69|70|71|72|73|74|75|76|77|78|79|80|81|82|83|84|85|86|87|88|89|90|91|92|93|94|95|96|97|98|99
# http://example.com/blah?ids=100|101|102|103|104|105|106|107|108|109|110|111|112|113|114|115|116|117|118|119|120|121|122
Answer 1 (score: 3):
A generator version of Jeremy's answer:
def divide_list(list_, n):
    for i in range(0, len(list_), n):
        yield list_[i:i + n]

for chunk in divide_list([1, 2, 3, 4, 5], 2):
    print chunk
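If the input is an arbitrary iterable rather than a list (so `len()` and slicing are unavailable), a similar chunker can be written with itertools.islice; a minimal sketch, with the name `chunked` being my own:

```python
from itertools import islice

def chunked(iterable, n):
    """Yield successive lists of at most n items from any iterable."""
    it = iter(iterable)
    while True:
        # islice consumes up to n items from the shared iterator
        chunk = list(islice(it, n))
        if not chunk:
            return
        yield chunk

# works on generators, file objects, etc., not just lists
print(list(chunked(range(5), 2)))  # [[0, 1], [2, 3], [4]]
```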
Answer 2 (score: 3):
There is a recipe for this in the itertools documentation (which is really worth reading through, just so you know it's there when you need it - and you will need it):
from itertools import izip_longest

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)
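In Python 3, `izip_longest` was renamed to `itertools.zip_longest`; the same recipe, with a quick demonstration:

```python
from itertools import zip_longest

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return zip_longest(fillvalue=fillvalue, *args)

groups = list(grouper(3, 'ABCDEFG', 'x'))
print(groups)  # [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
```

Note that unlike divide_list, grouper pads the last group with the fill value, so for the URL-building case the padding would need to be stripped out before joining the IDs.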
Answer 3 (score: 0):
I suppose I shouldn't have updated the original post with the solution; I should have answered the question instead. Hopefully it isn't confusing - I've added an update note in the question section saying that the problem is solved, and how I solved it with the help of Jeremy Banks's function.
import simplejson as json
import sys
import urllib
import traceback, csv, string

# "base" API URL
URL_BASE = 'Some URL'

# set user agent string
urllib.version = "Data Collection Fix it"

page_ids = []

def divide_list(list_, n):
    """Yield successive chunks of at most n items from list_."""
    for i in range(0, len(list_), n):
        yield list_[i:i + n]

def issue_query():
    iFile = open('ReadFromThisFile.txt', "r")
    lines = iFile.readlines()
    for line in lines:
        # take the first whitespace-separated token on each line as the ID
        ids = string.split(line)
        page_ids.append(ids[0])
    File = open("WriteToThisFile.csv", "w")
    for indicies in divide_list(page_ids, 50):
        count = 0
        fiftyIds = []
        url = URL_BASE
        for id in indicies:
            # strip() returns a new string, so its result must be used
            url += str(id).strip() + '|'
            fiftyIds.append(str(id))
            count += 1
        print count
        rv = urllib.urlopen(url)
        j = rv.read().decode("utf-8")
        data = json.loads(j)
        for id in fiftyIds:
            try:
                s = int(data["query"]["pages"][id]["revisions"][0]["size"])
                sys.stderr.write("%d\t%d\n" % (int(id), s))
                File.write("%d\t%d\n" % (int(id), s))
                # do something interesting with id and s
            except Exception, e:
                traceback.print_exc()
    File.close()
    iFile.close()

issue_query()
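For reference, the chunk-and-build-URL core of the code above can be written more compactly in Python 3. This is only a sketch of the batching step: URL_BASE and the page IDs below are placeholders, and the actual request and JSON parsing are omitted.

```python
URL_BASE = 'http://example.com/api?ids='  # placeholder base URL

def divide_list(list_, n):
    """Yield successive chunks of at most n items from list_."""
    for i in range(0, len(list_), n):
        yield list_[i:i + n]

# stand-in for the IDs read from ReadFromThisFile.txt
page_ids = [str(i) for i in range(120)]

urls = []
for chunk in divide_list(page_ids, 50):
    # strip whitespace once per ID and join with the API's '|' separator
    urls.append(URL_BASE + '|'.join(pid.strip() for pid in chunk))

print(len(urls))  # 3 urls: chunks of 50, 50, and 20 IDs
```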