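I currently have a list of nonprofit organizations and companies, and I would like to assemble their EINs programmatically. I would appreciate any help on how to do this.
My current idea is to go to the GuideStar website (http://www.guidestar.org/Home.aspx) and, if I can somehow navigate to the corresponding GuideStar profile page, grab the organization's EIN from it.
However, when I search GuideStar for an organization such as "Somerville Community Corporation", the results page has a generic URL: http://www.guidestar.org/SearchResults.aspx. When I click through to the actual profile page, its URL presupposes knowledge of the EIN (23-7293380):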
http://www.guidestar.org/organizations/23-7293380/somerville-community-corporation.aspx
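Any help with obtaining the EINs would be much appreciated!
Update: another option is citizenaudit.org, but again the URL presupposes knowledge of the EIN. How can I get around this?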
Answer 0 (score: 1)
If you download and unzip the link which a-p has provided, you can do something like

from collections import defaultdict
import csv
from functools import reduce
from operator import and_
import re

DATAFILE = "data-download-pub78.txt"

def get_words(s):
    # Lower-case the string and split it into alphabetic words.
    return re.findall("[a-z]+", s.lower())

def build_index(items):
    word_index = defaultdict(set)  # word -> names containing that word
    ein_index = {}                 # name -> EIN
    for ein, name in items:
        for word in get_words(name):
            word_index[word].add(name)
        ein_index[name] = ein
    return word_index, ein_index

# The file is pipe-delimited; the first two fields are the EIN and the name.
# errors="replace" keeps a stray undecodable byte from aborting the load.
with open(DATAFILE, newline="", encoding="utf-8", errors="replace") as inf:
    incsv = csv.reader(inf, delimiter="|")
    items = (row[:2] for row in incsv if len(row) >= 2)
    words, eins = build_index(items)

def find_matches(s):
    # Keep only the charities whose name contains every word of the query.
    wordlst = [words.get(wd, set()) for wd in get_words(s)]
    if not wordlst:
        return []  # the query contained no alphabetic words
    charities = reduce(and_, wordlst)
    res = [(eins[ch], ch) for ch in charities]
    res.sort(key=lambda x: int(x[0]))
    return res

def main():
    while True:
        s = input("Enter all or part of a charity name, or nothing to quit: ").strip()
        if s:
            charities = find_matches(s)
            if charities:
                print("{} matches:".format(len(charities)))
                for ch in charities:
                    print("{}: {}".format(*ch))
                print("")
            else:
                print("No matches found.")
        else:
            break

if __name__ == "__main__":
    main()
which then runs like

Enter all or part of a charity name, or nothing to quit: Somerville Community
5 matches:
042740838: Community Action Agency of Somerville Inc.
222506464: Somerville Community Access Television Inc.
237293380: Somerville Community Corporation Inc.
432083625: Somerville Hispanic Association for Community Development Inc.
743021520: Somerville Community Library Association
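
Since the question starts from a list of names rather than an interactive prompt, the same word-index idea can be run in batch. The following is only a minimal sketch under some assumptions: `SAMPLE` stands in for the downloaded file and reuses the five rows from the session above, and `format_ein` and `lookup_eins` are hypothetical helper names, not part of the original script. It also renders the EIN in the hyphenated `NN-NNNNNNN` form that appears in the GuideStar URL.

```python
import csv
import io
import re
from collections import defaultdict

# Stand-in for the data file: "EIN|name" rows copied from the session above.
SAMPLE = """\
042740838|Community Action Agency of Somerville Inc.
222506464|Somerville Community Access Television Inc.
237293380|Somerville Community Corporation Inc.
432083625|Somerville Hispanic Association for Community Development Inc.
743021520|Somerville Community Library Association
"""

def get_words(s):
    # Lower-case the string and split it into alphabetic words.
    return re.findall("[a-z]+", s.lower())

def format_ein(ein):
    # Hypothetical helper: "237293380" -> "23-7293380".
    return "{}-{}".format(ein[:2], ein[2:])

def build_index(rows):
    # Map each word to the set of (ein, name) pairs whose name contains it.
    index = defaultdict(set)
    for ein, name in rows:
        for word in get_words(name):
            index[word].add((ein, name))
    return index

def lookup_eins(names, index):
    # Hypothetical helper: for each query, return the sorted (ein, name)
    # pairs whose name contains every word of the query.
    results = {}
    for query in names:
        sets = [index.get(w, set()) for w in get_words(query)]
        matches = set.intersection(*sets) if sets else set()
        results[query] = sorted(matches)
    return results

reader = csv.reader(io.StringIO(SAMPLE), delimiter="|")
index = build_index(row[:2] for row in reader if len(row) >= 2)

queries = ["Somerville Community Corporation", "Somerville Library"]
for query, matches in lookup_eins(queries, index).items():
    for ein, name in matches:
        print("{} -> {} ({})".format(query, format_ein(ein), name))
```

A query can still match several organizations (as "Somerville Community" does above), so the function returns a list per name and leaves the final choice to you; an empty list means no name in the file contains all of the query's words.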