Question

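I am downloading star catalogs from Vizier (using astroquery). The catalogs in question do not include star names, so I get the names from SIMBAD (also using astroquery) by querying all SIMBAD stars within 1 arcsec of each star in my Vizier catalog.

I then need to match the two sets by ra/dec coordinates. However, neither the Vizier nor the SIMBAD coordinates are necessarily exact, so I cannot do an exact match.

My current solution is to specify a tolerance and, for each Vizier star, call the function below, which loops through the SIMBAD stars and tests whether the coordinates agree within that tolerance. As a double check, because stars can be very close together, I also check that the stars' magnitudes agree to within 0.1 mag.

This all works, but for a Vizier catalog of about 2,000 stars and a SIMBAD dataset of similar size, it takes over 2 minutes to run. I am looking for ideas to speed this step up.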

    def get_simbad_name(self, vizier_star, simbad_stars, tolerance):
        """
        Searches simbad_stars to find the SIMBAD name of the star 
        referenced in vizier_star.

        A match is deemed to exist if a star in simbad_stars has both 
        ra and dec +/- tolerance of the target vizier_star and if their V 
        magnitudes, rounded to one decimal place, also match.

        Parameters
        ==========
        vizier_star : astropy.table.Row
            Row of results from Vizier query, corresponding to a star in a 
            Vizier catalog. Columns of interest to this function are:

            '_RAJ2000' : float [Right ascension in decimal degrees]
            '_DEJ2000' : float [Declination in decimal degrees]
            'Vmag' : float [V magnitude (to 3 decimal places)]

        simbad_stars : list of dict
            List of star data derived from a SIMBAD query. Keys of interest 
            to this function are:

            'ra' : float [Right ascension in decimal degrees (ICRS/J2000)]
            'dec' : float [Declination in decimal degrees (ICRS/J2000)]
            'Vmag' : float [V magnitude (to 3 decimal places)]
            'name' : str [SIMBAD primary id of star]

        tolerance : float
            The tolerance, in degrees, to be used in determining whether 
            the ra/dec coordinates match.

        Returns
        =======
        name : str
            If match then returns the SIMBAD name. If no match returns 
            an empty string.

        Notes
        =====
        simbad_stars are not all guaranteed to have Vmag. Any that don't are 
        ignored.
        """
        for item in simbad_stars:
            try:
                approx_Vmag = round(item['Vmag'],1)
            except KeyError:
                continue
            if ((vizier_star['_RAJ2000'] > item['ra'] - tolerance) and
                (vizier_star['_RAJ2000'] < item['ra'] + tolerance) and
                (vizier_star['_DEJ2000'] > item['dec'] - tolerance) and
                (vizier_star['_DEJ2000'] < item['dec'] + tolerance) and
                (round(vizier_star['Vmag'],1) == approx_Vmag)):
                return item['name']
        return ''

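Additional thoughts after the comments:

The match success rate is very high (about 99%), so in almost all cases the loop exits early. It does not have to iterate over all of simbad_stars.

I could improve things further by pre-sorting simbad_stars by ra and using a binary chop to get the index at which to start the loop.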

Answer 1

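This question looks likely to be closed because of the way it is asked, but there are two useful answers:

(1) For positional cross-matching, see https://docs.astropy.org/en/stable/coordinates/matchsep.html

(2) For the general case of what you are doing here, you should use vectorized operations rather than looping over the sources.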
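As a rough illustration of point (2), here is a minimal NumPy sketch that compares every Vizier star against every SIMBAD star in one broadcast operation. It is not code from the question: the function name `match_names`, its array arguments, and the `mag_tolerance` parameter are assumptions made for this example. It also tests the magnitude difference directly instead of comparing rounded values, and, like the original function, it ignores RA wrap-around at 0/360 degrees and the cos(dec) factor. Missing SIMBAD magnitudes are assumed to be passed as NaN.

```python
import numpy as np


def match_names(viz_ra, viz_dec, viz_vmag,
                sim_ra, sim_dec, sim_vmag, sim_names,
                tolerance, mag_tolerance=0.1):
    """Return one SIMBAD name (or '') per Vizier star (hypothetical helper)."""
    viz_ra, viz_dec, viz_vmag = (np.asarray(a, dtype=float)
                                 for a in (viz_ra, viz_dec, viz_vmag))
    sim_ra, sim_dec, sim_vmag = (np.asarray(a, dtype=float)
                                 for a in (sim_ra, sim_dec, sim_vmag))

    # Broadcasting gives pairwise differences of shape (n_vizier, n_simbad).
    d_ra = np.abs(viz_ra[:, None] - sim_ra[None, :])
    d_dec = np.abs(viz_dec[:, None] - sim_dec[None, :])
    d_mag = np.abs(viz_vmag[:, None] - sim_vmag[None, :])

    # Comparisons with NaN are False, so SIMBAD stars without Vmag never match.
    ok = (d_ra < tolerance) & (d_dec < tolerance) & (d_mag < mag_tolerance)

    found = ok.any(axis=1)      # does this Vizier star have any match?
    first = ok.argmax(axis=1)   # index of the first matching SIMBAD star
    return [sim_names[j] if f else '' for j, f in zip(first, found)]
```

For 2,000 by 2,000 stars the intermediate arrays hold 4 million elements each, which should fit comfortably in memory; for much larger catalogs the matching routines described on the astropy page linked in (1) are likely the better route.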

Answer 2

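By pre-sorting simbad_stars and using bisect_left and bisect_right to define the start and end indices within it, I managed to speed things up by a factor of 20.

I can post the code if anyone is interested (it is considerably longer than the original, because it is a more general solution using custom classes).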
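The answerer's code is not shown, so the following is only a minimal sketch of the idea as described, not the actual solution. The names `build_index` and `get_simbad_name_sorted` are hypothetical, and the functions are written as plain functions rather than methods. The SIMBAD list is sorted by ra once, and for each Vizier star `bisect_right` and `bisect_left` pick out the slice whose ra lies strictly inside the tolerance window, so only that slice is scanned.

```python
from bisect import bisect_left, bisect_right


def build_index(simbad_stars):
    """Sort SIMBAD stars by ra once; stars without Vmag are dropped."""
    stars = sorted((s for s in simbad_stars if 'Vmag' in s),
                   key=lambda s: s['ra'])
    ras = [s['ra'] for s in stars]   # parallel list of ra values for bisect
    return stars, ras


def get_simbad_name_sorted(vizier_star, stars, ras, tolerance):
    """Same matching rule as the question's function, on pre-sorted data."""
    ra = vizier_star['_RAJ2000']
    dec = vizier_star['_DEJ2000']
    vmag = round(vizier_star['Vmag'], 1)

    # Candidates satisfy ra - tolerance < item['ra'] < ra + tolerance.
    lo = bisect_right(ras, ra - tolerance)
    hi = bisect_left(ras, ra + tolerance)

    for item in stars[lo:hi]:
        if (dec - tolerance < item['dec'] < dec + tolerance
                and round(item['Vmag'], 1) == vmag):
            return item['name']
    return ''
```

`build_index` would be called once before the loop over the Vizier catalog, and `get_simbad_name_sorted` once per Vizier star. One small difference from the original: when several SIMBAD stars match, this returns the first in ra order rather than the first in the original list order.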

How can I speed up this function?
