Question

Suppose I have two vectors:

a <- c(1,3,5,7,9, 23,35,36,43)
b <- c(2,4,6,8,10,24, 37, 45)

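Note that the two have different lengths.

I want to find the gaps/differences/sequence between the two vectors, based on closest distance.

Expected output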

Note that 35 is paired with NA, because 37 matches (is closest to) 36 in the sequence.

Answer 1

You can use findInterval:

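# a must be sorted in increasing order for findInterval()
df <- data.frame(a, b = NA_real_)
# put each b in the row of the largest a that is <= b
df$b[findInterval(b, a)] <- b
df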
   a  b
1  1  2
2  3  4
3  5  6
4  7  8
5  9 10
6 23 24
7 35 NA
8 36 37
9 43 45
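
To see why this works, it can help to look at the indices that findInterval returns. The sketch below reuses the vectors from the question; the names idx and unmatched are only illustrative. Note that findInterval pairs each b with the largest a that does not exceed it, which is not always the nearest a. It coincides with the nearest match here because every b sits just above its partner.

```r
a <- c(1, 3, 5, 7, 9, 23, 35, 36, 43)
b <- c(2, 4, 6, 8, 10, 24, 37, 45)

# For each b, the index i such that a[i] <= b < a[i + 1]
# (a must be sorted in non-decreasing order).
idx <- findInterval(b, a)
idx

# Rows of a that received no b keep their NA.
unmatched <- setdiff(seq_along(a), idx)
a[unmatched]
```

Index 7 is skipped, so the row for a = 35 is the one left as NA.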

Answer 2

This algorithm can only handle a single NA. For N possible NAs, you just need to try all combination(length(b), N) possibilities. For each possible NA insertion position, try to find

min(abs(a-b))
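
The answer's own code is not shown, so the following is only a minimal sketch of the single-NA idea under stated assumptions: length(a) is exactly length(b) + 1, and the quantity to minimise is the total absolute difference over the non-NA pairs (an assumed cost, not necessarily the one the answer intended). The helper insert_na is hypothetical.

```r
a <- c(1, 3, 5, 7, 9, 23, 35, 36, 43)
b <- c(2, 4, 6, 8, 10, 24, 37, 45)

# Assumption: exactly one NA is needed.
stopifnot(length(a) == length(b) + 1)

# Hypothetical helper: insert an NA so that it ends up at position `pos`.
insert_na <- function(x, pos) append(x, NA, after = pos - 1)

# Assumed cost: sum of absolute differences, ignoring the NA pair.
cost <- vapply(
  seq_along(a),
  function(pos) sum(abs(a - insert_na(b, pos)), na.rm = TRUE),
  numeric(1)
)

# Keep the insertion position with the smallest cost.
best <- which.min(cost)
result <- data.frame(a = a, b = insert_na(b, best))
result
```

With the vectors from the question, the cheapest position is row 7, which leaves 35 paired with NA and 36 paired with 37. Extending this to N NAs would mean looping over sets of insertion positions (for example with combn) instead of single positions.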

Find the sequence [gap or difference] between two vectors
