Suppose I have two vectors:
a <- c(1,3,5,7,9, 23,35,36,43)
b <- c(2,4,6,8,10,24, 37, 45)
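Note that the two vectors have different lengths.
I want to align the two vectors by closest proximity, leaving a gap wherever an element has no match.
Expected output: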
a b
1 2
3 4
5 6
7 8
9 10
23 24
35 NA
36 37
43 45
Note that 35 is paired with NA, because 37 is the closest match for 36.
Answer 0 (score: 5)
You can use findInterval:
df=data.frame(a)
df$b[findInterval(b, a)]=b
df
a b
1 1 2
2 3 4
3 5 6
4 7 8
5 9 10
6 23 24
7 35 NA
8 36 37
9 43 45
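To make the indexing step in this answer easier to follow, here is a minimal sketch that spells it out. The intermediate names idx, b_aligned and result are illustrative and not part of the original answer; the logic is intended to be the same.

```r
a <- c(1, 3, 5, 7, 9, 23, 35, 36, 43)
b <- c(2, 4, 6, 8, 10, 24, 37, 45)

# For each element of b, findInterval() returns the index i of the
# largest element of a with a[i] <= b (a must be sorted ascending).
idx <- findInterval(b, a)

# Start with a column of NAs and fill only the matched positions.
b_aligned <- rep(NA_real_, length(a))
b_aligned[idx] <- b

result <- data.frame(a = a, b = b_aligned)
print(result)
```

Here idx is 1 2 3 4 5 6 8 9, so position 7 (the value 35) is never filled and stays NA. This pairs each b with the largest a that does not exceed it rather than with the nearest a in absolute terms, so it appears to fit data like this example, where a is sorted and every b sits just above its partner in a.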
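Answer 1 (score: 1)
This algorithm can only handle one NA. For N possible NAs, you just need to try all combination(length(b), N) possibilities. For each possible NA insertion position, try to find min(abs(a-b)).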
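The brute-force idea described in this answer could be sketched roughly as follows. This is an illustration under stated assumptions, not the answerer's code: the function name align_with_gaps is hypothetical, base R's combn() stands in for the combination(...) mentioned above, and the sum of absolute differences is assumed as the cost to minimise.

```r
a <- c(1, 3, 5, 7, 9, 23, 35, 36, 43)
b <- c(2, 4, 6, 8, 10, 24, 37, 45)

# Hypothetical helper: pad b with NAs so it lines up with a, choosing the
# NA positions that minimise the total absolute difference (assumed cost).
align_with_gaps <- function(a, b) {
  n_gaps <- length(a) - length(b)
  stopifnot(n_gaps >= 0)
  if (n_gaps == 0) return(b)

  # Each column of `candidates` is one choice of positions that receive NA.
  candidates <- combn(length(a), n_gaps)
  best <- NULL
  best_cost <- Inf
  for (j in seq_len(ncol(candidates))) {
    padded <- rep(NA_real_, length(a))
    padded[-candidates[, j]] <- b          # fill every non-gap position in order
    cost <- sum(abs(a - padded), na.rm = TRUE)
    if (cost < best_cost) {
      best_cost <- cost
      best <- padded
    }
  }
  best
}

result <- data.frame(a = a, b = align_with_gaps(a, b))
print(result)
```

For the example vectors the cheapest placement puts the single NA opposite 35. The number of candidates grows combinatorially with the number of gaps, so this is only practical for short vectors.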