"Inverting" a startswith selection in a pandas DataFrame

Date: 2017-08-05 00:42:37

Tags: python pandas

Pandas allows selecting rows in a DataFrame with startswith, for example:

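query = "Ali"
people[people.Name.str.startswith(query)]

...which might produce something like:

Name
Ali
Alice
Alicia
Alistair
...

However, I want to invert the roles in the selection, to find the rows whose Name value is something the input starts with (that is, rows whose Name is a prefix of the input).

Something like:

query = "Ali"
people[query.startswith(people.Name)]

For example, this might select the name Al.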

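This code obviously does not work, and the goal may not seem sensible for this example, but it is what I want to achieve with my data.

Does anyone know how to achieve this?

1 Answer:

Answer 0: (score: 4)

Preparation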

import pandas as pd

people = pd.DataFrame()
people["FullName"] = ["Alice Cooper", "Ali","Aloy"]

input_ = "Alice"

Alternative 1: Boolean indexing

people[[input_.startswith(i) for i in people.FullName]] # passes [False, True, False]
  

1000 loops, best of 3: 937 µs per loop

Alternative 2 (thanks to @Abdou): using .apply() and a lambda

Call a function on each value of the Series:
people[people.FullName.apply(lambda s: input_.startswith(s))]
  

1000 loops, best of 3: 1.38 ms per loop

Output:

FullName
1   Ali
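
For reference, here is a minimal sketch of the list-comprehension approach applied to the question's own `Name` column and `query` variable. The sample rows, including the `None` entry, are hypothetical and were added only to show one way of guarding against missing values, which would otherwise make `str.startswith` raise a `TypeError`.

```python
import pandas as pd

# Hypothetical data modelled on the question; the None entry is an
# assumption added to illustrate handling of missing values.
people = pd.DataFrame({"Name": ["Al", "Ali", "Alice", "Bob", None]})
query = "Ali"

# Keep rows whose Name is a prefix of the query. The isinstance check
# skips non-string entries (None / NaN) instead of raising a TypeError.
mask = [isinstance(name, str) and query.startswith(name) for name in people["Name"]]
result = people[mask]

print(result)
```

Note that an empty string is a prefix of every string, so a row with an empty `Name` would also be selected; filter such rows out first if that is not wanted.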