Question

import pandas as pd

df = pd.DataFrame({'A':['A','B','C','D'],
                   'B':[4,5,6,7]})

A B
A 4
B 5
C 6
D 7

I want a way to return all rows starting from the row that contains a given string, say "B", in column A.

A B
B 5
C 6
D 7

Go Deacs!

Answer 1

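If the string is always present, you can use idxmax() on a conditional Series to find the index of the first occurrence of the string, and then use the tail() method to extract the rows from that index onward:

df.tail(-(df.A == "B").idxmax())   # this method works if the string exists in the column
# and the index of the data frame is a normal sequence as given by range(n)
# caveat: if the first match is in row 0, idxmax() is 0 and tail(0) returns an empty frame

#   A   B
#1  B   5
#2  C   6
#3  D   7

Another option, which may be safer, is an approach that still works even if the string does not occur in the column.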
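As a sketch of an approach that tolerates a missing string, one can build a cumulative boolean mask that turns True at the first match and stays True afterwards. The helper name rows_from is hypothetical and the column name 'A' is taken from the question; this is one possible way to do it, not necessarily what the answer's author had in mind.

```python
import pandas as pd

df = pd.DataFrame({'A': ['A', 'B', 'C', 'D'],
                   'B': [4, 5, 6, 7]})

def rows_from(frame, value):
    """Hypothetical helper: keep rows from the first match of `value` in column 'A'."""
    # cumsum() counts matches seen so far; it is 0 before the first match
    # and >= 1 from the first match onward, so astype(bool) gives the mask.
    mask = (frame['A'] == value).cumsum().astype(bool)
    return frame[mask]

found = rows_from(df, 'B')     # rows B, C, D
missing = rows_from(df, 'Z')   # empty frame, no exception raised
print(found)
print(missing)
```

Because the mask is built row by row rather than from index labels, this does not depend on the index being range(n), and a match in the first row simply keeps the whole frame.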

Answer 2

Assuming the data in column A is sorted alphabetically, you can take a subset with a comparison; something like

df[df['A'] >= 'B']

will do the trick.
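To see why the sorted-column assumption matters, here is a small sketch on a made-up, unsorted frame (the data below is illustrative and not from the question). The comparison acts as a filter on values, not as a slice by position, so rows that come after "B" but sort before it are dropped.

```python
import pandas as pd

# Hypothetical unsorted data: 'A' appears after 'B' in row order.
unsorted = pd.DataFrame({'A': ['C', 'B', 'A', 'D'],
                         'B': [4, 5, 6, 7]})

# String comparison is lexicographic and applied row by row.
filtered = unsorted[unsorted['A'] >= 'B']
print(filtered)
```

Here the row holding 'C' (before 'B') is kept and the row holding 'A' (after 'B') is removed, which is not a "from B onward" slice.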

Answer 3

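If column A is not sorted alphabetically, you can use this solution.

Also, if column A contains "B" more than once, this starts the data frame at the row where "B" first occurs in column A.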

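# label of the first row where column A equals 'B' (raises IndexError if there is none)
idx = df[df['A'] == 'B'].index[0]
# slice by label, since idx is an index label
df = df.loc[idx:]
print(df)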
   A  B
1  B  5
2  C  6
3  D  7
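The same idea can be wrapped so that a missing string does not raise. This is a minimal sketch: the helper name slice_from_label, the non-default index, and the choice to return an empty frame when nothing matches are all assumptions made for illustration. It also assumes the index labels are unique, so that label slicing is unambiguous.

```python
import pandas as pd

# Illustrative frame with a non-default index and 'B' appearing twice.
df = pd.DataFrame({'A': ['A', 'B', 'C', 'B'],
                   'B': [4, 5, 6, 7]},
                  index=[10, 20, 30, 40])

def slice_from_label(frame, value):
    """Hypothetical helper: rows from the first row where column 'A' equals `value`."""
    matches = frame[frame['A'] == value].index
    if len(matches) == 0:
        # Assumed behaviour: no match yields an empty frame with the same columns.
        return frame.iloc[0:0]
    # .loc slices by label and includes the starting label.
    return frame.loc[matches[0]:]

result = slice_from_label(df, 'B')
print(result)
print(slice_from_label(df, 'Z'))
```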

Answer 4

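An answer that generalizes well can use numpy.argwhere:

import numpy as np

# position of the first match; .to_numpy() hands NumPy a plain array rather than a Series
idx = np.argwhere((df.A == 'B').to_numpy())[0][0]
df.iloc[idx:]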
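One reason a positional approach generalizes is that it does not rely on index labels at all. The sketch below uses np.flatnonzero, a close relative of argwhere for one-dimensional input, on a frame whose index labels are repeated; the index values are made up for illustration, and the code assumes at least one row matches.

```python
import numpy as np
import pandas as pd

# Illustrative frame with duplicate, non-numeric index labels.
df = pd.DataFrame({'A': ['A', 'B', 'C', 'D'],
                   'B': [4, 5, 6, 7]},
                  index=['x', 'y', 'x', 'y'])

# Positions (not labels) of the rows where column A equals 'B'.
positions = np.flatnonzero((df['A'] == 'B').to_numpy())

# iloc slices purely by position, so repeated labels are not a problem.
result = df.iloc[positions[0]:]
print(result)
```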

How to slice a two-column pandas DataFrame starting from the row that contains a given string?

4 answers: