Question

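Is there any functionality in pandas (or a way to emulate it) that is equivalent to

cat fileName | grep -C 10 pattern > anotherFile

which is what I would do on Unix, other than writing code that iterates over the whole file?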

Answer 1

Here is a pandas analogue of grep -C 2 "a":

In [342]: df
Out[342]:
   txt
0    q
1    f
2    i
3    t
4    y
5    n
6    r
7    o
8    f
9    g
10   m
11   s
12   f
13   o
14   v
15   e
16   s
17   a
18   n
19   h
20   w
21   q
22   a
23   i
24   a
25   s
26   l
27   e
28   l
29   f

In [343]: pattern = 'a'

In [344]: N = 2

In [345]: idx = df[df.txt.str.contains(pattern)].index

In [346]: filtered_idx = sorted({x for i in idx for x in range(max(i - N, 0), min(i + N, len(df) - 1) + 1)})

In [347]: df.loc[filtered_idx]
Out[347]:
   txt
15   e
16   s
17   a
18   n
19   h
20   w
21   q
22   a
23   i
24   a
25   s
26   l

Compare with grep -C:

{ temp }  » grep -n -C 2 'a' aaa.txt
16-e
17-s
18:a
19-n
20-h
21-w
22-q
23:a
24-i
25:a
26-s
27-l

Note: grep numbers lines starting from 1, while the pandas index starts from 0, which is why grep's line numbers are always one ahead of the pandas index.

Setup:

import string
import numpy as np
import pandas as pd
data = list(string.ascii_lowercase) * 2
df = pd.DataFrame({'txt': np.random.choice(data, 30)})
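
The session above can be wrapped into a reusable helper. The following is only a sketch: the name grep_context and its parameters are made up for illustration, and it selects rows by position with a boolean mask rather than by index label, so it does not assume a default 0..n-1 index and cannot run past either end of the frame.

```python
import numpy as np
import pandas as pd


def grep_context(df, column, pattern, n):
    """Return rows whose `column` matches `pattern`, plus `n` rows of context on each side."""
    # Boolean array marking matching rows (missing values count as no match).
    hits = df[column].str.contains(pattern, regex=True, na=False).to_numpy()
    # Positions of the matches, independent of the index labels.
    positions = np.flatnonzero(hits)
    keep = np.zeros(len(df), dtype=bool)
    for p in positions:
        # Clip the lower bound; slicing past the end is harmless.
        keep[max(p - n, 0):p + n + 1] = True
    return df[keep]


# The 30 letters shown in the session above, so the result can be compared with Out[347].
df = pd.DataFrame({'txt': list('qfityn' 'rofg' 'msfov' 'es' 'a' 'nhwq' 'aia' 'slelf')})
result = grep_context(df, 'txt', 'a', 2)
print(result)
```

With this data, result holds rows 15 through 26, the same rows as Out[347].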

Answer 2

Solution

import pandas as pd

def cat_grep_direct(input, pattern, output):
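    # A separator that never occurs in the data makes read_csv load each line into one column;
    # a multi-character separator needs the python engine.
    df = pd.read_csv(input, header=None, delimiter='obn0x1u5', engine='python')
    # Keep only the rows whose text matches the pattern.
    df = df.loc[df.loc[:, 0].str.contains(pattern, regex=True), :]
    # Write the first 10 matching lines without index or header (no surrounding context).
    df.head(10).to_csv(output, index=False, header=False)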
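
The function above writes matching lines only, so here is a hedged sketch of a file-to-file variant that also keeps N lines of context, closer to the pipeline in the question. The name grep_context_file and the demo file names are hypothetical. It reads the file with plain Python instead of read_csv, and uses a centered rolling maximum to spread each match over its neighbours.

```python
import os
import tempfile

import pandas as pd


def grep_context_file(in_path, pattern, out_path, n=10):
    # Load the file as a one-column Series, one element per line.
    with open(in_path, encoding='utf-8') as fh:
        lines = pd.Series(fh.read().splitlines(), dtype='object')
    hits = lines.str.contains(pattern, regex=True)
    # A line is kept if any line within n positions of it matches:
    # a centered window of 2n+1 lines has max 1 exactly in that case.
    window = hits.astype(int).rolling(2 * n + 1, center=True, min_periods=1)
    keep = window.max().astype(bool)
    with open(out_path, 'w', encoding='utf-8') as fh:
        fh.writelines(line + '\n' for line in lines[keep])


# Small demo on a temporary file: seven one-letter lines, match on the last one.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'in.txt')
    dst = os.path.join(tmp, 'out.txt')
    with open(src, 'w', encoding='utf-8') as fh:
        fh.write('\n'.join('qfityna') + '\n')
    grep_context_file(src, 'a', dst, n=2)
    with open(dst, encoding='utf-8') as fh:
        out_lines = fh.read().splitlines()
print(out_lines)
```

Unlike grep -C, this sketch does not print a "--" separator between non-adjacent groups of lines, and it still loads the whole file into memory.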
