Question

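I can't figure out how to use subprocess to solve my problem.

Suppose I have a tab-delimited text file, tabdelimited1.txt, in my subdirectory, and I want to read it into a pandas DataFrame.

Of course, you can simply import the data like this: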

import pandas as pd
df = pd.read_csv("tabdelimited1.txt", header=None, sep=r"\s+")

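But suppose we want to use subprocess. On the command line, $ cat tabdelimited1.txt prints all the lines.

Now I want to use subprocess to read the output of cat tabdelimited1.txt. How can I do that?

We could use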

import subprocess
task = subprocess.Popen("cat file.txt", shell=True,  stdout=subprocess.PIPE)
data = task.stdout.read()

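But (1) I get an error with shell=True, and (2) I want to read the data line by line.

How can I use subprocess to read tabdelimited1.txt line by line? The script should look something like this: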

import subprocess
import pandas as pd

df = pd.DataFrame()
task = subprocess.Popen("cat file.txt", shell=True,  stdout=subprocess.PIPE)
# while lines exist:
    # line = subprocess std
    df=pd.concat([df, line])


Answer 1

You can skip the shell entirely by splitting the command into a list. Then it is just a matter of iterating over the process's stdout:

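import subprocess
import pandas as pd

rows = []
# Passing the command as a list avoids the shell; text=True yields str lines instead of bytes
task = subprocess.Popen(["cat", "file.txt"], stdout=subprocess.PIPE, text=True)
for line in task.stdout:
    rows.append(line.split())  # split each line on whitespace
task.wait()
# Build the frame once from the collected rows
df = pd.DataFrame(rows)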
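
As a self-contained illustration of the list-form Popen approach, here is a minimal sketch that uses only the standard library. The helper name read_rows and the temporary sample file are hypothetical and exist only for the demonstration; it also assumes a POSIX system where cat is on the PATH and Python 3.7+ for text=True.

```python
import os
import subprocess
import tempfile


def read_rows(path):
    """Run `cat path` without a shell and return each line split on whitespace."""
    rows = []
    # The with-block closes the pipe and waits for the child process on exit.
    with subprocess.Popen(["cat", path], stdout=subprocess.PIPE, text=True) as task:
        for line in task.stdout:  # consumes the output one line at a time
            rows.append(line.split())
    if task.returncode != 0:
        raise RuntimeError(f"cat exited with status {task.returncode}")
    return rows


# Hypothetical sample data, written to a temporary file for the demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("1\t3\ttest1;\n2\t2\ttest2;\n3\t2\ttest3;\n")
    sample_path = handle.name

rows = read_rows(sample_path)
os.remove(sample_path)
print(rows)
```

If a DataFrame is the goal, pd.DataFrame(rows) would build it in one step, which is generally cheaper than calling pd.concat once per line.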

Answer 2

import sys
for line in sys.stdin:
    print(line.split())

can be used with a shell command, like this:

0025:~/mypy$ cat x.txt | python3 stack39864304.py
['1', '3', 'test1;']
['2', '2', 'test2;']
['3', '2', 'test3;']

Alternatively, in an interactive session I can do:

In [269]: task = subprocess.Popen("cat x.txt", shell=True,  stdout=subprocess.PIPE)
In [270]: for line in task.stdout:print(line.split())
[b'1', b'3', b'test1;']
[b'2', b'2', b'test2;']
[b'3', b'2', b'test3;']

(These are Python 3 bytestrings.)
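
To get ordinary strings instead of bytestrings, each line can be decoded before it is split. The sketch below shows the difference on a single hard-coded line; the variable names are illustrative, and UTF-8 is an assumption about the file's encoding.

```python
# What iterating over a binary stdout pipe yields for one line (sample data).
raw_line = b"1\t3\ttest1;\n"

byte_fields = raw_line.split()                  # fields stay as bytes
text_fields = raw_line.decode("utf-8").split()  # decode first to get str fields

print(byte_fields)
print(text_fields)
```

On Python 3.7+, passing text=True to subprocess.Popen makes the pipe yield str lines directly, so the explicit decode step is not needed.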

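python3 stack39864304.py < x.txt is another way to send the file to the script.

cat afile | ... may be too simple an example, and it invites all the objections about why you wouldn't just read the file directly. But cat can be replaced with head, tail, or even ls -l | python3 stack39864304.py to get a directory listing run through this split.

I use ipython for most of my interactive Python coding; many of its %magic commands use subprocess; I use cat x.txt and ls from within this session all the time.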
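
The same pattern works from inside Python for any command, not just cat. Below is a minimal sketch; split_output is a hypothetical helper name, and the demo command uses the Python interpreter itself as a stand-in so that the example does not depend on a particular file or on tools like head being installed.

```python
import subprocess
import sys


def split_output(command):
    """Run command (a list of arguments) and yield each output line split on whitespace."""
    with subprocess.Popen(command, stdout=subprocess.PIPE, text=True) as task:
        for line in task.stdout:
            yield line.split()


# Stand-in command that prints two sample lines (an assumption for the demo).
demo_command = [sys.executable, "-c", "print('1 3 test1;'); print('2 2 test2;')"]
demo_rows = list(split_output(demo_command))
print(demo_rows)
```

On a POSIX system, calls such as split_output(["head", "-n", "5", "x.txt"]) or split_output(["ls", "-l"]) would be used the same way.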

How do I read data line by line using subprocess and 'cat'?
