Question

Is there a way to take the output of a subprocess and turn it into an iterable csv.reader or csv.DictReader object? Here is the code I've been trying:

p2 = subprocess.Popen("sort command...", stdout=subprocess.PIPE)
output = p2.communicate()[0]
edits = csv.reader(output, delimiter="\t")

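Basically, I'm sorting a large CSV file, and then I want to read it into Python as a csv.reader object.

The error I get is:

Error: iterator should return strings, not int (did you open the file in text mode?)

Is there a way to treat this byte stream as a csv.reader object, or am I going about this the wrong way?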

Answer 1

This is a problem in Python 3. The csv module needs unicode (str) input, not byte strings. In addition, csv.reader() needs an iterable such as an open file or a list of strings. Try this:

import csv
import subprocess

encoding = 'ascii'  # specify the encoding of the CSV data
p2 = subprocess.Popen(['sort', '/tmp/data.csv'], stdout=subprocess.PIPE)
output = p2.communicate()[0].decode(encoding)
edits = csv.reader(output.splitlines(), delimiter=",")
for row in edits:
    print(row)

If /tmp/data.csv contains the following (I've used commas as the delimiter):

1,2,3,4
9,10,11,12
a,b,c,d
5,6,7,8

then the output will be:

['1', '2', '3', '4']
['5', '6', '7', '8']
['9', '10', '11', '12']
['a', 'b', 'c', 'd']
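A small sketch of why the question's code fails and why splitlines() is needed. It uses an in-memory bytes literal as a stand-in for the subprocess output (an assumption for illustration, not part of the answer). Iterating over bytes yields ints, which triggers the csv error; iterating over a str yields single characters, so each character would become its own row. The exact wording of the csv.Error message varies between Python versions, so the example just prints whatever message is raised.

```python
import csv

# Stand-in for p2.communicate()[0]: raw bytes, as a subprocess pipe returns them.
raw = b"1,2,3,4\n9,10,11,12\n"

# Iterating over bytes yields ints, not strings.
print(list(raw)[:3])  # [49, 44, 50]

# So csv.reader fails as soon as it pulls the first item.
try:
    next(csv.reader(raw))
except csv.Error as exc:
    print("csv.Error:", exc)

text = raw.decode("ascii")

# Iterating over a str yields single characters, so the first "row" is just '1'.
print(next(csv.reader(text)))  # ['1']

# splitlines() gives one string per line, which is what csv.reader expects.
print(list(csv.reader(text.splitlines())))
# [['1', '2', '3', '4'], ['9', '10', '11', '12']]
```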

Answer 2

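The following works for me (even though the docs warn about reading from stdout directly, because of possible deadlocks). Wrapping stdout in an io.TextIOWrapper() supports newlines embedded in field data.

Doing it this way also lets you use a generator, which has the advantage of reading stdout incrementally, one line at a time.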

import csv
import io
import os
import subprocess

p2 = subprocess.Popen(["sort", "tabbed.csv"], stdout=subprocess.PIPE)
output = io.TextIOWrapper(p2.stdout, newline=os.linesep)  # decode stdout as text
edits = csv.reader((line for line in output), delimiter="\t")  # lines are read lazily
for row in edits:
    print(row)

Output:

['1', '2', '3', '4']
['5', '6', '7', '8']
['9', '10', '11', '12']
['a', 'b\r\nx', 'c', 'd']

The tabbed.csv input test file contained this (where » represents a tab character and ≡ a newline character):

1»2»3»4
9»10»11»12
a»"b≡x"»c»d
5»6»7»8
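A portable sketch of the same streaming approach that runs without sort or a tabbed.csv file. The child process here is a hypothetical small Python script (launched via sys.executable) standing in for sort, and the sketch uses newline='' rather than os.linesep, which is what the csv documentation recommends for files read by csv.reader. Because the child writes a bare \n, the embedded newline comes back as 'b\nx' on every platform.

```python
import csv
import io
import subprocess
import sys

# Hypothetical child process standing in for `sort tabbed.csv`: it writes
# tab-separated data, including a quoted field with an embedded newline.
CHILD_SCRIPT = r"""
import sys
sys.stdout.buffer.write(b'1\t2\t3\t4\na\t"b\nx"\tc\td\n')
"""


def stream_rows(cmd):
    """Yield CSV rows from a child process's stdout one at a time."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc, \
         io.TextIOWrapper(proc.stdout, encoding="utf-8", newline="") as text:
        # newline='' leaves line endings untranslated so the csv module
        # can handle newlines inside quoted fields itself.
        for row in csv.reader(text, delimiter="\t"):
            yield row


rows = list(stream_rows([sys.executable, "-c", CHILD_SCRIPT]))
print(rows)
# [['1', '2', '3', '4'], ['a', 'b\nx', 'c', 'd']]
```

Because stream_rows is a generator, a large output can be processed row by row in a for loop instead of being collected with list().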

Answer 3

To enable text mode, pass universal_newlines=True:

#!/usr/bin/env python3
import csv
from subprocess import Popen, PIPE

with Popen(["sort", "a.csv"], stdout=PIPE, universal_newlines=True) as p:
    print(list(csv.reader(p.stdout, delimiter="\t")))
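The question also mentions csv.DictReader. A minimal sketch, assuming Python 3.7+ where text=True is an alias for universal_newlines=True, that captures the whole output with subprocess.run and feeds it to DictReader through io.StringIO. The child command is a hypothetical stand-in for sort that prints a header row followed by two data rows (a real sort would also sort the header line along with the data).

```python
import csv
import io
import subprocess
import sys

# Hypothetical child command standing in for `sort a.csv`: it prints a
# header line followed by two tab-separated data rows.
cmd = [sys.executable, "-c", r"print('id\tname\n2\tbob\n1\talice')"]

# text=True decodes stdout to str, just like universal_newlines=True.
result = subprocess.run(cmd, stdout=subprocess.PIPE, text=True, check=True)

# The whole output is a single str, so wrap it in StringIO to get a
# file-like iterable of lines for DictReader.
reader = csv.DictReader(io.StringIO(result.stdout), delimiter="\t")
records = list(reader)
print(records)
# Python 3.8+ (where DictReader yields plain dicts) prints:
# [{'id': '2', 'name': 'bob'}, {'id': '1', 'name': 'alice'}]
```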

If you need to interpret newlines embedded inside quoted fields, create an io.TextIOWrapper so that you can pass newline='':

#!/usr/bin/env python3
import csv
import io
from subprocess import Popen, PIPE

with Popen(["sort", "a.csv"], stdout=PIPE) as p, \
     io.TextIOWrapper(p.stdout, newline='') as text_file:
    print(list(csv.reader(text_file, delimiter="\t")))

In addition, TextIOWrapper lets you specify the character encoding explicitly (otherwise the default, locale.getpreferredencoding(False), is used).
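A sketch of passing the encoding explicitly. An io.BytesIO object stands in for p.stdout here (an assumption so the example runs without a subprocess), holding UTF-8 encoded, tab-separated text. Decoding the same bytes with the wrong codec does not necessarily raise an error; latin-1, for example, silently produces mojibake, which is why stating the encoding is safer than relying on the locale default.

```python
import csv
import io

# Stand-in for p.stdout: a binary stream of UTF-8 encoded, tab-separated data.
raw_stdout = io.BytesIO("name\tcity\nJosé\tZürich\n".encode("utf-8"))

# encoding= is explicit, so the result does not depend on the locale;
# newline='' lets the csv module handle line endings itself.
with io.TextIOWrapper(raw_stdout, encoding="utf-8", newline="") as text_file:
    rows = list(csv.reader(text_file, delimiter="\t"))

print(rows)  # [['name', 'city'], ['José', 'Zürich']]

# Decoding the same UTF-8 bytes as latin-1 does not fail; it just gives the wrong text.
wrong = io.TextIOWrapper(io.BytesIO("José".encode("utf-8")), encoding="latin-1").read()
print(wrong)  # JosÃ©
```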

Note: you don't need the external sort command. You can sort the lines in pure Python:

#!/usr/bin/env python3
import csv

with open('a.csv', newline='') as text_file:
    rows = list(csv.reader(text_file, delimiter="\t"))
    rows.sort()
    print(rows)

Note: the latter version sorts csv rows rather than physical lines (you could sort the raw lines instead if you need to).
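A sketch illustrating what sorting csv rows rather than physical lines means in practice. An in-memory string with a quoted multi-line field stands in for a.csv (hypothetical data). The record spanning two physical lines stays a single row after sorting, and the default ordering compares rows as lists of strings, so '10' sorts before '9'. The key= variant that sorts numerically on the first column is an illustrative choice, not something from the answer.

```python
import csv
import io

# Hypothetical stand-in for a.csv: tab-separated, with a quoted field
# that spans two physical lines.
data = '10\tx\n9\t"multi\nline"\n1\tz\n'

rows = list(csv.reader(io.StringIO(data, newline=""), delimiter="\t"))

# The default sort compares rows as lists of strings, so "10" < "9".
print(sorted(rows))
# [['1', 'z'], ['10', 'x'], ['9', 'multi\nline']]

# A key function can sort numerically on the first column instead.
print(sorted(rows, key=lambda row: int(row[0])))
# [['1', 'z'], ['9', 'multi\nline'], ['10', 'x']]
```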

Convert output from subprocess to csv.reader object
