Question

The pandas read_csv function seems to allow only single-character separators/delimiters. Is there a way to allow a string of characters, such as "*|*" or "%%", instead?

Answer 1

public ResponseEntity<?> getObject(@PathVariable("shopId") String shopId,
            @PathVariable("delearId") String delearId) {
        Shop objectToSave = (shopId.equalsIgnoreCase("0")) ? (null) : shopService.findOne(shopId);
        Delear objectName = (delearId.equalsIgnoreCase("0")) ? null : delearService.findOne(delearId);
        ResponseEntity<?> responseEntity = new ResponseEntity<>(objectName && objectToSave , HttpStatus.OK);// i want to combine both delear and shop

        if (objectName == null && objectToSave == null) {
        responseEntity = new ResponseEntity<>(objectName,objectToSave , HttpStatus.NOT_FOUND);
    }
    return responseEntity;
    }

Answer 2

The solution would be to use read_table instead of read_csv. Suppose the file looks like this:

1*|*2*|*3*|*4*|*5
12*|*12*|*13*|*14*|*15
21*|*22*|*23*|*24*|*25

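So we can read it with the following. A separator longer than one character is interpreted as a regular expression, which is why the * and | characters are escaped: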

import pandas as pd
pd.read_table('file.csv', header=None, sep=r'\*\|\*', engine='python')
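As a minimal sketch of the same idea, the snippet below reads the sample rows from an in-memory string rather than from file.csv, and builds the regex with re.escape so the literal delimiter does not have to be escaped by hand. The variable names (raw, delimiter, pattern) are illustrative only, and read_csv is used here on the assumption that it accepts the same sep and engine arguments as read_table.

```python
import io
import re

import pandas as pd

# Sample rows from the answer, held in memory instead of in file.csv.
raw = "1*|*2*|*3*|*4*|*5\n12*|*12*|*13*|*14*|*15\n21*|*22*|*23*|*24*|*25\n"

delimiter = "*|*"               # the literal multi-character delimiter
pattern = re.escape(delimiter)  # escape the regex metacharacters * and |

# engine="python" is the parser that supports regex separators.
df = pd.read_csv(io.StringIO(raw), header=None, sep=pattern, engine="python")
print(df)
```

Using re.escape is a convenience choice: it keeps working if the delimiter is later changed to another string containing regex metacharacters.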

Answer 3

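As Padraic Cunningham wrote in the comment above, it's not clear why you want to do this. On delimiters, the Wiki entry for the CSV Spec states:

...separated by delimiters (typically a single reserved character such as comma, semicolon, or tab; sometimes the delimiter may include optional spaces),

So it is unsurprising that neither the csv module nor pandas supports what you are asking for.

However, if you really want to do this, you are pretty much down to using Python's string manipulation. The following example shows how to turn a dataframe into a "csv" in which $$ separates rows and %% separates columns.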

'$$'.join('%%'.join(str(r) for r in rec) for rec in df.to_records())

Of course, you don't have to turn it into one string like this before writing it to a file.
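One way to write the records straight to a file, without first building a single large string, might look like the sketch below. The helper name write_custom, the sample dataframe, and the temporary output path are all assumptions made for illustration. Unlike df.to_records(), which includes the index by default, this version leaves the index out.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})  # illustrative data


def write_custom(frame, path, col_sep="%%", row_sep="$$"):
    """Hypothetical helper: write rows separated by row_sep, columns by col_sep."""
    with open(path, "w", encoding="utf-8") as out:
        first = True
        for rec in frame.itertuples(index=False):
            if not first:
                out.write(row_sep)  # separator goes between rows, not after the last one
            out.write(col_sep.join(str(value) for value in rec))
            first = False


path = os.path.join(tempfile.mkdtemp(), "out.txt")
write_custom(df, path)

with open(path, encoding="utf-8") as f:
    text = f.read()
print(text)  # 1%%x$$2%%y
```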

Answer 4

Not a Pythonic way, but definitely a programmatic one. You can use something like this:

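import re

def row_reader(row, fd):
    """Split row on the delimiter fd, keeping double-quoted fields that contain fd intact."""
    arr = []
    # Split the row on the delimiter, dropping the trailing newline first.
    in_arr = row.rstrip('\n').split(fd)
    i = 0
    while i < len(in_arr):
        # A piece that opens a quote without closing it starts a quoted field.
        if re.match('^".*', in_arr[i]) and not re.match('.*"$', in_arr[i]):
            flag = True
            buf = ''
            # Glue the following pieces back together until the closing quote.
            while flag and i < len(in_arr):
                buf += in_arr[i]
                if re.match('.*"$', in_arr[i]):
                    flag = False
                i += 1
                buf += fd if flag else ''
            arr.append(buf)
        else:
            arr.append(in_arr[i])
            i += 1
    return arr

# file_name is the path to the input file.
with open(file_name, 'r') as infile:
    for row in infile:
        for field in row_reader(row, '%%'):
            print(field)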
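For comparison, a quote-aware split can also be sketched with a single regular expression. This is a different technique from the loop above, not a rewrite of it: it splits on the delimiter only when an even number of double quotes follows it, which assumes quotes are balanced and never escaped inside a field. The function name split_outside_quotes and the sample row are hypothetical.

```python
import re


def split_outside_quotes(row, delimiter):
    """Split row on delimiter, ignoring occurrences inside double quotes.

    Assumes balanced, unescaped double quotes (an assumption of this sketch).
    """
    # The lookahead requires that the rest of the line holds an even number of quotes.
    pattern = re.escape(delimiter) + r'(?=(?:[^"]*"[^"]*")*[^"]*$)'
    return re.split(pattern, row.rstrip("\n"))


fields = split_outside_quotes('a%%"b%%c"%%d\n', "%%")
print(fields)  # ['a', '"b%%c"', 'd']
```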

Answer 5

In pandas 1.1.4, when I try to use a multi-character separator, I get this message:

ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.

So, to be able to use a multi-character separator, the modern solution seems to be to add engine='python' to the read_csv arguments (in my case, I use it with sep='[ ]?;').
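A small sketch of that call, assuming the separator pattern is '[ ]?;' (a semicolon optionally preceded by one space) and using made-up sample data read from an in-memory string:

```python
import io

import pandas as pd

# Made-up sample: some semicolons are preceded by a space, some are not.
raw = "a ;b;c\n1 ;2;3\n4;5 ;6\n"

# engine="python" is passed explicitly, as the warning message suggests.
df = pd.read_csv(io.StringIO(raw), sep="[ ]?;", engine="python")
print(df)
```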

Using multiple character delimiters in Python Pandas read_csv
