Question

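As a Python newcomer, I need help with the following:

I am writing a script to count the occurrences of "|" in a CSV file. I regularly receive large CSV files that use a text qualifier (double quotes) and a pipe delimiter. Sometimes a record gets broken across two lines. For example: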

"a"|"b"|"c"|"D"|"E"
"F"|"G"|"R"|
"T"|"I"
"W"|"Y"|"U"|"IA|SD"|"O"

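In the example above, the second record has spilled onto a new line. My plan is to write a script that counts how many times "|" occurs in each line and, if the count doesn't match, prints the line and copies it to another file. Note that because the file uses a text qualifier, I need to look at the pipe together with the double quotes around it; I could have just counted the pipes, but then the third record in the example above (the one containing "IA|SD") would also be flagged. The script is: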

import string

l='"|"'
k = 0
linecount=0

with open('testfile.txt') as myfile:
    for line in myfile:
        k=0
        linecount=linecount+1
        words = line.split()
        for i in words:
            for letter in i:
                if(letter==l):
                    k=k+1
        print("Occurrences of the letter:",k)
        print(k)
        if(k!=4):
            print(line)
            f = open("Lines_FILE.txt","a")
            f.write(line)
f.close()

As you can see, k holds the count, but my output is:

Occurrences of the letter: 0
0
"a"|"b"|"c"|"D"|"E"

Occurrences of the letter: 0
0
"F"|"G"|"R"|

Occurrences of the letter: 0
0
"T"|"I"

Occurrences of the letter: 0
0
"W"|"Y"|"U"|"IA|SD"|"O"

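So you can see that the "|" count is not correct. I tried '"'+ | +'"', but that didn't work. If I could store the value "|" in the variable l, I think I could get the job done. Any suggestions?

If someone can point out a way to store "|" as a whole, that would also help a lot. I am not looking for a proper repair of the file. Note that the example above is 3 records, which turned into 4 lines because of the stray line break.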

Answer 1

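To count the occurrences of a substring in a string, you don't need to loop over "words" or anything else by hand. Also, split() without arguments splits only on whitespace, so that line does nothing useful here.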
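A small sketch of what split() does with one of the sample lines may make this concrete. The variable names words and parts are only illustrative.

```python
line = '"F"|"G"|"R"|\n'

# With no arguments, split() breaks only on runs of whitespace,
# so a line that contains no spaces comes back as a single item.
words = line.split()
print(words)    # ['"F"|"G"|"R"|']

# Splitting on the pipe throws the pipes away, so there is nothing left to count.
parts = line.strip().split('|')
print(parts)    # ['"F"', '"G"', '"R"', '']
```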

The line

for letter in i:

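doesn't work either: i is a string, so this loop feeds each individual character into letter. A single character can never equal your multi-character string l, which is why k is never incremented.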
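The following sketch illustrates the point on a shortened sample string (word is a made-up value, not taken from the file). It also shows that the literal '"|"' already stores the quote-pipe-quote sequence as a whole; no concatenation is needed.

```python
l = '"|"'           # three characters: quote, pipe, quote
word = '"a"|"b"'

# Iterating over a string yields one character at a time.
letters = [letter for letter in word]
print(letters)      # ['"', 'a', '"', '|', '"', 'b', '"']

# No single character can equal a three-character string.
any_equal = any(letter == l for letter in word)
print(any_equal)    # False

# A substring test or count looks at the whole pattern instead.
print(l in word, word.count(l))    # True 1
```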

Use the built-in string method count:

str.count(sub[, start[, end]])
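  Return the number of non-overlapping occurrences of substring sub in the range [start, end]. Optional arguments start and end are interpreted as in slice notation.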
  （https://docs.python.org/3.7/library/stdtypes.html#str.count）
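As a quick illustration of the quoted documentation, here is str.count applied to the last sample line, with and without the surrounding quotes, plus the optional start and end arguments. The slice bounds 4 and 12 are arbitrary values picked for the example.

```python
line = '"W"|"Y"|"U"|"IA|SD"|"O"'

# Every pipe, including the one inside the quoted field "IA|SD".
print(line.count('|'))           # 5

# Only pipes that sit between two quoted fields.
print(line.count('"|"'))         # 4

# start and end behave like slice bounds: only line[4:12] is searched.
print(line.count('"|"', 4, 12))  # 1

# Matches do not overlap: 'aaaa' contains 'aa' twice, not three times.
print('aaaa'.count('aa'))        # 2
```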

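l = '"|"'  # the pipe together with the text qualifiers on both sides
linecount = 0

# Open the output file once, so it is closed reliably even if no line mismatches.
with open('testfile.txt') as myfile, open("Lines_FILE.txt", "a") as f:
    for line in myfile:
        linecount = linecount + 1
        k = line.count(l)
        print("Occurrences of the letter:", k)
        print(k)
        if k != 4:
            print(line)
            f.write(line)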

Now you get the expected output:

Occurrences of the letter: 4
4
Occurrences of the letter: 2
2
"F"|"G"|"R"|

Occurrences of the letter: 1
1
"T"|"I"

Occurrences of the letter: 4
4
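If the line numbers of the broken lines are also of interest, the same count can be wrapped in a small helper. This is only a sketch: find_broken_lines and its parameters are hypothetical names, and the expected count of 4 is an assumption taken from the five-column sample data.

```python
def find_broken_lines(lines, expected, marker='"|"'):
    """Return (line_number, line) pairs whose marker count differs from expected."""
    broken = []
    for number, line in enumerate(lines, start=1):
        if line.count(marker) != expected:
            broken.append((number, line.rstrip('\n')))
    return broken


# The four sample lines from the question.
sample = [
    '"a"|"b"|"c"|"D"|"E"\n',
    '"F"|"G"|"R"|\n',
    '"T"|"I"\n',
    '"W"|"Y"|"U"|"IA|SD"|"O"\n',
]

for number, line in find_broken_lines(sample, expected=4):
    print(number, line)
```

With the sample data this reports lines 2 and 3, the two halves of the broken record. An open file object can be passed in place of the list, since it also yields one line at a time.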

(Original answer, obsolete after the clarification)

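split() splits only on whitespace, so you won't get "words". (Besides, splitting on anything else would be of no use either, because it discards the string it splits on.)

You can directly count how many times a substring occurs in a string (line.count('"|"')), but your broken lines end with "|, so you can simply test for that: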

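with open('testfile.txt') as myfile, open("Lines_FILE.txt","w") as outfile:
    while True:
        currline = myfile.readline().strip()
        if not currline:
            break
        # a line ending in "| was cut short: append the next line to it
        if currline.endswith('"|'):
            currline += myfile.readline().strip()
        print(currline)
        # strip() removed the line ending, so put one back when writing
        outfile.write(currline + '\n')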

Result:

"a"|"b"|"c"|"D"|"E"
"F"|"G"|"R"|"T"|"I"
"W"|"Y"|"U"|"IA|SD"|"O"

Answer 2

You can use the csv module directly:

from io import StringIO
from csv import reader, writer

txt = '''"a"|"b"|"c"|"D"|"E"
"F"|"G"|"R"|
"T"|"I"
"W"|"Y"|"U"|"IA|SD"|"O"'''

with StringIO(txt) as infile, StringIO() as outfile:
    maxlen = None
    rows = reader(infile, delimiter='|', quotechar='"')
    out_csv = writer(outfile, delimiter='|', quotechar='"')
    for row in rows:
        if maxlen is None:
            maxlen = len(row)
        while len(row) < maxlen:
            row.extend(next(rows))
        # remove empty item
        row = [item for item in row if item != '']
        out_csv.writerow(row)

    print(outfile.getvalue())

Prints:

a|b|c|D|E
F|G|R|T|I
W|Y|U|"IA|SD"|O

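This assumes that the first line of the input file has the correct length.

You should replace the StringIO parts with your actual input and output files.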
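One possible way to make that swap is sketched below. The function name repair_file and its two path parameters are hypothetical; the loop is the same as in the StringIO version, except that it stops instead of raising StopIteration if the file ends in the middle of a record. It still assumes the first row is complete and, like the code above, it drops empty fields.

```python
from csv import reader, writer


def repair_file(in_path, out_path):
    """Rejoin records that were split across lines and write them to out_path."""
    # newline='' lets the csv module handle line endings itself.
    with open(in_path, newline='') as infile, \
         open(out_path, 'w', newline='') as outfile:
        rows = reader(infile, delimiter='|', quotechar='"')
        out_csv = writer(outfile, delimiter='|', quotechar='"')
        maxlen = None
        for row in rows:
            if maxlen is None:
                maxlen = len(row)       # assumes the first row is complete
            while len(row) < maxlen:
                extra = next(rows, None)
                if extra is None:       # file ended in the middle of a record
                    break
                row.extend(extra)
            # remove the empty item left behind by the trailing pipe
            row = [item for item in row if item != '']
            out_csv.writerow(row)
```

A call might look like repair_file('testfile.txt', 'fixed.txt'), where both file names are placeholders.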

Storing the value "|", including its double quotes, in a string using Python
