Question

I'm writing a Python script that converts very simple function documentation to XML. The format I'm using would convert:

date_time_of(date) Returns the time part of the indicated date-time value, setting the date part to 0.

into:

<item name="date_time_of">

<arg>(date)</arg>

<help> Returns the time part of the indicated date-time value, setting the date part to 0.</help>

</item>

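So far it works well (the XML above was generated by the program), but it is supposed to handle several lines of pasted documentation, and it only processes the first line pasted into the application. I checked the pasted documentation in Notepad++ and the lines do end with CRLF, so what is my problem? Here is my code: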

mainText = input("Enter your text to convert:\r\n")

try:
    for line in mainText.split('\r\n'):
        name = line.split("(")[0]
        arg = line.split("(")[1]
        arg = arg.split(")")[0]
        hlp = line.split(")",1)[1]
        print('<item name="%s">\r\n<arg>(%s)</arg>\r\n<help>%s</help>\r\n</item>\r\n' % (name,arg,hlp))
except:
    print("Error!")

Any idea what the problem is here? Thanks.

Answer 1

input() reads only one line.

Try this. Enter an empty line to stop collecting lines.

lines = []
while True:
    line = input('line: ')
    if line:
        lines.append(line)
    else:
        break
print(lines)
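As a rough sketch of how that loop could be joined to the conversion from the question: the helper names to_item and read_until_blank are hypothetical, and the reader function is passed in as a parameter only so the loop can be exercised without a console.

```python
def to_item(line):
    """Turn 'name(args) help text' into the <item> snippet from the question."""
    name, _, rest = line.partition("(")
    arg, _, hlp = rest.partition(")")
    return '<item name="%s">\n<arg>(%s)</arg>\n<help>%s</help>\n</item>' % (name, arg, hlp)


def read_until_blank(read_line=input):
    """Collect lines from read_line() until it returns an empty string."""
    lines = []
    while True:
        line = read_line()
        if not line:
            break
        lines.append(line)
    return lines


# Simulated user input (assumption: each pasted line arrives as one call).
fake_input = iter([
    "date_time_of(date) Returns the time part.",
    "divmod(a, b) Returns quotient and remainder.",
    "",
])
for doc_line in read_until_blank(lambda: next(fake_input)):
    print(to_item(doc_line))
```

At a real console, calling read_until_blank() with no argument falls back to input(), so the user would paste the lines and then press Enter on an empty line.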

Answer 2

The best way to read lines from standard input (the console) is to iterate over the sys.stdin object. Rewritten to do that, your code would look like this:

from sys import stdin

try:
    for line in stdin:
        line = line.rstrip("\r\n")  # iterating over stdin keeps the line ending
        name = line.split("(")[0]
        arg = line.split("(")[1]
        arg = arg.split(")")[0]
        hlp = line.split(")", 1)[1]
        print('<item name="%s">\r\n<arg>(%s)</arg>\r\n<help>%s</help>\r\n</item>\r\n' % (name, arg, hlp))
except IndexError:
    # raised when a line contains no "(" or ")"
    print("Error!")

That said, it is worth noting that your parsing code can be greatly simplified with the help of a regular expression. Here is an example:

import re
import sys

for line in sys.stdin:
    # group 1: name, group 2: text between the first "(" and ")", group 3: help
    result = re.match(r"(.*?)\((.*?)\)(.*)", line.rstrip("\r\n"))
    if result:
        name = result.group(1)
        arg = result.group(2)  # kept as one string so it prints as "(a, b)"
        hlp = result.group(3)
        print('<item name="%s">\r\n<arg>(%s)</arg>\r\n<help>%s</help>\r\n</item>\r\n' % (name, arg, hlp))
    else:
        print("There was an error parsing this line: '%s'" % line)

I hope this helps you simplify your code.
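To make it easier to see what each group of that pattern captures, here is a small sketch that applies it to fixed strings instead of standard input. The function name parse_doc_line is hypothetical, and stripping the help text is an extra choice that is not part of the answer's code.

```python
import re

# Same pattern as above: name, "(" args ")", then the remaining help text.
DOC_LINE = re.compile(r"(.*?)\((.*?)\)(.*)")


def parse_doc_line(line):
    """Return (name, arg, help) for a documentation line, or None if it does not match."""
    match = DOC_LINE.match(line.rstrip("\r\n"))
    if match is None:
        return None
    name, arg, hlp = match.groups()
    return name, arg, hlp.strip()


print(parse_doc_line("divmod(a, b) Returns quotient and remainder.\r\n"))
print(parse_doc_line("no parentheses here"))
```

Returning None for a line that does not match lets the caller decide whether to skip it or report it, rather than aborting the whole run.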

Answer 3

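Patrick Moriarty,

It seems to me that you didn't specifically mention the console, and that your main concern is to pass several lines at once for processing. I found only one way to reproduce your problem: running the program in IDLE, manually copying several lines from a file and pasting them into raw_input().

Trying to understand your problem led me to the following facts: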

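When data is copied from a file and pasted into raw_input(), the \r\n newlines are converted to \n, so the string returned by raw_input() no longer contains any \r\n. Hence split('\r\n') cannot split this string.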

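When data containing isolated \r and \n characters is pasted into a Notepad++ window with the display of special characters turned on, CR LF symbols appear at the end of every line, even where there is only a lone \r or a lone \n. Hence, using Notepad++ to verify the nature of the newlines leads to a wrong conclusion.

The first fact is the cause of your problem. I don't know the underlying reason why this conversion affects data copied from a file and passed to raw_input(), which is why I posted a question on Stack Overflow: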

Strange vanishing of CR in strings coming from a copy of a file's content passed to raw_input()

The second fact is the cause of your confusion and despair. That was bad luck...
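The first fact can be checked without any pasting. The sketch below uses two hypothetical strings: one standing in for what raw_input() is described as returning after a paste (input() in Python 3), and one for the same text as stored in the file. Printing repr() of a string is also a more direct way to see which newline characters it really contains than relying on an editor's display.

```python
# Hypothetical sample data: the same three lines with different newlines.
pasted = "f(a) First.\ng(b) Second.\nh(c) Third."          # "\r\n" already turned into "\n"
from_file = "f(a) First.\r\ng(b) Second.\r\nh(c) Third."   # newlines as stored in the file

# repr() shows the actual escape sequences in each string.
print(repr(pasted))
print(repr(from_file))

# With no "\r\n" in the string, split("\r\n") returns it as a single element.
print(len(pasted.split("\r\n")))
print(len(from_file.split("\r\n")))
```

Only the second string splits into three lines. The first stays in one piece and then goes through the parsing code as a single line, which is why only the first function appears in the output.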

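So, how can your problem be solved?

Here is code that reproduces the problem. Note the modified algorithm in it, which replaces the repeated splits applied to each line.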

ch = "date_time_of(date) Returns the time part.\r\n"+\
     "divmod(a, b) Returns quotient and remainder.\r\n"+\
     "enumerate(sequence[, start=0]) Returns an enumerate object.\r\n"+\
     "A\rB\nC"

with open('funcdoc.txt','wb') as f:
    f.write(ch)

print "Having just recorded the following string in a file named 'funcdoc.txt' :\n"+repr(ch)

print "open 'funcdoc.txt' to manually copy its content, and paste it on the following line"
mainText = raw_input("Enter your text to convert:\n")
print "OK, copy-paste of file 'funcdoc.txt' ' s content has been performed"


print "\nrepr(mainText)==",repr(mainText)

try:
    for line in mainText.split('\r\n'):  
        name,_,arghelp  = line.partition("(")
        arg,_,hlp = arghelp.partition(") ")
        print('<item name="%s">\n<arg>(%s)</arg>\n<help>%s</help>\n</item>\n' % (name,arg,hlp))
except:
    print("Error!")

Here is the solution mentioned by delnan: «read from the source instead of having a human copy and paste it.» It works with your split('\r\n'):

ch = "date_time_of(date) Returns the time part.\r\n"+\
     "divmod(a, b) Returns quotient and remainder.\r\n"+\
     "enumerate(sequence[, start=0]) Returns an enumerate object.\r\n"+\
     "A\rB\nC"

with open('funcdoc.txt','wb') as f:
    f.write(ch)

print "Having just recorded the following string in a file named 'funcdoc.txt' :\n"+repr(ch)

#####################################

with open('funcdoc.txt','rb') as f:
    mainText = f.read()

print "\nfile 'funcdoc.txt' has just been opened and its content copied and put to mainText"

print "\nrepr(mainText)==",repr(mainText)
print

try:
    for line in mainText.split('\r\n'):  
        name,_,arghelp  = line.partition("(")
        arg,_,hlp = arghelp.partition(") ")
        print('<item name="%s">\n<arg>(%s)</arg>\n<help>%s</help>\n</item>\n' % (name,arg,hlp))
except:
    print("Error!")
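The listing above uses Python 2 syntax. In Python 3 there is one more thing to watch: opening a file in text mode translates \r\n to \n by default, so split('\r\n') would fail again unless the file is opened in binary mode or with newline=''. The sketch below illustrates this with a hypothetical scratch file standing in for funcdoc.txt.

```python
import os
import tempfile

text = "f(a) First.\r\ng(b) Second.\r\n"

# Hypothetical scratch file playing the role of funcdoc.txt.
path = os.path.join(tempfile.mkdtemp(), "funcdoc.txt")
with open(path, "wb") as f:
    f.write(text.encode("ascii"))

with open(path, "r") as f:              # default text mode: "\r\n" becomes "\n"
    translated = f.read()
with open(path, "r", newline="") as f:  # newline="" leaves line endings untouched
    untranslated = f.read()
with open(path, "rb") as f:             # binary mode returns the raw bytes
    raw = f.read()

print(repr(translated))
print(repr(untranslated))
print(repr(raw))
```

In binary mode the result is a bytes object, so the separator has to be bytes as well, for example raw.split(b"\r\n").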

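Finally, here is Python's solution for handling the altered manual copy: the splitlines() method, which treats every kind of newline (\r, \n or \r\n) as a separator. So replace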

for line in mainText.split('\r\n'):

with

for line in mainText.splitlines():
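A short sketch of the difference, using a sample string with mixed line endings similar to the one in the listings above (the variable name mixed is just for illustration):

```python
# Sample text mixing "\r\n", "\n" and a lone "\r".
mixed = ("date_time_of(date) Returns the time part.\r\n"
         "divmod(a, b) Returns quotient and remainder.\n"
         "A\rB\nC")

# splitlines() breaks on every kind of line ending.
for line in mixed.splitlines():
    print(repr(line))

# split("\r\n") only breaks where that exact two-character sequence occurs.
print(len(mixed.split("\r\n")))
```

Two details worth knowing: splitlines() does not produce a trailing empty string when the text ends with a newline, whereas split('\r\n') does, and on Python 3 strings it also splits on a few less common boundaries such as form feed and vertical tab.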

Python not splitting CRLF correctly
