How do I split a text string on explicit newline characters ('\n')?
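Unfortunately, I'm not working with a properly formatted csv file. Instead I have one long string of text with "\n" wherever the line breaks should be. (Example format: "A0,B0\nA1,B1\nA2,B2\nA3,B3\n......") I thought a simple bad_csv_list = text.split('\n')
would give me a list of two-value cells (example split: ['A0,B0','A1,B1','A2,B2','A3,B3',......]). Instead I end up with a single cell, and "\n" is turned into "\\n". I tried copy-pasting part of the string and using split('\n') on it, and that worked the way I hoped. Printing the file object tells me the following: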
<_io.TextIOWrapper name='stats.csv' mode='r' encoding='cp1252'>
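...so I suspect the problem is the cp1252 encoding? Worth noting: Notepad++ says the file I'm working with is "UTF-8 without BOM"... I've looked through the docs and SO, and tried importing io and codecs, prefixing the open statement with them and declaring encoding='utf8', but I'm at a loss, and I don't really understand text encodings. Maybe there's a better solution?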
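Encoding is unlikely to be the culprit here. Judging by the file contents shown further down, stats.csv is plain ASCII, and a backslash followed by n is two ASCII characters that cp1252 and UTF-8 encode to identical bytes. The short check below (variable names are illustrative) shows this, and also that '\\n' in Python source is two characters while '\n' is a single line feed.

```python
# A literal backslash followed by "n" is two ASCII characters,
# so cp1252 and UTF-8 encode it to exactly the same bytes.
literal = "\\n"   # two characters: a backslash and the letter n
newline = "\n"    # one character: a real line feed

print(len(literal), len(newline))                           # 2 1
print(literal.encode("cp1252"))                             # b'\\n'
print(literal.encode("utf-8"))                              # b'\\n'
print(literal.encode("cp1252") == literal.encode("utf-8"))  # True
```

So opening the file with encoding='utf8' instead of the cp1252 default would most likely give the same string for this data.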
from sys import argv
# import io, codecs
filename = argv[1]
file_object = open(filename, 'r')
# file_object = io.open(filename, 'r', encoding='utf8')
# file_object = codecs.open(filename, 'r', encoding='utf8')
file_contents = file_object.read()
file_list = file_contents.split('\n')
print("1.) Here's the name of the file: {}".format(filename))
print("2.) Here's the file object info: {}".format(file_object))
print("3.) Here's all the files contents:\n{}".format(file_contents))
print("4.) Here's a list of the file contents:\n{}".format(file_list))
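Step 4 prints a list, and printing a list shows each element through repr(), which writes a single literal backslash as \\. The sketch below recreates a small, hypothetical excerpt of stats.csv in a temporary file (the sample text and temporary path are assumptions for illustration) to show that the text already contains a backslash followed by n, so split('\n') finds no real line feed and returns one element.

```python
import os
import tempfile

# Hypothetical sample in the same shape as stats.csv: each "line break"
# is the two characters backslash + n, not a real newline.
sample = r"Albuquerque,749\nAnaheim,371\nAnchorage,828"

# Write it to a temporary file and read it back, as the script does.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "stats.csv")
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)
    with open(path, "r", encoding="utf-8") as f:
        file_contents = f.read()

file_list = file_contents.split("\n")  # looks for a real line feed

print(len(file_list))          # 1 -> there is no real newline to split on
print(file_contents)           # Albuquerque,749\nAnaheim,371\nAnchorage,828
print(file_list)               # ['Albuquerque,749\\nAnaheim,371\\nAnchorage,828']
print(file_list[0] == sample)  # True: nothing was added; repr() only escapes the backslash
```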
Any help is much appreciated, thanks.
In case it helps explain what I'm dealing with, here are the contents of stats.csv:
Albuquerque,749\nAnaheim,371\nAnchorage,828\nArlington,503\nAtlanta,1379\nAurora,425\nAustin,408\nBakersfield,542\nBaltimore,1405\nBoston,835\nBuffalo,1288\nCharlotte-Mecklenburg,647\nCincinnati,974\nCleveland,1383\nColorado Springs,455\nCorpus Christi,658\nDallas,675\nDenver,615\nDetroit,2122\nEl Paso,423\nFort Wayne,362\nFort Worth,587\nFresno,543\nGreensboro,563\nHenderson,168\nHouston,992\nIndianapolis,1185\nJacksonville,617\nJersey City,734\nKansas City,1263\nLas Vegas,784\nLexington,352\nLincoln,397\nLong Beach,575\nLos Angeles,481\nLouisville Metro,598\nMemphis,1750\nMesa,399\nMiami,1172\nMilwaukee,1294\nMinneapolis,992\nMobile,522\nNashville,1216\nNew Orleans,815\nNew York,639\nNewark,1154\nOakland,1993\nOklahoma City,919\nOmaha,594\nPhiladelphia,1160\nPhoenix,636\nPittsburgh,752\nPlano,130\nPortland,517\nRaleigh,423\nRiverside,443\nSacramento,738\nSan Antonio,503\nSan Diego,413\nSan Francisco,704\nSan Jose,363\nSanta Ana,401\nSeattle,597\nSt. Louis,1776\nSt. Paul,722\nStockton,1548\nTampa,616\nToledo,1171\nTucson,724\nTulsa,990\nVirginia Beach,169\nWashington,1177\nWichita,742
Result of split('\n'):
['Albuquerque,749\\nAnaheim,371\\nAnchorage,828\\nArlington,503\\nAtlanta,1379\\nAurora,425\\nAustin,408\\nBakersfield,542\\nBaltimore,1405\\nBoston,835\\nBuffalo,1288\\nCharlotte-Mecklenburg,647\\nCincinnati,974\\nCleveland,1383\\nColorado Springs,455\\nCorpus Christi,658\\nDallas,675\\nDenver,615\\nDetroit,2122\\nEl Paso,423\\nFort Wayne,362\\nFort Worth,587\\nFresno,543\\nGreensboro,563\\nHenderson,168\\nHouston,992\\nIndianapolis,1185\\nJacksonville,617\\nJersey City,734\\nKansas City,1263\\nLas Vegas,784\\nLexington,352\\nLincoln,397\\nLong Beach,575\\nLos Angeles,481\\nLouisville Metro,598\\nMemphis,1750\\nMesa,399\\nMiami,1172\\nMilwaukee,1294\\nMinneapolis,992\\nMobile,522\\nNashville,1216\\nNew Orleans,815\\nNew York,639\\nNewark,1154\\nOakland,1993\\nOklahoma City,919\\nOmaha,594\\nPhiladelphia,1160\\nPhoenix,636\\nPittsburgh,752\\nPlano,130\\nPortland,517\\nRaleigh,423\\nRiverside,443\\nSacramento,738\\nSan Antonio,503\\nSan Diego,413\\nSan Francisco,704\\nSan Jose,363\\nSanta Ana,401\\nSeattle,597\\nSt. Louis,1776\\nSt. Paul,722\\nStockton,1548\\nTampa,616\\nToledo,1171\\nTucson,724\\nTulsa,990\\nVirginia Beach,169\\nWashington,1177\\nWichita,742']
Why is an extra \ being added?
Answer (score: 4)
DOH!!! ROYAL FACE PALM! I had just finished writing all of this when I realized that all I needed to do was put an escaping backslash in front of the \n:
file_list = file_contents.split('\\n')
I'll post this anyway so you can all have a chuckle ^_^
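Splitting on '\\n' (a backslash followed by n) gives one string per record. As a sketch, assuming the goal is the [city, count] pairs the question describes, the pieces can then be handed to the standard csv module; the three-city excerpt is a hypothetical stand-in for the real file.

```python
import csv

# Hypothetical excerpt in the same shape as stats.csv.
file_contents = r"Albuquerque,749\nAnaheim,371\nAnchorage,828"

# Split on the two-character sequence backslash + n (written "\\n" in Python),
# dropping empty pieces such as the one a trailing \n would leave behind.
file_list = [row for row in file_contents.split("\\n") if row]
print(file_list)  # ['Albuquerque,749', 'Anaheim,371', 'Anchorage,828']

# Each element is now one CSV row; csv.reader accepts any iterable of strings.
rows = [(city, int(count)) for city, count in csv.reader(file_list)]
print(rows)       # [('Albuquerque', 749), ('Anaheim', 371), ('Anchorage', 828)]
```

Going through csv.reader rather than a second split(',') means a quoted field containing a comma would still be parsed correctly, should the real data ever have one.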