Question

I have a CSV file (more than 1,000 rows, with \t as the delimiter) that I want to load into Python as a list. Here are the first few lines of the file:

"col1"  "col2"  "col3"  "col4"  "col5"  "col6"
1   "01-01-2017 00:00:00"   "02-02-2017 00:00:00"   "str1"  "str3"  "str4 åå here comes a few newline characters







"
2   "01-01-2017 00:00:00"   "02-02-2017 00:00:00"   "str2"  "str3"  "str5 åasg here comes more newlines

"

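As you can see, the strings often contain many newline characters. Is there a way to strip all newline characters from the strings and then build a list containing all the rows?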

My attempt: based on this thread, here is what I tried:

import csv
with open('test.dat') as csvDataFile:
    csvReader = csv.reader(csvDataFile, delimiter="\t")
    for i in csvReader:
        print(list(map(str.strip,i)))

However, this does not strip anything.

Answer 1

A sample snippet that removes newline characters ("\n") from a list:

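a = ['\n', "a", "b", "c", "\n"]

def remNL(l):
    # Keep every element that is not a bare newline
    return [i for i in l if i != "\n"]

# remNL already returns the filtered list, so call it directly
print(remNL(a))  # ['a', 'b', 'c']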

In your case:

print(remNL(i))
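
Note that the snippet above only drops list elements that are exactly "\n"; a field such as "str4 ...\n\n\n" still keeps its embedded newlines. The following is a minimal sketch of one way to handle both cases. The helper name `strip_newlines` and the sample row are hypothetical, and dropping fields that end up empty is an assumption that may shift column positions in real data.

```python
def strip_newlines(fields):
    """Remove newline characters inside each field, then drop empty fields."""
    # Assumption: both "\r" and "\n" count as newline characters to remove
    cleaned = [f.replace("\r", "").replace("\n", "") for f in fields]
    # Assumption: a field that is empty after cleaning can be discarded
    return [f for f in cleaned if f]

# Hypothetical row resembling what csv.reader could yield for the sample file
row = ["1", "str1", "str4 here comes a few newlines\n\n\n", "\n"]
print(strip_newlines(row))
```

If column positions matter, it is probably safer to keep the empty strings and skip the final filtering step.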

Answer 2

You can use a regular expression to find all runs of repeated \n characters and remove them from the input text.

import re  # The module for regular expressions

input = """ The text from the csv file """

# Find all the repeated \n chars in input and replace them with ""
# Take the first element as the function returns a tuple with the 
# new string and the number of subs made
stripedInput = re.subn(r"\n{2,}", "", input)[0]

We now have the CSV file text without any repeated \n characters. The rows can then be obtained with:

rows = stripedInput.split("\n")
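
As a small illustration of the substitution and the split on a concrete string, here is a sketch using made-up sample text shaped like the question's file (the names `text`, `cleaned`, `count` and `demo_rows` are hypothetical). It assumes every newline run inside a field is at least two characters long; a single newline inside a field would not match `\n{2,}` and would still split that row.

```python
import re

# Hypothetical two-row sample: each quoted field ends with a run of newlines
text = '1\t"str4 text\n\n\n"\n2\t"str5 text\n\n"'

# re.subn returns a (new_string, number_of_substitutions) tuple
cleaned, count = re.subn(r"\n{2,}", "", text)

# Only the single newlines that separate records remain
demo_rows = cleaned.split("\n")
print(count)
print(demo_rows)
```

The surrounding double quotes stay in each field with this approach, so they would have to be removed separately if they are not wanted.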

If you want to split the rows into columns, you can do:

for i in range(len(rows)):
  rows[i] = rows[i].split("\t")
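
An alternative worth considering, which is not part of the answer above, is to let the standard `csv` module parse the quoted fields, since it treats a newline inside double quotes as part of the field rather than as a row break. The sketch below is a hedged example under assumptions: the sample text and the helper name `load_rows` are hypothetical, and the data is assumed to be tab-separated with double-quoted strings as in the question.

```python
import csv
import io

# Hypothetical sample mirroring the question's layout
sample = (
    '"col1"\t"col2"\t"col3"\n'
    '1\t"str1"\t"str4 with newlines\n\n\n"\n'
    '2\t"str2"\t"str5 more newlines\n"\n'
)

def load_rows(text):
    """Parse tab-separated text and strip newline characters from every field."""
    # newline="" keeps line endings untouched so the csv module can interpret them
    buffer = io.StringIO(text, newline="")
    reader = csv.reader(buffer, delimiter="\t", quotechar='"')
    return [
        [field.replace("\r", "").replace("\n", "") for field in row]
        for row in reader
    ]

parsed = load_rows(sample)
for row in parsed:
    print(row)
```

For a file on disk, the same idea would apply with `open(path, newline="")` in place of the in-memory buffer.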

Loading a CSV in Python and removing newline characters
