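Loading a CSV in Python and removing newline characters

Time: 2018-01-29 13:29:05

Tags: python pandas csv numpy

I have a CSV file (more than 1000 rows, with \t as the delimiter) that I want to load into Python as a list. Here are the first few lines of the file: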

"col1"  "col2"  "col3"  "col4"  "col5"  "col6"
1   "01-01-2017 00:00:00"   "02-02-2017 00:00:00"   "str1"  "str3"  "str4 åå here comes a few newline characters







"
2   "01-01-2017 00:00:00"   "02-02-2017 00:00:00"   "str2"  "str3"  "str5 åasg here comes more newlines

"

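As you can see, the strings tend to contain many newline characters. Is there a way to strip all newline characters from the strings and then create a list containing all the rows?

My attempt: based on this thread, here is what I tried: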

import csv
with open('test.dat') as csvDataFile:
    csvReader = csv.reader(csvDataFile, delimiter="\t")
    for i in csvReader:
        print(list(map(str.strip,i)))

However, this does not strip anything.

2 answers:

Answer 0 (score: 0)

A sample snippet for removing newline characters ("\n") from a list

a = ['\n', "a", "b", "c", "\n"]

def remNL(l):
    # Keep every item that is not a bare newline string
    return [i for i in l if i != "\n"]

print(remNL(a))

In your case

print(remNL(i))
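
Note that comparing each item with "\n" only drops fields that consist of a single newline; newlines embedded inside a quoted field are left alone. A minimal sketch of stripping them per field, assuming inline sample data shaped like the question's file (read through io.StringIO rather than test.dat) and a hypothetical helper named clean_field:

```python
import csv
import io

# Assumed sample shaped like the question's file: tab-separated, quoted fields,
# with newlines inside the last field of each data row.
sample = (
    '"col1"\t"col2"\n'
    '1\t"str4 here comes a few newline characters\n\n\n"\n'
    '2\t"str5 more\nnewlines\n\n"\n'
)

def clean_field(field):
    """Hypothetical helper: drop every newline inside a field, then trim."""
    return field.replace("\r", "").replace("\n", "").strip()

# newline="" leaves newline handling to the csv module, which keeps
# newlines that appear inside quoted fields as part of the field.
reader = csv.reader(io.StringIO(sample, newline=""), delimiter="\t")
rows = [[clean_field(f) for f in row] for row in reader]
print(rows)
```

With a real file, the same idea would apply to open('test.dat', newline='') in place of the StringIO object.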

Answer 1 (score: 0)

You can use a regular expression to find every run of repeated \n characters and remove it from the input text.

import re  # The module for regular expressions

text = """ The text from the csv file """

# Find every run of two or more \n chars in text and replace it with "".
# re.subn returns a tuple (new string, number of substitutions made),
# so take the first element.
stripedInput = re.subn(r"\n{2,}", "", text)[0]

We now have the text of the CSV file without any repeated \n characters. The rows can then be obtained with
rows = stripedInput.split("\n")

If you want to split the rows into columns, you can do

for i in range(len(rows)):
    rows[i] = rows[i].split("\t")
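
The steps above can be put together on a small inline sample. This is only a sketch: the sample string is an assumed stand-in for the file contents, and the variable names are illustrative.

```python
import re

# Assumed stand-in for the file contents: tab-separated, with runs of
# newlines inside the last quoted field of each row.
text = 'a\t"x\n\n\n"\nb\t"y\n\n"\n'

# Remove every run of two or more consecutive newlines.
stripped = re.sub(r"\n{2,}", "", text)

# Split into rows (skipping the empty string after the final newline),
# then split each row into columns on the tab character.
rows = [line.split("\t") for line in stripped.split("\n") if line]
print(rows)
```

Two limitations are worth keeping in mind: the surrounding double quotes stay in the fields, and a field containing only a single newline is not repaired, because the pattern matches runs of two or more. Parsing with the csv module avoids both.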