Question

I have a .csv file that looks like this:

['NAME' " 'RA_I1'" " 'DEC_I1'" " 'Mean_I1'" " 'Median_I1'" " 'Mode_I1'" ...]"

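Each line goes on like this for (I think) 95 entries, and the whole file is over a thousand lines long. I want to remove all of the characters [ ' " and have every entry separated by a single space (' ').
So far I have tried: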

import pandas as pd

df1 = pd.read_table('slap.txt')
    for char in df1:
        if char in " '[":
            df1.replace(char, '')

print df1

I was just "testing" the code to see whether it would do what I want, and it doesn't. I would like to apply it to the whole file, but I'm not sure how.

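I have looked at this old post, but couldn't quite get it to work for my purposes. I have also played around with the linked post; the only problem seems to be that every entry comes out twice instead of once...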

Answer 1

This looks like something you should be able to handle with a (not particularly pretty) regular expression in the sep argument of read_csv:

In [11]: pd.read_csv(file_name, sep='\[\'|\'\"\]|[ \'\"]+', header=None)
Out[11]:
    0     1      2       3        4          5        6   7
0 NaN  NAME  RA_I1  DEC_I1  Mean_I1  Median_I1  Mode_I1 NaN

You can play with the regular expression until it really fits your needs.

To explain it:

sep = ('\[\'  # each line startswith ['  (the | means or)
       '|\'\"\]'  # endswith '"] (at least the one I had)
       '|[ \'\"]+')  # this is the actual delimiter, the + means at least one, so it's a string of ", ' and space in any order.
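
To see what this separator does to a single line, here is a small check with re.split, which is the same kind of splitting a regex sep asks for. It is only a sketch: the sample line is modeled on the one in the question, cut down to six fields, and it assumes the line ends with '"] as described above.

```python
import re

# Sample line modeled on the question (six fields, assumed to end with '"]).
line = """['NAME' " 'RA_I1'" " 'DEC_I1'" " 'Mean_I1'" " 'Median_I1'" " 'Mode_I1'"]"""

# Same three alternatives as the sep above, written as one raw string:
#   \['      the start of the line
#   '"\]     the end of the line
#   [ '"]+   one or more of space, ' and " between fields
sep = r"""\['|'"\]|[ '"]+"""

fields = re.split(sep, line)
print(fields)
```

The first and last items come out as empty strings, because the line starts and ends with something the pattern treats as a delimiter.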

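You can see that this hack leaves a NaN column at either end. The main reason it is so ugly is that your "csv" is inconsistent. I would definitely recommend cleaning it up; one way, of course, is to read it with pandas and then write it back out with to_csv. If it was generated by someone else... complain (!).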
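
If the goal is just a clean file, the clean-up can also be done without pandas. The following is a minimal sketch in Python 3 using only the standard library, not the pandas/to_csv route mentioned above. The helper name clean_line and the two sample rows are made up for illustration, and io.StringIO stands in for the real input and output files.

```python
import csv
import io


def clean_line(line):
    """Delete [, ], ' and " from a line, then split what is left on whitespace."""
    table = str.maketrans("", "", "[]'\"")
    return line.translate(table).split()


# Stand-in for the real file; the second row is invented sample data.
raw = io.StringIO(
    """['NAME' " 'RA_I1'" " 'DEC_I1'"]\n"""
    """['obj1' " '10.5'" " '-3.2'"]\n"""
)

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
for line in raw:
    fields = clean_line(line)
    if fields:  # skip blank lines
        writer.writerow(fields)

cleaned = out.getvalue()
print(cleaned)
```

With real files, raw and out would be replaced by open(...) handles. Passing delimiter=" " to csv.writer would give the space-separated output the question asks for.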

Answer 2

Have you tried this?

string.strip(s[, chars])

http://docs.python.org/2/library/string.html
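
Note that string.strip(s[, chars]) is the Python 2 string-module function; in Python 3 the equivalent is the str.strip method. strip only removes characters from the two ends of a string, so a line has to be split into tokens first. A rough sketch, assuming a line shaped like the one in the question:

```python
# Sample line modeled on the question, shortened to three fields.
line = """['NAME' " 'RA_I1'" " 'DEC_I1'"]"""

# Split on whitespace, then strip the unwanted characters from each token.
tokens = [tok.strip("[]'\"") for tok in line.split()]

# Tokens that consisted only of a quote are now empty, so drop them.
tokens = [tok for tok in tokens if tok]

print(" ".join(tokens))
```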

Removing various characters from a (.csv or .txt) file in Python
