
Question

I have a csv file that looks like this:

5005284;5003485;C1; C2;A00.00;10-11-01;NULL;1;;
2006483;2003855;this is some text; and some 787; or even &[]\><;A87.03;30-09-86;NULL;1;
2006485;2003855;C;K86.00;31-12-91;NULL;1;;;

The file is separated by ";". Unfortunately, the same character is also used inside column 3, which creates extra columns. I want to join all of these false columns back into a single column, as shown in the expected output below.

So far I have:

import re 
import pandas as pd

text = open ('testepisodes.csv')
cleared = pd.DataFrame()

for line in text:
    # get rid of extra ;;; or ;;
    line.replace(";;;", ";")
    line.replace(";;", ";")
    print line
    index = line.count(";")
    print index
    if index==9:
        line = re.sub(r'^((?:[^.]*\;){4}[^.]*)\..*', r'\1', line)
    if index==8:
        line = re.sub(r'^((?:[^.]*\;){3}[^.]*)\..*', r'\1', line)
print line

Which results in:

2078915;2003855;this is some text; and some 787; or even &[]\><;A87.03;30-09-86;NULL;1;
126
126
2078915;2003855;this is some text; and some 787; or even &[]\><;A87.03;30-09-86;NULL;1;

Where I want:

5005284;5003485;C1 C2;A00.00;10-11-01;NULL;1
2006483;2003855;this is some text and some 787 or even &[]\><;A87.03;30-09-86;NULL;1
2006485;2003855;C;K86.00;31-12-91;NULL;1

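Edit (from a comment)

Index 2 is always the starting point of what should be kept together. The new index 3 should contain the pattern 'A00.00', where 'A' stands for any capital letter (A-Z) and each '0' stands for a digit (0-9).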

Answer 1

Try this code:
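The answer's own code block is not present in this copy. What follows is only a minimal sketch of one way to do the merge, not the answerer's original code. It assumes Python 3, the standard `csv` module with ";" as the delimiter, and the 'A00.00' pattern from the question's edit; the names `CODE`, `merge_row` and `fix_file` are hypothetical.

```python
import csv
import io
import re

# Pattern from the question's edit: capital letter, two digits, dot, two digits.
CODE = re.compile(r"[A-Z]\d{2}\.\d{2}")

def merge_row(row):
    """Join the fields between index 2 and the 'A00.00' column into one field."""
    fields = list(row)
    # Drop the empty fields created by trailing ';' characters.
    while fields and fields[-1] == "":
        fields.pop()
    # The first field after index 2 that matches the pattern ends the text column.
    for i in range(3, len(fields)):
        if CODE.fullmatch(fields[i]):
            return fields[:2] + ["".join(fields[2:i])] + fields[i:]
    return fields  # no matching column found: leave the row as it is

def fix_file(src, dst):
    """Read ';'-separated rows from src and write the merged rows to dst."""
    reader = csv.reader(src, delimiter=";")
    writer = csv.writer(dst, delimiter=";", lineterminator="\n")
    for newrow in reader:
        writer.writerow(merge_row(newrow))

# Small in-memory demonstration using one row from the question.
src = io.StringIO("5005284;5003485;C1; C2;A00.00;10-11-01;NULL;1;;\n")
dst = io.StringIO()
fix_file(src, dst)
print(dst.getvalue(), end="")
```

With real files, the `csv` documentation recommends opening both with `newline=''`. The sketch joins the false columns without a separator because the extra pieces in the sample data already start with a space.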

With an input file like this:

5005284;5003485;C1; C2;A00.00;10-11-01;NULL;1;;
2006483;2003855;this is some text; and some 787; or even &[]\><;A87.03;30-09-86;NULL;1;
2006485;2003855;C;K86.00;31-12-91;NULL;1;;;

The output file created looks like this:

5005284;5003485;C1 C2;A00.00;10-11-01;NULL;1
2006483;2003855;this is some text and some 787 or even &[]\><;A87.03;30-09-86;NULL;1
2006485;2003855;C;K86.00;31-12-91;NULL;1

Note that there is no ";" at the end of each line, which is the usual case in csv files. If you do need one, add an empty column at the end of each row when writing the new file. Perhaps like this:

writer.writerow([e for e in newrow if e] + [''])

Answer 2

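Removing ";"

line.replace() does not change the original line; it returns a new string containing the requested changes. See the documentation for str.replace. So this code does not do what you think it does: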

line.replace(";;;", ";")
line.replace(";;", ";")

Example:

a
Out[20]: ';fsdfds;dsfss;f;sdfsdf;sdf'

a.replace("s", "S")
Out[21]: ';fSdfdS;dSfSS;f;SdfSdf;Sdf'

a
Out[22]: ';fsdfds;dsfss;f;sdfsdf;sdf'

Try something like this instead:

while ";;" in line:
    line = line.replace(";;", ";")

This collapses any run of repeated ";" characters into a single ";".
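The same collapsing step can also be written as a single regular-expression substitution. This is a minimal sketch assuming Python 3; the helper name `collapse_semicolons` is made up for illustration. Like the loop, it would also merge genuinely empty fields, so it only suits files where an empty column carries no meaning.

```python
import re

def collapse_semicolons(line):
    """Replace every run of two or more ';' with a single ';'."""
    return re.sub(r";{2,}", ";", line)

line = "2006485;2003855;C;K86.00;31-12-91;NULL;1;;;"
print(collapse_semicolons(line))  # 2006485;2003855;C;K86.00;31-12-91;NULL;1;
```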

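Writing to the .csv

Try something like:

with open("new_document.csv", "w") as new:  # "w" opens the file for writing
   new.write(modified_lines)  # modified_lines is assumed to be one string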

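Improving the structure:

A better approach is to use a generator that filters and fixes the lines of the csv, and then iterate over it to write the new file. For example:

def fix_wonky_csv(wonky_csv):
    """Yield the repaired lines of an open csv file, one at a time."""
    for line in wonky_csv:
        # fix the lines
        yield line

def create_new_file(filename, new_title):
    # Read from filename and write the repaired lines to new_title.
    with open(filename) as f, open(new_title, "w") as newfile:
        for line in fix_wonky_csv(f):
            newfile.write(line)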
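To show how the generator and the writing step fit together, here is a small self-contained sketch. It assumes Python 3, fills in the "fix the lines" step with the ";;" loop from earlier in this answer, and uses in-memory `io.StringIO` objects in place of real files. The helper name `copy_fixed` is hypothetical.

```python
import io

def fix_wonky_csv(wonky_csv):
    """Yield each line with runs of ';' collapsed to a single ';'."""
    for line in wonky_csv:
        while ";;" in line:
            line = line.replace(";;", ";")
        yield line

def copy_fixed(source, target):
    """Write every repaired line from source to target (both file-like)."""
    for line in fix_wonky_csv(source):
        target.write(line)

source = io.StringIO("a;b;;\nc;;;d\n")
target = io.StringIO()
copy_fixed(source, target)
print(target.getvalue(), end="")
```

Because the generator yields one line at a time, the whole file never has to be held in memory, and the repair logic can be changed without touching the code that writes the output.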

Merging some columns of a csv file based on a condition
