Python: modifying CSV data

Time: 2017-06-05 03:03:57

Tags: python csv

I have a file that looks like this (i.e., a random mix of runs of 2 or 3 consecutive lines):

String A
String B
String C
<Blank Row>
String D
String E
<Blank Row>
String F
String G
String H
<Blank Row>
String I
String J
String K
<Blank Row>
String L
String M

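I want the output file to drop the middle line whenever there are 3 consecutive lines and transpose the remaining 2 lines. If there are only 2 lines, they should simply be transposed. The final result should look like this: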

String A,String C
String D,String E
String F,String H
String I,String K
String L,String M

Any pointers on how to accomplish this?

2 answers:

Answer 0 (score: 1)

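You can use groupby and count from the itertools module, together with a list comprehension.

This answer is a bit hacky, but it gets the job done. See the comments in the code to better understand the logic behind it.

I assume your input, as you provided it, is in a file named my_input_file, and that your output file is named output_file.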

from itertools import groupby, count

# Read the file and split each line at the space between "Value" and its number.
# Blank lines are kept as the empty string '' instead of being split.
with open("my_input_file", 'r') as f:
    data = (k.split() if k != '' else k for k in f.read().splitlines())

# Group the split lines (lists) in data,
# separating them at the blank lines (the '' strings)
sub = [list(v) for _, v in groupby(data, lambda x: isinstance(x, list))]

final = []
for elm in sub:
    # Only process groups of non-blank lines that have more than one element
    if isinstance(elm[0], list) and len(elm) > 1:
        # Convert each value's number into an int
        # for the calculations below
        dd = map(lambda x: [x[0], int(x[1])], elm)

        # Group the consecutive numbers in elm
        for _, v in groupby(dd, lambda x, y=count(): x[1] - next(y)):
            bb = list(v)
            # If there are consecutive numbers
            if len(bb) > 1:
                # Convert them to strings, then append the first and the last one to the final list
                final.append(' '.join(map(str, bb[0])) + ',' + ' '.join(map(str, bb[-1])))

            # If there aren't any consecutive numbers, append the element to the final list
            else:
                final.append(" ".join(map(str, bb[0])))


# Create (or overwrite) the output file
with open("output_file", 'w') as f:
    for k in final:
        f.write(k + '\n')

This code expects lines of the form "Value <number>"; for such input it writes a file containing:

Value 1,Value 3
Value 4,Value 5
Value 6,Value 8
Value 9,Value 11
Value 12,Value 13

Please test this code and leave your feedback if you have any questions, or report any bugs you find.
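The core trick in the code above is splitting runs of consecutive numbers with groupby and count: inside a run, number minus position stays constant, so it works as a grouping key (the answer gets the counter through the lambda default argument y=count()). A minimal standalone sketch of just that step; the function name consecutive_runs is hypothetical and not part of the answer:

```python
from itertools import count, groupby


def consecutive_runs(numbers):
    """Split a sequence of integers into runs of consecutive values."""
    positions = count()  # yields 0, 1, 2, ... one per element
    # Inside a run, value - position is constant, so it serves as the groupby key
    return [list(run) for _, run in groupby(numbers, key=lambda n: n - next(positions))]


print(consecutive_runs([1, 2, 3, 7, 8, 10]))  # [[1, 2, 3], [7, 8], [10]]
```

This is also why the answer is hacky: it relies on the numbers in the data, not only on the blank lines, to decide which lines belong together.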

Edit

Based on your latest edit:

If your input file is:

What Test 
Makes No Sense 
is This 

My name 
Is Sample 123 

Your Name 
is ABC 2134 

What is you 
technical question don't know 
name?

The trick is simple: you can use just groupby from the itertools module and do something like this:

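from itertools import groupby

with open("my_input_file", 'r') as f:
    data = f.read().splitlines()

# Group consecutive lines by whether they are non-blank
final = [list(v) for _, v in groupby(data, lambda x: x != '')]

with open("output_file", 'w') as f:
    for k in final:
        # Skip groups of blank lines; write the first and last line of each block
        if k[0] != '':
            f.write(k[0] + ',' + k[-1] + '\n')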

Your output file will then be:

What Test ,is This 
My name ,Is Sample 123 
Your Name ,is ABC 2134 
What is you ,name?
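The same blank-line grouping also works directly on the question's "String A" ... "String M" data. A minimal sketch, assuming each <Blank Row> is a truly empty (or whitespace-only) line; it works on an in-memory string and writes the rows with the standard csv module, so a line that itself contains a comma would be quoted correctly. The helper names first_and_last and to_csv are hypothetical:

```python
import csv
import io
from itertools import groupby


def first_and_last(lines):
    """Keep the first and last line of each block of non-blank lines."""
    rows = []
    # The key is True for non-blank lines and False for blank/whitespace-only ones
    for is_text, group in groupby(lines, key=lambda line: line.strip() != ""):
        if is_text:
            block = list(group)
            # 3 lines: the middle one is dropped; 2 lines: both are kept
            rows.append([block[0], block[-1]])
    return rows


def to_csv(rows):
    """Serialize rows with the csv module (quotes fields that contain commas)."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


sample = """String A
String B
String C

String D
String E

String F
String G
String H
"""

print(to_csv(first_and_last(sample.splitlines())), end="")
```

A block with a single line would produce that line twice; the question only has 2- and 3-line blocks. To work with real files, read the lines with f.read().splitlines() and pass a file opened with open("output_file", "w", newline="") to csv.writer instead of the StringIO buffer.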

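Answer 1 (score: 0)

To transpose: you know that every line ends with a newline character:

with open("PATH TO FILE.txt", 'r') as file:
    text = file.read()
    # str.replace returns a new string, so keep the result
    text = text.replace("\n", "")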

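Lines that contain only whitespace mark the boundaries between groups, so identify them. So far:

with open("PATH TO FILE.txt", 'r') as file:
    for line in file:
        # A whitespace-only line marks the end of a group
        if not line.strip():
            pass  # handle the lines collected for the finished group here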

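You can then keep a count, or use a while loop, counting lines until you reach a whitespace-only line, and put each line into a list (or similar) as you count. If the count is 3, take the first and third lines; otherwise take both. Remember to reset the count.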
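The counting approach described above can be sketched as follows, assuming the lines come from a file object or a list of strings. The length of the collected list serves as the count, and blocks that are neither 2 nor 3 lines long are simply skipped, which is an assumption since the answer does not say what to do with them. The function name pair_by_counting is hypothetical:

```python
def pair_by_counting(lines):
    """Collect lines until a blank one, then keep first/third (3 lines) or both (2 lines)."""
    rows = []
    current = []  # lines since the last blank line; len(current) is the count
    # Appending "" acts as a final blank line so the last group is flushed too
    for line in list(lines) + [""]:
        if line.strip():
            current.append(line.rstrip("\n"))
            continue
        if len(current) == 3:
            rows.append(current[0] + "," + current[2])  # first and third
        elif len(current) == 2:
            rows.append(current[0] + "," + current[1])  # both lines
        # Other counts (e.g. 0 for repeated blank lines) are skipped by assumption
        current = []  # reset the count
    return rows


sample = ["String A\n", "String B\n", "String C\n", "\n", "String D\n", "String E\n"]
print(pair_by_counting(sample))  # ['String A,String C', 'String D,String E']
```

With a real file, the same function can be called as pair_by_counting(file) inside the with open("PATH TO FILE.txt", 'r') as file: block, since iterating a file object yields its lines.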