Question

我正在尝试联系CSV行。我试图将CSV行转换为按熊猫列出，但由于某些文件为空，因此会附加“ nan”值。另外，我尝试使用zip，但它会合并列值。

    with open(i) as f:
        lines = f.readlines()
        res = ""
        for i, j in zip(lines[0].strip().split(','), lines[1].strip().split(',')):
            res += "{} {},".format(i, j)
            print(res.rstrip(','))
        for line in lines[2:]:
            print(line)

我的数据如下，

输入数据：- Input CSV Data

预期输出：- Output CSV Data

行数大于3，此处仅给出示例。建议一种无需创建新文件即可完成上述任务的方法。请指向任何特定的功能或示例代码。

Answer 1

只要文件中没有奇怪的东西，这样的事情就应该起作用：

with open(i) as f:
  result = []
  for line in f:
    result += line.strip().split(',')
  print(result)

Answer 2

这假设您的第一行包含正确的列数。它将读取整个文件，忽略空数据（“ 、、、、、、”），并累积足够的数据点以填充一行，然后切换到下一行：

编写测试文件：

with open ("f.txt","w")as f:
    f.write("""Circle,Year,1,2,3,4,5,6,7,8,9,10,11,12
abc,2018,,,,,,,,,,,,
2.2,8.0,6.5,9,88,,,,,,,,,,
55,66,77,88,,,,,,,,,,
5,3.2,7

def,2017,,,,,,,,,,,,
2.2,8.0,6.5,9,88,,,,,,,,,,
55,66,77,88,,,,,,,,,,
5,3.2,7

""")

处理测试文件：

data = [] # all data
temp = [] # data storage until enough found , then put into data 

with open("f.txt","r") as r:
    # get header and its lenght
    title = r.readline().rstrip().split(",")
    lenTitel = len(title)
    data.append(title)

    # process all remaining lines of the file
    for l in r:
        t = l.rstrip().split(",") # read one lines data
        temp.extend( (x for x in t if x) ) # this eliminates all empty ,, pieces even in between 
        # if enough data accumulated, put as sublist into data, keep rest
        if len (temp) > lenTitel:
            data.append( temp[:lenTitel] )
            temp = temp [lenTitel:]
    if temp:
        data.append(temp)

print(data)

输出：

[['Circle', 'Year', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12'], 
 ['abc', '2018', '2.2', '8.0', '6.5', '9', '88', '55', '66', '77', '88', '5', '3.2', '7'], 
 ['def', '2017', '2.2', '8.0', '6.5', '9', '88', '55', '66', '77', '88', '5', '3.2', '7']]

备注：

您的文件不能包含换行符，否则标题的大小不正确。
两者之间的换行不会造成损害
您不能拥有“空”单元格-它们会被淘汰

在python中

2 个答案: