Pivot a CSV string using Python without pandas or any similar library

Time: 2019-04-29 11:22:06

Tags: python csv dictionary

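You may think this is yet another duplicate question, but I have tried the answers to all the similar ones and had no luck so far. In my particular use case I cannot use pandas or any other similar library for this.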

This is my input:

AttributeName,Value
Name,John
Gender,M
PlaceofBirth,Texas
Name,Alexa
Gender,F
SurName,Garden

This is my expected output:

Name,Gender,Surname,PlaceofBirth
John,M,,Texas
Alexa,F,Garden,

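So far I have tried storing the input in a dictionary and then writing it out as a CSV string. It fails, however, because I am not sure how to handle the case where a record is missing a column value. Here is my code so far: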

  reader = csv.reader(csvstring.split('\n'), delimiter=',')

  csvdata = {}
  csvfile = ''
  for row in reader:
    if row[0] != '' and row[0] in csvdata and row[1] != '':
      csvdata[row[0]].append(row[1])
    elif row[0] != '' and row[0] in csvdata and row[1] == '':
      csvdata[row[0]].append(' ')
    elif row[0] != '' and row[1] != '':
      csvdata[row[0]] = [row[1]]
    elif row[0] != '' and row[1] == '':
      csvdata[row[0]] = [' ']

  for key, value in csvdata.items():
    if value == ' ':
      csvdata[key] = []

  csvfile += ','.join(csvdata.keys()) + '\n'
  for row in zip(*csvdata.values()):
    csvfile += ','.join(row) + '\n'

I also got some help with the code above here. Thanks in advance for any suggestions.

Edit #1: Updated the code to make it clear that I am working with a CSV string, not a CSV file.

3 Answers:

Answer 0: (score: 1)

What you need is something like this:

import csv

with open("in.csv") as infile:
    buffer = []
    item = {}

    lines = csv.reader(infile)
    for line in lines:
        if line[0] == 'Name':
            buffer.append(item.copy())
            item = {'Name':line[1]}
        else:
            item[line[0]] = line[1]
    buffer.append(item.copy())

for item in buffer[1:]:  # buffer[0] is the leftover from before the first Name row, so skip it
    print(item)
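
The loop above leaves a list of dicts in buffer rather than a CSV string. One possible way to finish the job is csv.DictWriter, whose restval argument fills in keys that a record lacks. The following is only a sketch: the records are hardcoded to mirror what buffer[1:] would hold for the sample input, and the fieldnames order is an assumption of mine, not something the answer specifies.

```python
import csv
import io

# Records as the loop above would collect them (buffer[1:]), hardcoded for illustration
records = [
    {"Name": "John", "Gender": "M", "PlaceofBirth": "Texas"},
    {"Name": "Alexa", "Gender": "F", "SurName": "Garden"},
]

# Assumed column order; note that the sample input spells the key "SurName"
fieldnames = ["Name", "Gender", "SurName", "PlaceofBirth"]

out = io.StringIO()
# restval="" is written for any field a record does not contain
writer = csv.DictWriter(out, fieldnames=fieldnames, restval="", lineterminator="\n")
writer.writeheader()
writer.writerows(records)

csv_text = out.getvalue()
print(csv_text)
```

With the default extrasaction, DictWriter raises ValueError if a record contains a key that is not listed in fieldnames, so the column list has to cover every attribute that can occur.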

Answer 1: (score: 0)

This works for me:

import csv

with open("in.csv", newline="") as infile, open("out.csv", "w", newline="") as outfile:
    incsv, outcsv = csv.reader(infile), csv.writer(outfile)
    next(incsv)  # Skip 1st row
    outcsv.writerows(zip(*incsv))

Update: for input and output as strings:

import csv, io

with io.StringIO(indata) as infile, io.StringIO() as outfile:
    incsv, outcsv = csv.reader(infile), csv.writer(outfile)
    next(incsv)  # Skip 1st row
    outcsv.writerows(zip(*incsv))

    print(outfile.getvalue())

Answer 2: (score: 0)

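If none of the attributes is mandatory, I think @framontb's solution needs to be rearranged so that it also works when the Name field is not provided.
Here is an import-free solution, though it is not super elegant.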

I assume you already have the rows in the following form, together with this list of columns:

lines = [
    "Name,John",
    "Gender,M",
    "PlaceofBirth,Texas",
    "Gender,F",
    "Name,Alexa",
    "Surname,Garden"  # modified typo here: SurName -> Surname
]

cols = ["Name", "Gender", "Surname", "PlaceofBirth"]

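We need to tell one record from the next. With no mandatory field, the best I can do is start a new record whenever an attribute has already been seen.
To do that I keep a temporary list of attributes, tempcols, and remove elements from it until the removal raises an error, which signals a new record.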

Code:

csvdata = {k:[] for k in cols}

tempcols = list(cols)
for line in lines:
    attr, value = line.split(",")
    try:
        csvdata[attr].append(value)
        tempcols.remove(attr)
    except ValueError:
        for c in tempcols:  # now tempcols has only "missing" attributes 
            csvdata[c].append("")
        tempcols = [c for c in cols if c != attr]
for c in tempcols:
    csvdata[c].append("")

# write csv string with the code you provided
csvfile = ""
csvfile += ",".join(csvdata.keys()) + "\n"
for row in zip(*csvdata.values()):
    csvfile += ",".join(row) + "\n"

>>> print(csvfile)
Name,PlaceofBirth,Surname,Gender
John,Texas,,M
Alexa,,Garden,F

And if you want the columns ordered as in the desired output:

csvfile = ""
csvfile += ",".join(cols) + "\n"
for row in zip(*[csvdata[k] for k in cols]):
    csvfile += ",".join(row) + "\n"

>>> print(csvfile)
Name,Gender,Surname,PlaceofBirth
John,M,,Texas
Alexa,F,Garden,
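
For comparison, the same "a repeated attribute starts a new record" rule can be wrapped in a function that works directly on the CSV string and discovers the columns by itself. This is only a sketch under a few assumptions: pivot_csv_string is a hypothetical name, it uses the csv module (so it is not import-free), it expects the first row to be the AttributeName,Value header, and the columns come out in the order they are first seen rather than in a fixed order.

```python
import csv
import io

def pivot_csv_string(text):
    """Pivot 'attribute,value' rows into one row per record (hypothetical helper)."""
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # assumption: the first row is the AttributeName,Value header

    records, current, columns = [], {}, []
    for row in reader:
        if len(row) < 2 or not row[0]:
            continue  # skip blank or malformed lines
        attr, value = row[0], row[1]
        if attr in current:  # attribute already seen, so a new record begins
            records.append(current)
            current = {}
        current[attr] = value
        if attr not in columns:
            columns.append(attr)  # remember columns in first-seen order
    if current:
        records.append(current)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()

sample = (
    "AttributeName,Value\n"
    "Name,John\nGender,M\nPlaceofBirth,Texas\n"
    "Gender,F\nName,Alexa\nSurname,Garden\n"
)
result = pivot_csv_string(sample)
print(result)
```

Like the list-based version above, this cannot separate two consecutive records that share no attribute; they would be merged into one row.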