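You may think this is yet another duplicate question, but I have tried all the similar questions and so far had no luck. In my particular use case, I can't use pandas or any similar library for this.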
Here is my input:
AttributeName,Value
Name,John
Gender,M
PlaceofBirth,Texas
Name,Alexa
Gender,F
SurName,Garden
Here is my expected output:
Name,Gender,Surname,PlaceofBirth
John,M,,Texas
Alexa,F,Garden,
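So far I have tried storing the input in a dictionary and then writing it out as a CSV string. However, this fails because I'm not sure how to handle columns whose values are missing. Here is my code so far: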
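import csv

reader = csv.reader(csvstring.split('\n'), delimiter=',')
csvdata = {}  # attribute name -> list of values seen for it
csvfile = ''
for row in reader:
    if len(row) < 2:
        continue  # skip blank lines (csv.reader yields [] for them)
    if row[0] != '' and row[0] in csvdata and row[1] != '':
        csvdata[row[0]].append(row[1])
    elif row[0] != '' and row[0] in csvdata and row[1] == '':
        csvdata[row[0]].append(' ')
    elif row[0] != '' and row[1] != '':
        csvdata[row[0]] = [row[1]]
    elif row[0] != '' and row[1] == '':
        csvdata[row[0]] = [' ']
for key, value in csvdata.items():
    # note: value is always a list here, so this test never matches
    if value == ' ':
        csvdata[key] = []
csvfile += ','.join(csvdata.keys()) + '\n'
# zip() stops at the shortest list, so columns with missing values
# truncate the output instead of leaving empty cells
for row in zip(*csvdata.values()):
    csvfile += ','.join(row) + '\n'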
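To see why the missing values cause trouble: zip() stops as soon as its shortest input is exhausted. Assuming csvstring holds exactly the sample input above, the loop builds the column lists below (the header row becomes its own "AttributeName" column), and only one output row survives, mixing values from both records:

```python
# Column lists as the loop above builds them from the sample input:
# the header row becomes its own "AttributeName" column, and the
# attributes missing from one record end up with shorter lists.
csvdata = {
    "AttributeName": ["Value"],
    "Name": ["John", "Alexa"],
    "Gender": ["M", "F"],
    "PlaceofBirth": ["Texas"],
    "SurName": ["Garden"],
}

# zip() yields only as many rows as the shortest list has items
rows = list(zip(*csvdata.values()))
print(rows)  # one row that mixes John's and Alexa's values
```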
I also got some help with the code above here. Thanks in advance for any suggestions.
Edit #1: Updated the code to show that I'm working with a CSV string rather than a CSV file.
Answer 0 (score: 1)
What you need is something like this:
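import csv

with open("in.csv") as infile:
    buffer = []
    item = {}
    lines = csv.reader(infile)
    for line in lines:
        if line[0] == 'Name':
            # a "Name" row starts a new record: save the previous one
            buffer.append(item.copy())
            item = {'Name': line[1]}
        else:
            item[line[0]] = line[1]
    buffer.append(item.copy())  # save the last record

# buffer[0] holds whatever preceded the first "Name" row (the header)
for item in buffer[1:]:
    print(item)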
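A sketch along the same lines that works on a string (as in the question's edit) and produces the CSV text directly. It relies on the restval parameter of csv.DictWriter to write an empty cell for any column a record lacks, and, like the answer above, it assumes each record starts with a Name row. The header reads SurName because that is how the input spells it; note that DictWriter raises ValueError for an attribute not listed in fieldnames.

```python
import csv
import io

# Sample input from the question as a string (assumption: the header row
# comes first and every record starts with a "Name" row).
csvstring = """AttributeName,Value
Name,John
Gender,M
PlaceofBirth,Texas
Name,Alexa
Gender,F
SurName,Garden
"""

records = []
item = {}
reader = csv.reader(io.StringIO(csvstring))
next(reader)  # skip the "AttributeName,Value" header
for row in reader:
    if len(row) < 2:
        continue  # ignore blank or malformed lines
    attr, value = row[0], row[1]
    if attr == "Name" and item:
        records.append(item)  # a new "Name" closes the previous record
        item = {}
    item[attr] = value
if item:
    records.append(item)  # keep the last record

# Column order of the expected output (the input spells it "SurName").
fieldnames = ["Name", "Gender", "SurName", "PlaceofBirth"]
out = io.StringIO()
# restval="" writes an empty cell for any column a record lacks
writer = csv.DictWriter(out, fieldnames=fieldnames, restval="",
                        lineterminator="\n")
writer.writeheader()
writer.writerows(records)
csvfile = out.getvalue()
print(csvfile)
```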
Answer 1 (score: 0)
This works for me:
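import csv

with open("in.csv", newline="") as infile, open("out.csv", "w", newline="") as outfile:
    incsv, outcsv = csv.reader(infile), csv.writer(outfile)
    next(incsv)  # skip the first (header) row
    # zip(*rows) transposes: each input column becomes an output row
    outcsv.writerows(zip(*incsv))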
Update: for string input and output:
import csv, io

# indata holds the CSV text as a string
with io.StringIO(indata) as infile, io.StringIO() as outfile:
    incsv, outcsv = csv.reader(infile), csv.writer(outfile)
    next(incsv)  # skip the first (header) row
    outcsv.writerows(zip(*incsv))
    # read the result before the with block closes outfile
    print(outfile.getvalue())
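Running the string version on the sample input from the question shows what the transpose produces: every attribute/value pair becomes one column, so both records end up side by side in a single data row rather than one row per person. Here indata is assumed to hold exactly the sample, and lineterminator="\n" is set only to make the result easier to compare (csv.writer defaults to "\r\n").

```python
import csv
import io

# Sample input from the question (assumed to be the contents of indata)
indata = """AttributeName,Value
Name,John
Gender,M
PlaceofBirth,Texas
Name,Alexa
Gender,F
SurName,Garden
"""

with io.StringIO(indata) as infile, io.StringIO() as outfile:
    incsv = csv.reader(infile)
    outcsv = csv.writer(outfile, lineterminator="\n")
    next(incsv)  # skip the header row
    outcsv.writerows(zip(*incsv))  # transpose rows and columns
    result = outfile.getvalue()

print(result)
```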
Answer 2 (score: 0)
If none of the attributes is mandatory, I think @framontb's solution needs to be reworked so that it also works when the Name field is missing.
This is an import-free solution, and it's not particularly elegant.
I assume you have already built the rows in the following form, with these columns:
lines = [
"Name,John",
"Gender,M",
"PlaceofBirth,Texas",
"Gender,F",
"Name,Alexa",
"Surname,Garden" # modified typo here: SurName -> Surname
]
cols = ["Name", "Gender", "Surname", "PlaceofBirth"]
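We need to tell one record apart from the next. Without a mandatory field, the best I can do is start a new record whenever an attribute that has already been seen appears again.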
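To do this, I keep a temporary list of attributes, tempcols, and remove each attribute from it as it appears; when the removal fails with a ValueError (the attribute was already seen), that marks a new record.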
Code:
csvdata = {k: [] for k in cols}
tempcols = list(cols)
for line in lines:
    attr, value = line.split(",")
    try:
        csvdata[attr].append(value)
        tempcols.remove(attr)
    except ValueError:
        for c in tempcols:  # now tempcols has only "missing" attributes
            csvdata[c].append("")
        tempcols = [c for c in cols if c != attr]
for c in tempcols:
    csvdata[c].append("")
# write csv string with the code you provided
csvfile = ""
csvfile += ",".join(csvdata.keys()) + "\n"
for row in zip(*csvdata.values()):
csvfile += ",".join(row) + "\n"
>>> print(csvfile)
Name,PlaceofBirth,Surname,Gender
John,Texas,,M
Alexa,,Garden,F
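If you want the columns in the same order as the expected output (dicts only preserve insertion order from Python 3.7 on, so the order above can differ on older versions):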
csvfile = ""
csvfile += ",".join(cols) + "\n"
for row in zip(*[csvdata[k] for k in cols]):
csvfile += ",".join(row) + "\n"
>>> print(csvfile)
Name,Gender,Surname,PlaceofBirth
John,M,,Texas
Alexa,F,Garden,
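The same tempcols idea wrapped in a function, so it can be checked against input whose first record has no Name at all. The function name rows_to_columns and the sample lines are hypothetical; the function uses an `in` check where the answer relies on catching ValueError, and it stays import-free.

```python
def rows_to_columns(lines, cols):
    """Group "attr,value" lines into one list per column, starting a new
    record whenever an attribute repeats (the tempcols idea above)."""
    csvdata = {k: [] for k in cols}
    tempcols = list(cols)  # attributes not yet seen in the current record
    for line in lines:
        attr, value = line.split(",")
        csvdata[attr].append(value)
        if attr in tempcols:
            tempcols.remove(attr)
        else:
            # attr already seen: pad the previous record and start a new one
            for c in tempcols:
                csvdata[c].append("")
            tempcols = [c for c in cols if c != attr]
    for c in tempcols:  # pad the last record
        csvdata[c].append("")
    return csvdata


cols = ["Name", "Gender", "Surname", "PlaceofBirth"]
# Hypothetical input where the first record has no Name
lines = ["Gender,M", "PlaceofBirth,Texas", "Gender,F", "Name,Alexa"]
data = rows_to_columns(lines, cols)

csvfile = ",".join(cols) + "\n"
for row in zip(*[data[k] for k in cols]):
    csvfile += ",".join(row) + "\n"
print(csvfile)
```

The split is only as good as the heuristic: if the next record begins with an attribute the previous record lacked (for example "Name,Alexa" arriving before "Gender,F"), that attribute is still attached to the previous record.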