I have to work with a CSV file that has the following format:
Foo
Col1,Col2,Col3,Col4
value1,value2,value3,value4
value1,value2,value3,value4
Bar
value1,value2,value3,value4
value1,value2,value3,value4
...
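When I use the pandas read_csv function, this file is read in as a single-column CSV. The number of rows between Foo, Bar, and the several other label values is not consistent.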
The desired output should be of the form:
newCol,Col1,Col2,Col3,Col4
Foo,value1,value2,value3,value4
Foo,value1,value2,value3,value4
Bar,value1,value2,value3,value4
Bar,value1,value2,value3,value4
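When I try my_dataframe.stack(), it inserts Foo and Bar into every row and omits all the other values. Is there a way, using pandas or even just some regex approach, to get what I am looking for?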
Answer 0 (score: 1)
You can do this with a simple approach:
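with open("your_file") as f:
    my_val = ""
    for i, line in enumerate(f):
        # strip the trailing newline before splitting, otherwise it ends up in the last field
        fields = line.strip().split(",")
        if len(fields) == 1:
            # a line without commas is a group label such as Foo or Bar
            my_val = fields[0]
        elif i == 1:
            # the second line of the file holds the column names
            print("newCol," + ",".join(fields))
        else:
            print("{},{}".format(my_val, ",".join(fields)))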
Answer 1 (score: 0)
import csv

with open('data', 'r', newline='') as f, open('data_out.csv', 'w', newline='') as f_out:
    reader = csv.reader(f)
    w = csv.writer(f_out, delimiter=',')
    # the first line is a group label (Foo), not the header row
    newcolumn = next(reader)[0]
    # read headers
    headers = next(reader)
    # insert new column name
    headers.insert(0, "NewCol")
    # write headers
    w.writerow(headers)
    for row in reader:
        if len(row) == 1:
            # a single-field row starts a new group (Bar, ...)
            newcolumn = row[0]
        elif row:
            # prefix every data row with the current group label
            w.writerow([newcolumn] + row)
data_out.csv
NewCol,Col1,Col2,Col3,Col4
Foo,value1,value2,value3,value4
Foo,value1,value2,value3,value4
Bar,value1,value2,value3,value4
Bar,value1,value2,value3,value4
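Since the question asks about pandas specifically, here is a minimal sketch of how the same result might be obtained there. It is not taken from either answer above. It assumes the four column names are known in advance, that label lines such as Foo and Bar are the only lines with a single field, and that the header line appears once; the sample data is inlined through io.StringIO only to keep the example self-contained. Passing names to read_csv forces four columns, so the label lines come in as rows with NaN in every column but the first, and a forward fill then spreads each label over the rows below it.

```python
import io

import pandas as pd

# Sample data in the layout from the question (inlined for illustration)
raw = """Foo
Col1,Col2,Col3,Col4
value1,value2,value3,value4
value1,value2,value3,value4
Bar
value1,value2,value3,value4
value1,value2,value3,value4
"""

columns = ["Col1", "Col2", "Col3", "Col4"]  # assumed to be known up front

# Explicit names force four columns; short lines are padded with NaN
df = pd.read_csv(io.StringIO(raw), header=None, names=columns)

# Label lines have a value in the first column only
is_label = df[columns[1:]].isna().all(axis=1)
# The embedded header line repeats the column names
is_header = df[columns[1]].eq(columns[1])

# Keep the first column only on label rows, then carry each label downwards
df.insert(0, "newCol", df[columns[0]].where(is_label).ffill())

# Drop the label and header rows, leaving only the data rows
result = df[~is_label & ~is_header].reset_index(drop=True)

print(result.to_csv(index=False))
```

For a real file, the io.StringIO(raw) argument would be replaced by the file path. If a data row can legitimately have empty values in every column after the first, the is_label test above would misclassify it, so the plain-Python loops in the answers may be the safer choice in that case.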