I have a CSV file in the following format:
"age","job","marital","education","default","balance","housing","loan"
58,"management","married","tertiary","no",2143,"yes","no"
44,"technician","single","secondary","no",29,"yes","no"
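However, the values are not separated by tabs into different columns; they all sit in the same first column. When I try to read the file with pandas, the output gives all the values in a single list rather than a list of lists.
My code: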
import pandas as pd
dataframe = pd.read_csv("marketing-data.csv", header=0, sep=",")
dataset = dataframe.values
print(dataset)
Output:
[[58 'management' 'married' ..., 2143 'yes' 'no']
[44 'technician' 'single' ..., 29 'yes' 'no']]
What I need:
[[58, 'management', 'married', ..., 2143, 'yes', 'no']
[44 ,'technician', 'single', ..., 29, 'yes', 'no']]
What am I missing?
Answer 0 (score: 2)
I think you are confused by the print() output, which doesn't show commas.
Demo:
In [1]: df = pd.read_csv(filename)
Pandas representation:
In [2]: df
Out[2]:
   age         job  marital  education  default  balance  housing  loan
0   58  management  married   tertiary       no     2143      yes    no
1   44  technician   single  secondary       no       29      yes    no
Numpy representation:
In [3]: df.values
Out[3]:
array([[58, 'management', 'married', 'tertiary', 'no', 2143, 'yes', 'no'],
       [44, 'technician', 'single', 'secondary', 'no', 29, 'yes', 'no']], dtype=object)
NumPy string representation (the result of print(numpy_array)):
In [4]: print(df.values)
[[58 'management' 'married' 'tertiary' 'no' 2143 'yes' 'no']
[44 'technician' 'single' 'secondary' 'no' 29 'yes' 'no']]
Conclusion: your CSV file has been parsed correctly.
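If what you actually want is a plain Python list of lists (which does print with commas), one option is to convert the array with tolist(). The following is a minimal sketch, not the only way to do it; it assumes the sample rows from the question and uses io.StringIO as a stand-in for the real file, and the variable names csv_text and rows are only illustrative.

```python
import io

import pandas as pd

# Sample data copied from the question; io.StringIO stands in for the real file.
csv_text = '''"age","job","marital","education","default","balance","housing","loan"
58,"management","married","tertiary","no",2143,"yes","no"
44,"technician","single","secondary","no",29,"yes","no"
'''

df = pd.read_csv(io.StringIO(csv_text))

# df.values is a NumPy array, and print() shows NumPy arrays without commas.
# tolist() turns it into a nested Python list, which prints with commas.
rows = df.values.tolist()
print(rows)
```

The data is the same in both cases; only the container type, and therefore the printed form, changes.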
Answer 1 (score: 1)
I don't really see a difference between what you want and what you get, but parsing the CSV file with the built-in csv module gives your desired result:
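import csv

# newline='' lets the csv module handle line endings itself (Python 3)
with open('file.csv', newline='') as csvfile:
    # fields are wrapped in double quotes, so use '"' as the quote character
    spamreader = csv.reader(csvfile, delimiter=',', quotechar='"')
    print(list(spamreader))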
[
['age', 'job', 'marital', 'education', 'default', 'balance', 'housing', 'loan'],
['58', 'management', 'married', 'tertiary', 'no', '2143', 'yes', 'no'],
['44', 'technician', 'single', 'secondary', 'no', '29', 'yes', 'no']
]
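Note that csv.reader returns every field as a string, so age and balance come back as '58' and '2143' rather than integers. If you need numbers, something like the sketch below could work. It assumes the sample rows from the question, reads them from io.StringIO instead of a real file, and uses a hypothetical helper parse_value that is not part of the csv module.

```python
import csv
import io

# Sample data copied from the question; io.StringIO stands in for the real file.
csv_text = '''"age","job","marital","education","default","balance","housing","loan"
58,"management","married","tertiary","no",2143,"yes","no"
44,"technician","single","secondary","no",29,"yes","no"
'''


def parse_value(text):
    """Hypothetical helper: return an int when possible, else the original string."""
    try:
        return int(text)
    except ValueError:
        return text


reader = csv.reader(io.StringIO(csv_text))
header = next(reader)  # first row holds the column names
rows = [[parse_value(field) for field in row] for row in reader]
print(rows)
```

This only handles integer columns; a file with floats or dates would need a more careful conversion.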