Question

I have a txt file in the following format:

[(u'this guy',u'hey there',u'dfd fasd awe wedsad,daeraes',1),
 (u'that guy',u'cya',u'dfd fasd es',1),
 (u'another guy',u'hi',u'dfawe wedsad,daeraes',-1)]

I want to import it into Python as a DataFrame with 4 columns. I tried:

trial = []
for line in open('filename.txt','r'):
     trial.append(line.rstrip())

That reads each line in as plain text. I also tried:

import pandas as pd
pd.read_csv('filename.txt', sep=",", header = None)

Splitting on commas with pandas' read_csv also splits on the commas inside the text field, so some rows end up with extra columns:

             0               1                 2                   3        4   5
    0   [(u'this guy'   u'hey there'    u'dfd fasd awe wedsad   daeraes'    1)  NaN
    1   (u'that guy'    u'cya'           u'dfd fasd es'           1)      NaN   NaN
    2   (u'another guy' u'hi'            u'dfawe wedsad         daeraes' -1)]   NaN

Any idea how to get around that?

Answer 1

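I assume you mean Python, not MATLAB.

The data is already a matrix: a list of tuples, one tuple per row, so each value can be reached by row and column index.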

aa=[(u'this guy',u'hey there',u'dfd fasd awe wedsad,daeraes',1),
 (u'that guy',u'cya',u'dfd fasd es',1),
 (u'another guy',u'hi',u'dfawe wedsad,daeraes',-1)]


# Walk every row, then every value in that row
for i in range(len(aa)):
    for j in range(len(aa[i])):
        print(aa[i][j])

Output:

this guy
hey there
dfd fasd awe wedsad,daeraes
1
that guy
cya
dfd fasd es
1
another guy
hi
dfawe wedsad,daeraes
-1
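
Since the question asks for four columns, here is a minimal sketch of how the same list of tuples could be split column-wise using only the standard library. The variable names `names`, `greetings`, `texts`, and `labels` are made up for illustration; the original data does not name its columns.

```python
aa = [(u'this guy', u'hey there', u'dfd fasd awe wedsad,daeraes', 1),
      (u'that guy', u'cya', u'dfd fasd es', 1),
      (u'another guy', u'hi', u'dfawe wedsad,daeraes', -1)]

# zip(*aa) transposes the rows into columns; the names are hypothetical
names, greetings, texts, labels = (list(column) for column in zip(*aa))

print(labels)  # [1, 1, -1]
```

The comma inside the third field is untouched, because the values are already separate Python strings rather than delimited text.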

Answer 2

Assuming your data is in data.txt:

import pandas as pd

py_array = eval(open("data.txt").read())  # eval executes the file contents; trusted files only
dataframe = pd.DataFrame(py_array)

Python has to parse the file first. Using read_csv makes no sense here, because the file is not close enough to CSV format.
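
As a possible variation on the same idea, the file could be parsed with `ast.literal_eval` from the standard library, which only accepts Python literals and does not execute code. This is a sketch, not the method from the answer above; the helper names `parse_rows` and `load_dataframe` are hypothetical.

```python
import ast


def parse_rows(text):
    """Parse the text of a Python list-of-tuples literal into a list of tuples."""
    rows = ast.literal_eval(text)  # literals only; raises ValueError on anything else
    return [tuple(row) for row in rows]


def load_dataframe(path, columns):
    """Read the file at `path` and build a DataFrame with the given column names."""
    import pandas as pd  # imported here so parse_rows stays usable without pandas

    with open(path) as handle:
        rows = parse_rows(handle.read())
    return pd.DataFrame(rows, columns=columns)
```

Under these assumptions, a call such as `load_dataframe("data.txt", ["user", "greeting", "text", "label"])` would give a 4-column DataFrame; the column names are placeholders, since the file itself does not define any.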

Importing a txt file as a DataFrame in Python
