How to in pandas

Time: 2017-10-09 23:44:50

Tags: python pandas

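Hey guys, my data looks like train.dat. I am trying to create a variable that holds the [ith] value of the column containing (-1 or 1), and another variable that holds the value of the column containing the string.

So far I have tried this: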

df=pd.read_csv("train.dat",delimiter="\t", sep=',')
# print(df.head())


# separate names from classes
vals = df.ix[:,:].values
names = [n[0][3:] for n in vals]
cls = [n[0][0:] for n in vals]
print(cls)

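However, the output looks all messed up. Any help would be appreciated. I am a Python beginner.

1 answer:

Answer 0: (score: 1)

If the character after the number is a tab, you are fine and all you need is: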

import io # using io.StringIO for demonstration
import pandas as pd

ratings = ("-1\tThis movie really sucks.\n"
           "-1\tRun colored water through a reflux condenser and call it a science movie?\n"
           "+1\tJust another zombie flick? You'll be surprised!")

df = pd.read_csv(io.StringIO(ratings), sep='\t', 
                 header=None, names=['change', 'rating'])
  • Passing header=None ensures that the first line is interpreted as data.
  • Passing names=['change', 'rating'] provides some (sensible) column headers.
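Coming back to what the question asked for, the i-th value of each column can be picked out by position once the frame is loaded. This is a minimal sketch, assuming tab-separated input like the sample above; the names `i`, `cls`, `name`, `classes` and `texts` are only illustrative and do not come from the original post.

```python
import io
import pandas as pd

# Two-line sample in the same tab-separated layout as the answer's data
ratings = (
    "-1\tThis movie really sucks.\n"
    "+1\tJust another zombie flick? You'll be surprised!"
)

df = pd.read_csv(io.StringIO(ratings), sep="\t",
                 header=None, names=["change", "rating"])

i = 1  # hypothetical row position, chosen for illustration

# .iloc selects by integer position, independent of the index labels
cls = df["change"].iloc[i]
name = df["rating"].iloc[i]

# Whole columns as plain Python lists, if those are easier to work with
classes = df["change"].tolist()
texts = df["rating"].tolist()
```

Position-based access with `.iloc` also covers what the removed `df.ix` indexer was used for in the question.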

Of course, the character might not be a tab :D.

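import io  # using io.StringIO for demonstration
import pandas as pd

ratings = ("-1 This movie really sucks.\n"
           "-1 Run colored water through a reflux condenser and call it a science movie?\n"
           "+1 Just another zombie flick? You'll be surprised!")

# there is no tab in the data, so each line ends up in the single column 'stuff'
df = pd.read_csv(io.StringIO(ratings), sep='\t',
                 header=None, names=['stuff'])

# first three characters hold the sign, the digit and the space; the rest is the text
df['change'], df['rating'] = df.stuff.str[:3].str.strip(), df.stuff.str[3:]
# drop returns a new frame, so the result has to be assigned back
df = df.drop('stuff', axis=1)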

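A viable option is to read the whole rating into a temporary column, split the string, assign the parts to two columns, and finally drop the temporary column.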
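The read, split, assign and drop steps can also be sketched with `Series.str.split`, which does not rely on the prefix being exactly three characters wide. This is an alternative to fixed-width slicing, not the answer's own method, and it assumes a single space separates the number from the text. The temporary column name `stuff` is kept from the code above.

```python
import io
import pandas as pd

# Space-separated sample: no tab between the number and the text
ratings = (
    "-1 This movie really sucks.\n"
    "+1 Just another zombie flick? You'll be surprised!"
)

# The data contains no tab, so every line lands in one temporary column
df = pd.read_csv(io.StringIO(ratings), sep="\t", header=None, names=["stuff"])

# Split on the first space only, so spaces inside the text are left alone
parts = df["stuff"].str.split(" ", n=1, expand=True)

df["change"] = parts[0].astype(int)  # "-1" / "+1" converted to integers
df["rating"] = parts[1]

# Drop the temporary column; drop returns a new frame
df = df.drop(columns="stuff")
```

Converting `change` to integers is optional; it is done here on the assumption that the labels will later be used as numeric classes.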