Converting .dat to .csv in Python

Time: 2017-10-09 01:49:27

Tags: python pandas csv

I want to convert a dataset stored in a .dat file into a CSV file. The data format is as follows:

Each row begins with the sentiment score followed by the text associated with that rating.

Image of the .dat file

I want one column for the sentiment value (-1 or 1) and a second column for the review text that goes with that sentiment value.

Here is what I have done so far:

import pandas as pd
import csv

# read train.dat into a list of lists
# (split() with no arguments splits on every run of whitespace,
#  so each word of the review ends up as its own field)
with open("train.dat") as dat:
    datContent = [line.strip().split() for line in dat]

# write it as a new CSV file (Python 3: text mode with newline="")
with open("train.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(datContent)

def your_func(row):
    return row['Sentiments'] / row['Review']

columns_to_keep = ['Sentiments', 'Review']
dataframe = pd.read_csv("train.csv", usecols=columns_to_keep)
dataframe['new_column'] = dataframe.apply(your_func, axis=1)

print(dataframe)

A sample screenshot of the generated train.csv shows a comma after every word of the review.
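The extra commas come from split() with no arguments, which breaks each line at every run of whitespace, so every word of the review becomes a separate CSV field. A minimal sketch of one way around it, assuming each line is a sentiment label, then whitespace, then the review text (and that no line is a bare label): split only once with maxsplit=1. The function name dat_to_csv, the header names and the sample lines are illustrative assumptions, not from the original post.

```python
import csv
import io


def dat_to_csv(src, dst):
    """Copy 'label review text...' lines from src into a two-column CSV in dst.

    Hypothetical helper: assumes the label is the first whitespace-separated
    token and everything after it is the review text.
    """
    writer = csv.writer(dst)
    writer.writerow(["Sentiments", "Review"])  # header row so pandas can use the names
    for line in src:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # maxsplit=1: split at the first whitespace run only, keep the review intact
        label, review = line.split(maxsplit=1)
        writer.writerow([int(label), review])


# Demo with in-memory files; with real files, pass
# open("train.dat") and open("train.csv", "w", newline="") instead.
sample = io.StringIO("-1 this product broke quickly\n+1 works great, love it\n")
out = io.StringIO()
dat_to_csv(sample, out)
print(out.getvalue())
```

Because the whole review is written as one field, csv.writer only adds quotes when the text itself contains a comma or a quote, and reading the result back with pd.read_csv gives exactly two columns.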

Output of the train.csv

2 answers:

Answer 0 (score: 3)

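If all of your rows follow that consistent format, you can use pd.read_fwf. This is safer than read_csv when your second column also contains the delimiter you are trying to split on.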

Example data.txt

-1  ieafxf  rjzy xfxk ymi wuy
+1  lqqm  ceegjnbjpxnidygr
-1  zss awoj anxb rfw  kgbvnl
import pandas as pd

# two fixed-width fields: 2 characters for the label, the rest for the text
df = pd.read_fwf('data.txt', header=None,
                 widths=[2, int(1e5)], names=['label', 'text'])

print(df)
   label                       text
0     -1  ieafxf  rjzy xfxk ymi wuy
1      1     lqqm  ceegjnbjpxnidygr
2     -1  zss awoj anxb rfw  kgbvnl
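Since the goal is a .csv file, the parsed frame still has to be written out; DataFrame.to_csv does that. A minimal sketch, assuming the same three example rows as data.txt above, held in an in-memory buffer so the snippet runs on its own (with a real file you would pass the path instead, and a path to to_csv):

```python
import io

import pandas as pd

# Same three example rows as data.txt above, kept in memory for this sketch.
raw = io.StringIO(
    "-1  ieafxf  rjzy xfxk ymi wuy\n"
    "+1  lqqm  ceegjnbjpxnidygr\n"
    "-1  zss awoj anxb rfw  kgbvnl\n"
)

# Two fixed-width fields: 2 characters for the label, the rest for the text.
df = pd.read_fwf(raw, header=None, widths=[2, int(1e5)], names=["label", "text"])

# index=False stops pandas from writing its row index as an extra column;
# with no path argument, to_csv returns the CSV as a string.
csv_text = df.to_csv(index=False)
print(csv_text)
```

The review text keeps its internal spaces, so each row lands in the CSV as exactly one label field and one text field.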

Answer 1 (score: 0)

As mentioned in the comments, read_csv works here.

import pandas as pd

# assumes a single tab separates the sentiment value from the review text
df = pd.read_csv('train.dat', sep='\t', names=['Sentiments', 'Review'])

  Sentiments     Review
0         -1    alskjdf
1          1      asdfa
2          1       afsd
3         -1        sdf
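The sep='\t' call relies on a single tab between the label and the review. If the separator is one or more spaces instead, any space-based sep would also split the review itself. A minimal pandas-only sketch for that case, assuming the label is the first whitespace-delimited token on every line: keep each line whole, then split it at most once with Series.str.split(n=1, expand=True). The sample lines are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical space-separated input: label, a space, then free text with spaces.
raw = io.StringIO("-1 not worth the money\n1 great value\n1 would buy again\n")

# Read each line whole first, so the review's internal spaces are untouched.
lines = pd.Series(raw.read().splitlines())

# n=1: split on the first whitespace run only; expand=True returns two columns.
df = lines.str.split(n=1, expand=True)
df.columns = ["Sentiments", "Review"]
df["Sentiments"] = df["Sentiments"].astype(int)

print(df)
```

From there, df.to_csv("train.csv", index=False) writes the two-column CSV the question asks for.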