I want to convert a dataset stored in a .dat file to a CSV file. The data format is as follows:
Each row begins with the sentiment score followed by the text associated with that rating.
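I want one column for the sentiment value (-1 or 1) and one column for the review text that corresponds to that sentiment value.
Here is what I have tried: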
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import csv
# read train.dat into a list of lists
datContent = [i.strip().split() for i in open("train.dat").readlines()]
# write it as a new CSV file
with open("train.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows(datContent)
def your_func(row):
    return row['Sentiments'] / row['Review']
columns_to_keep = ['Sentiments', 'Review']
dataframe = pd.read_csv("train.csv", usecols=columns_to_keep)
dataframe['new_column'] = dataframe.apply(your_func, axis=1)
print dataframe
In the generated train.csv (sample screenshot), there is a comma after every word of the review.
Answer 0 (score: 3)
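If all of your lines follow that consistent format, you can use pd.read_fwf. This is safer than read_csv when the second column also contains the delimiter you are trying to split on.
Sample data.txt: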
-1 ieafxf rjzy xfxk ymi wuy
+1 lqqm ceegjnbjpxnidygr
-1 zss awoj anxb rfw kgbvnl
df = pd.read_fwf('data.txt', header=None,
                 widths=[2, int(1e5)], names=['label', 'text'])
print(df)
   label                      text
0     -1  ieafxf rjzy xfxk ymi wuy
1      1     lqqm ceegjnbjpxnidygr
2     -1  zss awoj anxb rfw kgbvnl
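To finish the conversion, the frame can then be written out with to_csv. The following is only a minimal sketch, assuming the input looks like the sample rows above; the io.StringIO objects stand in for real files, and any output path such as 'train.csv' is an assumption rather than something required by pandas.

```python
import io
import pandas as pd

# Stand-in for data.txt, using the sample rows shown above.
sample = io.StringIO(
    "-1 ieafxf rjzy xfxk ymi wuy\n"
    "+1 lqqm ceegjnbjpxnidygr\n"
    "-1 zss awoj anxb rfw kgbvnl\n"
)

# The first 2 characters hold the label; the rest of the line is the text.
# This assumes no line is longer than about 1e5 characters.
df = pd.read_fwf(sample, header=None,
                 widths=[2, int(1e5)], names=['label', 'text'])

# index=False leaves the row numbers out of the CSV.
# Replace `out` with a path (for example 'train.csv') to write a real file.
out = io.StringIO()
df.to_csv(out, index=False)
csv_text = out.getvalue()
print(csv_text)
```

Because each review is a single field, to_csv only adds quotes around a review if it happens to contain a comma itself.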
Answer 1 (score: 0)
As mentioned in the comments, read_csv is appropriate here.
df = pd.read_csv('train_csv.csv', sep='\t', names=['Sentiments', 'Review'])
   Sentiments   Review
0          -1  alskjdf
1           1    asdfa
2           1     afsd
3          -1      sdf
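If pandas is not needed for the conversion itself, the same result can probably be reached with the standard library alone. The attempt in the question calls split() with no limit, so every word becomes its own CSV field, which is where the commas after each word come from. Below is a rough sketch that splits each line only once; the helper name dat_to_rows is hypothetical, and the input rows are borrowed from the sample data in the other answer.

```python
import csv
import io

def dat_to_rows(lines):
    """Yield [sentiment, review] pairs, splitting each line only once."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # maxsplit=1: split on the first run of whitespace only,
        # so the review text stays in one piece.
        parts = line.split(None, 1)
        label = int(parts[0])  # int() accepts both "+1" and "-1"
        review = parts[1] if len(parts) > 1 else ""
        yield [label, review]

# Illustrative input; in practice this would be open("train.dat").
sample = ["-1 ieafxf rjzy xfxk ymi wuy\n", "+1 lqqm ceegjnbjpxnidygr\n"]

# StringIO stands in for the output file.
buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["Sentiments", "Review"])
writer.writerows(dat_to_rows(sample))
print(buf.getvalue())
```

When writing to a real file on Python 3, the csv module expects text mode, for example open("train.csv", "w", newline=""), rather than the "wb" mode used on Python 2.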