Question

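I want to extract the neutral words from the given csv file (into a separate .txt file), but I'm quite new to Python and know very little about file handling. I couldn't find a dataset of neutral words, but after searching here, this is what I was able to find.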

Here is the GitHub project I want to extract the data from (in case it matters): old archived draft

Neutral Words
Word     Sentiment Score
a        0.0125160264947
the      0.00423728459134
it      -0.0294755274737
and      0.0810574365028
an       0.0318918766949
or      -0.274298468178
normal  -0.0270787859177

So basically I just want to extract from the csv those words (text) whose value is 0.something.

Answer 1

Even without any library, this is fairly easy with the csv you are using.

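First open the file (I assume you have saved the path in the variable filename), then read it with the readlines() function, and then filter the rows according to the condition you gave.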

with open(filename, 'r') as f:                            # Open the file for reading
    rows = [line.split(',') for line in f.readlines()]   # Read the file line by line and split each line on commas
    # Keep the word (first value) of every row whose score (second value) lies strictly between -1 and 1
    neutral = [row[0] for row in rows if abs(float(row[1])) < 1]
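The question also asks for the words to end up in a separate .txt file, which the snippet above stops short of. A minimal sketch of that last step, assuming the filtered words are already in a list; the helper name `write_words` is hypothetical:

```python
def write_words(words, path):
    """Write each word on its own line to the text file at `path` (overwriting it)."""
    with open(path, 'w', encoding='utf-8') as out:
        for word in words:
            out.write(word + '\n')
```

Calling it with the list of filtered words from the snippet above and a name such as 'neutral_words.txt' (also hypothetical) leaves one word per line in that file.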

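Since this is now the accepted answer, I have added a disclaimer. There are several reasons why this code should not be applied to other CSVs without some thought:

It reads the entire CSV into memory
It does not handle quoting, for example

For very simple CSVs this is acceptable, but if you cannot be sure that your CSV won't break this code, the other answers here are better.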
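To make the quoting caveat concrete: a field that contains a comma inside quotes is split incorrectly by `str.split(',')`, while the standard `csv` module keeps it as one field. The two-row input below is made up purely for illustration:

```python
import csv
import io

# Hypothetical two-row CSV; the first field of row 2 contains a quoted comma
data = 'word,score\n"well, then",0.05\n'

# Naive approach: split every line on commas
naive = [line.split(',') for line in data.splitlines()]

# csv.reader understands quoting, so "well, then" stays a single field
parsed = list(csv.reader(io.StringIO(data)))

print(naive[1])   # ['"well', ' then"', '0.05']
print(parsed[1])  # ['well, then', '0.05']
```

With the naive split, row[1] is ' then"', so float(row[1]) would raise a ValueError on this input.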

Answer 2

Here is a way that uses only the standard library and does not keep the whole file in memory:

import csv

def get_vals(filename):
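    with open(filename, newline='') as fin:   # newline='' is what the csv module expects in Python 3
        reader = csv.reader(fin)
        for line in reader:
            try:
                score = float(line[-1])       # convert the last column to a number
            except (ValueError, IndexError):
                continue                      # skip the header row and any blank lines
            if score <= 0:
                yield line[0]

words = get_vals(filename)

for word in words:
    print(word)  # do stuff with each word here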
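Because get_vals is a generator, the words can be written straight to the .txt file the question asks for without ever building a list. A sketch of that end-to-end flow, adapted to the question's "0.something" criterion (taken here as |score| < 1) rather than the `<= 0` test above; the function name `extract_neutral` and the assumption that the word is in the first column and the score in the last are hypothetical:

```python
import csv


def extract_neutral(in_path, out_path):
    """Stream rows from in_path and write every word with |score| < 1 to out_path.

    Returns the number of words written.
    """
    count = 0
    with open(in_path, newline='', encoding='utf-8') as fin, \
         open(out_path, 'w', encoding='utf-8') as fout:
        for row in csv.reader(fin):
            try:
                score = float(row[-1])        # assumed: the score is the last column
            except (ValueError, IndexError):
                continue                      # skip a header row or blank lines
            if abs(score) < 1:                # the question's "0.something" rule
                fout.write(row[0] + '\n')     # assumed: the word is the first column
                count += 1
    return count
```

Only one row is held at a time, so this also works for files too large to read with readlines().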

Answer 3

Use pandas like this:

import pandas
df = pandas.read_csv("yourfile.csv")
df.columns = ['word', 'sentiment']

Select words by sentiment:

positive = df[df['sentiment'] > 0]['word']
negative = df[df['sentiment'] < 0]['word']
neutral = df[df['sentiment'] == 0]['word']
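One caveat for the question's data: == 0 only matches scores that are exactly zero, and none of the sample scores in the question are. A quick plain-Python check using those sample values shows the difference between an exact-zero test and the |score| < 1 rule the question describes; in pandas the latter would read df[df['sentiment'].abs() < 1]['word'].

```python
# Sample word/score pairs copied from the question
scores = {
    'a': 0.0125160264947,
    'the': 0.00423728459134,
    'it': -0.0294755274737,
    'and': 0.0810574365028,
    'an': 0.0318918766949,
    'or': -0.274298468178,
    'normal': -0.0270787859177,
}

# Exact comparison, as in df[df['sentiment'] == 0]
exactly_zero = [w for w, s in scores.items() if s == 0]

# The question's "0.something" rule: magnitude below 1
near_zero = [w for w, s in scores.items() if abs(s) < 1]

print(exactly_zero)  # []
print(near_zero)     # ['a', 'the', 'it', 'and', 'an', 'or', 'normal']
```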

Answer 4

If you don't want to use any other library, you can try the csv module. Note that delimiter='\t' may be different in your case.

import csv

with open('name.txt', 'r', newline='') as f:
    reader = csv.reader(f, delimiter='\t', quoting=csv.QUOTE_NONE)
    for row in reader:
        if float(row[1]) > 0.0:
            print(row[0] + ' ' + row[1])   # print the word followed by its score
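If you are not sure which delimiter your file uses, the standard library's csv.Sniffer can guess it from a sample of the text. A minimal sketch, assuming the file is either comma- or tab-separated; detect_delimiter is a hypothetical helper, and sniffing is a heuristic that can fail on unusual input (it raises csv.Error when it cannot decide):

```python
import csv


def detect_delimiter(sample, candidates=',\t'):
    """Guess which of the candidate characters separates the fields in sample."""
    return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter


if __name__ == '__main__':
    comma_text = 'word,score\na,0.0125\nthe,0.0042\n'
    tab_text = 'word\tscore\na\t0.0125\nthe\t0.0042\n'
    print(repr(detect_delimiter(comma_text)))  # ','
    print(repr(detect_delimiter(tab_text)))    # '\t'
```

The detected character can then be passed as delimiter= to csv.reader; a sample can be read with f.read(1024) followed by f.seek(0) before creating the reader.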

How to extract specific data from a csv file given a parameter?
