Read specific columns from a CSV file, then write them to another CSV using Python

Time: 2014-03-29 13:37:58

Tags: python csv

I have a CSV file that looks like this:

Help    I understand    Attention please    Ok  I see   Damn    How sweet   That is too bad Come on Whatever    That is bad It is cold  That is dumb    Oh no   What    Is that right   Disgusting  This is hopeless    Really  I am angry  I wonder    I donot like this   Let us celebrate    I donot know    Yes Lovely  I am so evil    No  No  it isnot    Did not I see   Fancy   Wonderful   I am exerting myself    I didnot mean to do that    That hurts  Hey  you    It stinks   That is nothing That was close  Whispering Hey you  I cannot believe this   Be quiet    Go away Disappointing   Wait I am thinking  This is fun Unbelievable    Amazing Let  us celebrate   I am excited    Hey you  Did so Yes it is   Haha  well said Conversation    At Home Family  Time    Work    Past Actions    Games   Internet    Location    Fun Food/Clothes    Poetic  Books/Movies    Religion    Romance Swearing    Politics    Music   School  Business    end_with_able   end_with_al end_with_ful    end_with_ible   end_with_ic end_with_ive    end_with_less   end__with_ly    end_with_ous    sorry_word  Starting_with_Apolog    CC  CD  DT  EX  FW  IN  JJ  JJR JJS LS  MD  NN  NNP NNPS    NNS PDT POS PRP PR$ RB  RBR RBS RP  SYM TO  UH  VB  VBD VBG VBN VBP VBZ WDT WP  WP$ WRB CC CD   CC DT   CC EX   CC IN   CC JJ   CC JJR  CC JJS  CC MD   CC NN   CC NNS  CC PRP  CC RB   CC TO   CC VB   CC VBD  CC VBG  CC VBN  CC WP   CC WRB  CD DT   CD IN   CD JJ   CD NN   CD NNS  CD PRP  CD PRP$ DT CC   DT CD   DT DT   DT IN   DT JJ   DT JJR  DT JJS  DT MD   DT NN   DT NNS  DT PRP$ DT RB   DT VBD  DT VBG  DT VBZ  DT WDT  DT WP   EX VBD  EX VBZ  IN CC   IN CD   IN DT   IN IN   IN JJ   IN JJR  IN JJS  IN NN   IN NNS  IN PRP  IN PRP$ IN RB   IN TO   IN VBG  IN VBN  IN VBZ  IN WDT  IN WP   IN WRB  JJ CC   JJ DT   JJ EX   JJ IN   JJ JJ   JJ NN   JJ NNS  JJ PRP  JJ PRP$ JJ RB   JJ TO   JJ VBD  JJ VBN  JJ VBP  JJ VBZ  JJ WP   JJR DT  JJR IN  JJR JJ  JJR NN  JJR NNS JJR TO  JJS IN  JJS JJ  JJS NN  JJS NNS JJS TO  MD PRP  MD RB   
MD VB   NN CC   NN CD   NN DT   NN IN   NN JJ   NN JJS  NN MD   NN NN   NN NNS  NN PRP  NN PRP$ NN RB   NN RBR  NN RP   NN TO   NN VB   NN VBD  NN VBG  NN VBN  NN VBP  NN VBZ  NN WDT  NN WP   NN WP$  NN WRB  NNS CC  NNS DT  NNS IN  NNS JJ  NNS JJR NNS NN  NNS NNS NNS PRP NNS PRP$    NNS RB  NNS TO  NNS VBD NNS VBG NNS VBN NNS VBP NNS VBZ NNS WDT NNS WP  NNS WRB PRP CC  PRP DT  PRP IN  PRP JJ  PRP JJR PRP MD  PRP NN  PRP NNS PRP PRP PRP RB  PRP RP  PRP TO  PRP VB  PRP VBD PRP VBG PRP VBP PRP VBZ PRP WP  PRP$ CC PRP$ JJ PRP$ NN PRP$ NNS    PRP$ RB RB CC   RB CD   RB DT   RB IN   RB JJ   RB MD   RB NN   RB NNS  RB PRP  RB RB   RB RBR  RB TO   RB VB   RB VBD  RB VBG  RB VBN  RB VBP  RB VBZ  RB WP   RB WRB  RBR JJ  RBR RB  RP CC   RP DT   RP IN   RP PRP  RP PRP$ RP RB   RP TO   RP WP   TO CD   TO DT   TO IN   TO JJ   TO JJR  TO NN   TO NNS  TO PRP  TO PRP$ TO VB   TO VBN  VB CC   VB CD   VB DT   VB IN   VB JJ   VB JJR  VB JJS  VB NN   VB NNS  VB PRP  VB PRP$ VB RB   VB RBR  VB RP   VB TO   VB VBD  VB VBG  VB VBN  VB VBP  VB VBZ  VBD CC  VBD CD  VBD DT  VBD IN  VBD JJ  VBD NN  VBD NNS VBD PRP VBD PRP$    VBD RB  VBD RP  VBD TO  VBD VB  VBD VBG VBD VBN VBD WP  VBG CC  VBG DT  VBG IN  VBG JJ  VBG JJR VBG MD  VBG NN  VBG NNS VBG PRP VBG PRP$    VBG RB  VBG RP  VBG TO  VBG VBN VBG VBZ VBG WP  VBG WRB VBN CC  VBN DT  VBN IN  VBN JJ  VBN NN  VBN NNS VBN PRP VBN PRP$    VBN RB  VBN RP  VBN TO  VBN VBG VBN VBN VBP CC  VBP CD  VBP DT  VBP IN  VBP JJ  VBP JJR VBP NN  VBP NNS VBP PRP VBP PRP$    VBP RB  VBP RP  VBP TO  VBP VBD VBP VBG VBP VBN VBP WP  VBP WRB VBZ DT  VBZ IN  VBZ JJ  VBZ JJR VBZ JJS VBZ NN  VBZ NNS VBZ PRP VBZ PRP$    VBZ RB  VBZ RP  VBZ TO  VBZ VB  VBZ VBG VBZ VBN VBZ VBZ VBZ WRB WDT IN  WDT JJ  WDT NN  WDT NNS WDT PRP WDT VBD WDT VBP WP DT   WP JJ   WP MD   WP PRP  WP RB   WP VBD  WP VBN  WP VBP  WP VBZ  WP$ NN  WRB DT  WRB JJ  WRB NNS WRB PRP WRB RB  WRB VBD CC CD JJ    CC CD NN    CC DT IN    CC DT NN    CC DT NNS   CC EX VBD   CC EX VBZ   CC IN DT    CC IN IN    
CC IN NN    CC IN PRP   CC JJ IN    CC JJ NN    CC JJ NNS   CC JJ VBP   CC JJR IN   CC JJS NNS  CC MD RB    CC MD VB    CC NN CC    CC NN IN    CC NN JJ    CC NN NN    CC NN PRP   CC NN RB    CC NN VBD   CC NN VBG   CC NNS CC   CC NNS IN   CC NNS PRP  CC NNS PRP$ CC NNS RB   CC NNS VBD  CC PRP MD   CC PRP RB   CC PRP VBD  CC PRP VBP  CC PRP VBZ  CC RB CD    CC RB DT    CC RB IN    CC RB JJ    CC RB PRP   CC RB RBR   CC RB VB    CC RB VBD   CC RB VBG   CC RB WP    CC TO VB    CC VB DT    CC VB IN    CC VB NN    CC VB PRP   CC VB TO    CC VB VBD   CC VB VBG   CC VB VBZ   CC VBD DT   CC VBD IN   CC VBD JJ   CC VBD NN   CC VBD NNS  CC VBD PRP  CC VBD PRP$ CC VBD RB   CC VBD TO   CC VBD VBG  CC VBG IN   CC VBG NNS  CC VBG RB   CC VBN DT   CC VBN NN   CC VBN PRP  CC WP RB    CC WP VBD   CC WRB PRP  CD DT NN    CD IN DT    CD IN PRP   CD JJ NN    CD JJ NNS   CD NN IN    CD NN NN    CD NN VBD   CD NN VBN   CD NNS IN   CD NNS WP   CD PRP RB   CD PRP$ JJ  DT CC JJR   DT CC NNS   DT CC PRP   DT CC VBN   DT CD IN    DT DT CC    DT DT NN    DT DT NNS   DT DT VBG   DT IN DT    DT IN JJ    DT IN NN    DT IN PRP   DT IN PRP$  DT IN RB    DT IN WP    DT JJ DT    DT JJ IN    DT JJ JJ    DT JJ NN    DT JJ NNS   DT JJ RB    DT JJ VBD   DT JJR NN   DT JJS JJ   DT JJS NN   DT JJS TO   DT MD VB    DT NN CC    DT NN CD    DT NN DT    DT NN IN    DT NN JJ    DT NN MD    DT NN NN    DT NN NNS   DT NN PRP   DT NN PRP$  DT NN RB    DT NN RBR   DT NN TO    DT NN VBD   DT NN VBG   DT NN VBN   DT NN VBZ   DT NN WDT   DT NN WP    DT NN WRB   DT NNS CC   DT NNS IN   DT NNS NN   DT NNS PRP  DT NNS RB   DT NNS TO   DT NNS VBD  DT NNS VBN  DT NNS VBP  DT NNS VBZ  DT NNS WDT  DT NNS WRB  DT PRP$ NN  DT RB JJ    DT RB PRP   DT RB RB    DT RB VBG   DT RB VBN   DT RB VBZ   DT VBD NN   DT VBG IN   DT VBG NN   DT VBZ NN   DT VBZ VBG  DT VBZ VBN  DT WDT IN   DT WP VBN   EX VBD RB   EX VBZ JJR  EX VBZ RB   IN CC DT    IN CC IN    IN CC NN    IN CC VBN   IN CD NN    IN CD NNS   IN DT CD    IN DT DT    IN DT 
IN    IN DT JJ    IN DT JJR   IN DT JJS   IN DT MD    IN DT NN    IN DT NNS   IN DT RB    IN DT VBG   IN DT WDT   IN IN DT    IN IN IN    IN IN JJ    IN IN NN    IN IN PRP   IN IN PRP$  IN IN RB    IN JJ CC    IN JJ EX    IN JJ IN    IN JJ NN    IN JJ NNS   IN JJ PRP$  IN JJ TO    IN JJR NN   IN JJS NNS  IN NN CC    IN NN DT    IN NN IN    IN NN JJ    IN NN NN    IN NN NNS   IN NN PRP   IN NN PRP$  IN NN RB    IN NN TO    IN NN VBD   IN NN VBG   IN NN VBZ   IN NN WRB   IN NNS CC   IN NNS IN   IN NNS NNS  IN NNS PRP  IN NNS RB   IN NNS VBD  IN NNS VBG  IN NNS VBP  IN PRP CC   IN PRP DT   IN PRP IN   IN PRP MD   IN PRP PRP  IN PRP RB   IN PRP VBD  IN PRP VBP  IN PRP VBZ  IN PRP WP   IN PRP$ JJ  IN PRP$ NN  IN PRP$ NNS IN PRP$ RB  IN RB CC    IN RB DT    IN RB IN    IN RB NN    IN RB PRP   IN RB RB    IN RB VBD   IN RB VBG   IN TO VB    IN VBG DT   IN VBG IN   IN VBG JJ   IN VBG NN   IN VBG PRP  IN VBG TO   IN VBG VBN  IN VBN NN   IN VBZ DT   IN VBZ RB   IN WDT JJ   IN WP MD    IN WP PRP   IN WP VBN   IN WRB PRP  JJ CC MD    JJ CC NN    JJ CC PRP   JJ CC RB    JJ CC VBD   JJ CC WP    JJ DT DT    JJ DT JJ    JJ DT JJS   JJ DT NN    JJ DT RB    JJ DT VBZ   JJ EX VBZ   JJ IN DT    JJ IN IN    JJ IN JJR   JJ IN NN    JJ IN NNS   JJ IN PRP   JJ IN PRP$  JJ IN VBG   JJ IN WP    JJ JJ DT    JJ JJ IN    JJ JJ NN    JJ JJ NNS   JJ NN CC    JJ NN DT    JJ NN IN    JJ NN JJS   JJ NN MD    JJ NN NN    JJ NN NNS   JJ NN PRP   JJ NN RB    JJ NN TO    JJ NN VB    JJ NN VBD   JJ NN VBG   JJ NN VBN   JJ NN VBZ   JJ NN WDT   JJ NN WRB   JJ NNS CC   JJ NNS DT   JJ NNS IN   JJ NNS JJR  JJ NNS NNS  JJ NNS PRP  JJ NNS RB   JJ NNS VBG  JJ NNS VBN  JJ NNS VBP  JJ NNS WDT  JJ PRP RB   JJ PRP VBD  JJ PRP VBZ  JJ PRP$ NN  JJ RB IN    JJ RB NN    JJ RB PRP   JJ RB TO    JJ RB VBG   JJ RB VBN   JJ TO CD    JJ TO PRP   JJ TO VB    JJ VBD DT   JJ VBD TO   JJ VBN CC   JJ VBP IN   JJ VBZ IN   JJ WP VBN   JJR DT NN   JJR IN NN   JJR IN PRP$ JJR JJ IN   JJR NN CC   JJR NN DT   JJR NN IN   JJR NN PRP  
JJR NN TO   JJR NNS WDT JJR TO NN   JJS IN DT   JJS JJ NN   JJS NN NN   JJS NN VBZ  JJS NNS DT  JJS NNS IN  JJS NNS VBP JJS TO VB   MD PRP VB   MD RB RB    MD RB VB    MD VB CC    MD VB DT    MD VB IN    MD VB JJ    MD VB PRP   MD VB RB    MD VB RP    MD VB TO    MD VB VBG   MD VB VBN   NN CC CD    NN CC DT    NN CC EX    NN CC IN    NN CC JJ    NN CC JJS   NN CC MD    NN CC NN    NN CC NNS   NN CC PRP   NN CC RB    NN CC VB    NN CC VBD   NN CC VBG   NN CC WP    NN CD DT    NN CD NN    NN CD PRP   NN DT DT    NN DT IN    NN DT JJS   NN DT NN    NN DT NNS   NN DT VBZ   NN IN CD    NN IN DT    NN IN IN    NN IN JJ    NN IN NN    NN IN NNS   NN IN PRP   NN IN PRP$  NN IN RB    NN IN VBG   NN IN VBN   NN IN VBZ   NN IN WDT   NN IN WP    NN IN WRB   NN JJ JJ    NN JJ NN    NN JJ NNS   NN JJ TO    NN JJ VBD   NN JJS NNS  NN MD VB    NN NN CC    NN NN CD    NN NN DT    NN NN IN    NN NN JJ    NN NN MD    NN NN NN    NN NN NNS   NN NN PRP   NN NN PRP$  NN NN RB    NN NN TO    NN NN VB    NN NN VBD   NN NN VBG   NN NN VBZ   NN NN WDT   NN NN WRB   NN NNS DT   NN NNS IN   NN NNS JJ   NN NNS NN   NN NNS NNS  NN NNS PRP  NN NNS PRP$ NN NNS RB   NN NNS TO   NN NNS VBD  NN NNS VBG  NN NNS VBP  NN NNS WP   NN PRP DT   NN PRP IN   NN PRP JJR  NN PRP MD   NN PRP NN   NN PRP PRP  NN PRP RB   NN PRP VB   NN PRP VBD  NN PRP VBP  NN PRP VBZ  NN PRP$ NN  NN PRP$ NNS NN RB CC    NN RB DT    NN RB IN    NN RB JJ    NN RB MD    NN RB NN    NN RB NNS   NN RB PRP   NN RB RB    NN RB TO    NN RB VBD   NN RB VBG   NN RB VBN   NN RB VBZ   NN RB WP    NN RB WRB   NN RBR JJ   NN RP IN    NN TO DT    NN TO JJ    NN TO NN    NN TO PRP   NN TO PRP$  NN TO VB    NN VB RB    NN VB VBP   NN VBD DT   NN VBD IN   NN VBD JJ   NN VBD NN   NN VBD NNS  NN VBD PRP  NN VBD PRP$ NN VBD RB   NN VBD TO   NN VBD VBG  NN VBD VBN  NN VBG CC   NN VBG DT   NN VBG IN   NN VBG JJ   NN VBG JJR  NN VBG NN   NN VBG NNS  NN VBG PRP  NN VBG PRP$ NN VBG RB   NN VBG WP   NN VBN IN   NN VBN NN   NN VBN RP   NN VBP RB   NN VBZ 
DT   NN VBZ IN   NN VBZ JJ   NN VBZ JJS  NN VBZ NN   NN VBZ PRP  NN VBZ RB   NN VBZ RP   NN VBZ TO   NN VBZ VBG  NN VBZ VBN  NN VBZ WRB  NN WDT JJ   NN WDT NN   NN WDT NNS  NN WDT PRP  NN WDT VBD  NN WP VBN   NN WP VBZ   NN WP$ NN   NN WRB DT   NN WRB NNS  NN WRB PRP  NNS CC DT   NNS CC IN   NNS CC JJ   NNS CC NN   NNS CC NNS  NNS CC PRP  NNS CC RB   NNS CC TO   NNS CC VBD  NNS DT JJ   NNS DT NN   NNS DT NNS  NNS DT VBZ  NNS IN CD   NNS IN DT   NNS IN IN   NNS IN JJ   NNS IN NN   NNS IN NNS  NNS IN PRP  NNS IN PRP$ NNS IN VBG  NNS IN WP   NNS JJ IN   NNS JJ NNS  NNS JJR IN  NNS NN NN   NNS NN PRP  NNS NN WP   NNS NNS CC  NNS NNS NNS NNS NNS PRP NNS NNS VBP NNS PRP CC  NNS PRP DT  NNS PRP MD  NNS PRP RB  NNS PRP VBD NNS PRP VBP NNS PRP VBZ NNS PRP$ CC NNS PRP$ NN NNS PRP$ NNS    NNS RB IN   NNS RB JJ   NNS RB PRP

0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   1   0   1   1   1   0   1   1   1   0   0   0   0   0   0   1   0   0   1   0   1   0   0   1   1   0   1   1   0   0   1   1   1   0   0   1   1   0   1   0   1   1   0   0   1   0   0   1   0   1   0   0   1   0   1   0   1   1   1   1   1   1   1   1   0   1   0   0   0   1   0   0   0   0   0   0   1   1   0   1   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   1   0   0   0   1   1   0   0   0   0   1   1   0   0   0   0   0   1   1   1   0   0   1   0   1   1   0   1   1   0   0   1   0   1   0   1   0   1   0   1   1   0   0   1   1   0   0   0   1   0   0   0   0   0   0   0   1   0   0   0   0   0   0   1   1   0   1   1   1   0   0   1   1   1   1   1   0   0   1   0   

I want to extract a few columns from this file and write them to another file. How can I do this using Python 2.7.5?

Extracting columns from a CSV by specifying column names

I wrote the following, but it does not write the output correctly:

################################################################################
#...................Program to create reduced feature vector ...................
################################################################################
import csv
import os
import sys
from BST import Node   # local module, not shown here
sys.setrecursionlimit(20100)
def File_Write(filename,write_ist):
    # File objects have write()/writelines() but no writerows(); write the text form.
    filewrite=open(filename,"w")
    filewrite.write(str(write_ist))
    filewrite.close()



def read_file_list(feature_vector_file,selected_features) :
    # Note: feature_vector_file is not used; the input path is hard-coded below.
    f = open("Dataset/Cross/Ensemble_FVT.csv")
    reader = csv.reader(f)
    headers = None
    results = []

    for row in reader:
        if not headers:
            # First row: store the indexes of the columns of interest
            headers = []
            for i, col in enumerate(row):
                if col in selected_features:
                    headers.append(i)
        else:
            results.append([row[i] for i in headers])
    f.close()
    return results

##################################################################################
#.................................MAIN PROGRAM....................................
##################################################################################


feature_list = ""
root_flag = 'false'
sent_number = 1
fvt_length = 0
line=[]
result=[]
##Reading the class and feature_vector_length from command line .......................

gender = sys.argv[1]
max_fvt_length = sys.argv[2]

##Setting the path for input and output files .......................

file_path = "/home/user/Mini_Project/Dataset/Cross/Sorted_Features.csv"                                                 ;##Input file..............
feature_vector_file = "/home/user/Mini_Project/Dataset/Cross/"+str(max_fvt_length+gender)+".csv"            ;##Output file..............

##Creating the output directory if not existing .......................

d = os.path.dirname(feature_vector_file)
if not os.path.exists(d) :
        os.makedirs(d)

####Opening the output file in write mode ...................       
with open( feature_vector_file, "w" ) as fout :
        fp_feature = csv.writer( fout )
        fp_mi=csv.reader(open(file_path,"r"),delimiter=',')
        for row in fp_mi :
##              First field contains the feature and second field contains the feature_rank ..................
                feature = row[0]
##              Taking the top features only ...................
                if int(fvt_length) < int(max_fvt_length) :
##                        print fvt_length
##                      Checking for root node in the BST ...................

                        if root_flag == 'false' :
                                root = Node( feature )
                                root_flag = 'true'
                        else :
                                root.insert( feature )

                        feature_list = feature_list + "\n" + feature
                        fvt_length += 1
        feature_list1 = feature_list.strip()
        line = feature_list1.split('\n')
##        print "Number of features ",fvt_length
##        line.sort()
        line.append('Gender')  
        root.print_tree()        
        fp_feature.writerow(line)
####      Read files in separate classes and find the count of features in each class ...................  

        result=read_file_list(feature_vector_file,line)

        fp_feature.writerows(result)
        print "Extracted",fvt_length,"Features ranked using Mutual Information"

1 answer:

Answer 0 (score: 1)

import csv

f = open("Dataset/Cross/Ensemble_FVT.csv", "r")
reader = csv.reader(f)

Now you can iterate over reader:

for row in reader:
    # each row is a list of strings; index it with a zero-based column number
    print row[0]   # replace 0 with the index of the column you want
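The column number used in the loop above does not have to be counted by hand. Since the question asks about picking columns by name, one option is to look the positions up in the header row. The following is a minimal sketch under stated assumptions: it is written for Python 3 (on Python 2.7 the same idea applies with `io.BytesIO` or a real file), the sample data and the helper `column_indexes` are made up for illustration, and the tab delimiter is a guess about how the real file is separated.

```python
import csv
import io

# Hypothetical sample standing in for the real file (assumed tab-separated).
SAMPLE = "Help\tOk\tDamn\tGender\n0\t1\t1\tM\n1\t0\t0\tF\n"

def column_indexes(header, wanted):
    """Return the position of each wanted name in the header row.

    If a name occurs more than once in the header, the last occurrence wins.
    """
    positions = {name: i for i, name in enumerate(header)}
    return [positions[name] for name in wanted]

reader = csv.reader(io.StringIO(SAMPLE), delimiter="\t")
header = next(reader)                       # first row holds the column names
indexes = column_indexes(header, ["Damn", "Help"])
rows = [[row[i] for i in indexes] for row in reader]
print(indexes)
print(rows)
```

The columns come out in the order they were requested, not in the order they appear in the file, which keeps a separately written header line and the data rows aligned.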

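If you use row[0] in the loop over reader, the first column is printed. To write the desired columns to a new CSV file, build a list from the items you want in each row inside the loop, then pass that list to the writerow method of a csv.writer object, which writes it as one row. Say you want to write columns 1, 3 and 5. Then: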

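data = open("output.csv", "wb")   # "wb" is the mode the csv module expects on Python 2
w = csv.writer(data)
f.seek(0)   # rewind the input in case reader was already iterated
for row in reader:
    # columns 1, 3 and 5 are at zero-based indexes 0, 2 and 4
    my_row = [row[0], row[2], row[4]]
    w.writerow(my_row)
data.close()
f.close()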

You will now have the new CSV file, output.csv.
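If the columns are easier to identify by header name than by position, the same copy can be sketched with `csv.DictReader` and `csv.DictWriter`. This is an illustration under assumptions, not the answer's original method: it targets Python 3 (on Python 2.7, open the files in "rb" and "wb" mode instead of passing `newline=""`), the function name `extract_columns` is hypothetical, and the `delimiter` default of a comma may need to be changed to match the real file.

```python
import csv

def extract_columns(src_path, dst_path, wanted, delimiter=","):
    """Copy only the columns named in `wanted` from src_path to dst_path."""
    with open(src_path, newline="") as src, \
         open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src, delimiter=delimiter)
        header = reader.fieldnames or []
        missing = [name for name in wanted if name not in header]
        if missing:
            raise KeyError("columns not found in header: " + ", ".join(missing))
        # extrasaction="ignore" drops every column that is not in `wanted`.
        writer = csv.DictWriter(dst, fieldnames=wanted, delimiter=delimiter,
                                extrasaction="ignore")
        writer.writeheader()
        writer.writerows(reader)
```

A call such as `extract_columns("input.csv", "output.csv", ["Damn", "Gender"])` would write a header line followed by just those two columns. Note that `DictReader` keeps only one value per name, so this approach is unsuitable when the header contains duplicate column names.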