Need to find words in a CSV file based on a condition

Time: 2017-01-11 22:28:14

Tags: python python-3.x

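I'm really stuck. My task is to filter a 5,000-record CSV by date to find a specific date range, sort the results in ascending order, and then take the fields from a different column to build a sentence. I've managed to filter the dates and sort them, but my problem now is that I don't know how to get the word that corresponds to each row. Here is the code: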

#/usr/bin/python3

import csv
import time


def finder():
    with open('sample_data.csv', encoding="utf8") as csvfile:
        reader = csv.DictReader(csvfile)
        r = [] # This will hold our ID numbers for rows
        c = [] # This will hold our initial dates that are filtered out from the main csv
        l = [] # This will hold our sorted dates from c
        w = [] # This will hold our words 
        sentence = '' #This will be our sentence

        # Filter out created_at dates we don't care about

        def filterDates():
            for row in reader:
                createdOn = float(row['created_at'])
                d = time.strftime('%Y-%m-%d', time.localtime(createdOn)) # Converts dates

                if d < '2014-06-22':
                    pass
                else:
                    c.append(d)

        filterDates()

        def sort(c):
            for i in c:
                if i > '2014-06-22' and i < '2014-07-22':
                    l.append(i)
                    l.sort(reverse=False)
                else:
                    pass

        sort(c)

        def findWords(l):
            for row in reader:
                words = row['word']
                for x in range(l):
                    print(words[0])

        findWords(l)

finder()

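I know this code is probably sloppy and all over the place. I took this on as a challenge for a job application and thought I could do it easily, but apparently my Python isn't up to it. I haven't worked with Python's CSV module before. I'll say right away that I no longer plan to apply for the job, but it will drive me crazy if I can't figure this out. I've spent hours trying different things; my problem is how to get the rows with the right dates and then get their words.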

All suggestions and help are appreciated! For my own sanity, I need to figure this out.

Thanks, RDD

Sample data:

id  created_at  first_name  last_name   email   gender  company currency    word    drug_brand  drug_name   drug_company    pill_color  frequency   token   keywords
1   1309380645  Stephanie   Franklin    sfranklin0@sakura.ne.jp Female  Latz    IDR transitional    SUNLEYA Age minimizing sun care AVOBENZONE, OCTINOXATE, OCTISALATE, OXYBENZONE  C.F.E.B. Sisley Maroon  Yearly  ______T______h__e________ _______N__e__z_____p______e_____________d______i______a_____n__ _____h__i__v__e___-_____m___i____n__d__ _____________f ________c_______h__a__________s_.__ _Z________a_____l_____g________o__._   est risus auctor sed tristique in
2   1237178109  Michelle    Fowler  mfowler1@oracle.com Female  Skipstorm   EUR flexibility Medulla Arnica  Medulla Arnica  Uriel Pharmacy Inc. Yellow  Once    _____   morbi vestibulum velit id
3   1303585711  Betty   Barnes  bbarnes2@howstuffworks.com  Female  Skibox  IDR workforce   Rash Relief Zinc Oxide Dimethicone  Touchless Care Concepts LLC Purple  Monthly ___ ac est lacinia
4   1231175716  Jerry   Rogers  jrogers3@canalblog.com  Male    Cogibox IDR content-based   up and up acid controller complete  Famotidine, Calcium Carbonate, Magnesium Hydroxide  Target Corporation  Maroon  Daily   NIL augue a suscipit nulla elit
5   1236709011  Harry   Garrett hgarrett4@mlb.com   Male    Yotz    RUB coherent    Vistaril    HYDROXYZINE PAMOATE Pfizer Laboratories Div Pfizer Inc  Orange  Never   �_nb_l_ _u___ __olop __ __oq_l _n _unp_p__u_ _od___ po_sn__ op p_s '__l_ _u__s_d_p_ _n_____suo_ '____ __s _olop _nsd_ ___o_   morbi ut odio cras
6   1400030214  Lori    Martin  lmartin5@apache.org Female  Aivee   EUR software    Fluorouracil    Fluorouracil    Taro Pharmaceutical Industries Ltd. Pink    Daily   _   dui vel sem
7   1368791435  Joe Turner  jturner6@elpais.com Male    Mycat   IRR tangible    Sulfacetamide Sodium    Sulfacetamide Sodium    Paddock Laboratories, LLC   Aquamarine  Often   1;DROP TABLE users  nulla facilisi cras non velit
8   1394919241  Ruth    Bryant  rbryant7@dell.com   Female  Browsecat   IDR incremental Pollens - Trees, Mesquite, Prosopis juliflora   Mesquite, Prosopis juliflora    Jubilant HollisterStier LLC Aquamarine  Weekly  ___________ et magnis dis
9   1352948920  Cynthia Lopez   clopez8@gov.uk  Female  Twitterbeat USD Up-sized    Ideal Flawless  Octinoxate, Titanium Dioxide    Avon Products, Inc  Red Daily   (_�_�___ ___)   purus eu magna
10  1319910259  Phillip Ross    pross9@ehow.com Male    Buzzshare   VEF data-warehouse  Serotonin   Serotonin   BioActive Nutritional   Orange  Weekly  __  vel sem

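OK, after some tweaking, and with a lot of help from Westley White, I got this working! I condensed it into a single nested function that does what it's supposed to do. Here is the code: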

#/usr/bin/python3

import csv
import time

def finder():

    with open('sample_data.csv', 'r', encoding='latin-1') as csvfile:
        reader = csv.DictReader(csvfile)
        def dates(reader):
            # Set up variables
            date_range = []
            sentence = []

            # Initiate iteration through CSV
            for row in reader:
                createdOn = float(row['created_at'])
                words = str(row['word'])
                d = time.strftime('%Y-%m-%d', time.localtime(createdOn)) # Converts dates

                if d >= '2014-06-22' and d <= '2014-07-22':
                    date_range.append(d)

                date_range.sort()

                for word in words:
                    if d in date_range:
                        sentence.append(word)

            print(sentence)

        dates(reader)

finder()

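Only one problem remains. When I append to sentence[], it appends one character at a time. I don't know how to keep the letters together as the whole word from the CSV column without running all the words together. Any ideas?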

Thanks!

1 Answer:

Answer 0: (Score: 2)

I don't know how the data is formatted, but here is my attempt.

import csv
import time

def finder(start_date='2014-06-22', end_date='2014-07-22'):
    """ 
    :param start_date: Starting date
    :param end_date: Ending date
    """

    def filterDates(reader):
        datelist = []
        for row in reader:
            created_on = float(row['created_at'])
            d = time.strftime('%Y-%m-%d', time.localtime(created_on)) # Converts dates

            # Is between starting and ending dates
            if d >= start_date  and d <= end_date:
                # Going to use the created_on value so we dont have to reformat it again
                datelist.append(created_on)
        return datelist

    def findWords(reader, datelist):
        for row in reader:
            if  float(row['created_at']) in datelist:
                words = row['word']
                for word in words:      
                    print(word)

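    with open('sample_data.csv', encoding="utf8") as csvfile:
        # Read every row into a list while the file is still open; a DictReader
        # can only be iterated once and is unusable after the file closes
        rows = list(csv.DictReader(csvfile))

    dates = filterDates(rows)
    dates.sort()  # list.sort() sorts in place and returns None
    findWords(rows, dates)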

finder('2014-06-22', '2014-07-22')
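For comparison, the whole task (filter by date range, sort ascending, collect the words) can also be done in a single pass. The following is only a minimal sketch, not the approach used above: the helper name `words_in_range` and the three-row inline sample are made up for illustration, and it assumes the timestamps should be read as UTC, whereas the code above uses local time.

```python
import csv
import io
from datetime import date, datetime, timezone


def words_in_range(rows, start, end):
    """Return the 'word' values of rows dated within [start, end], oldest first."""
    matches = []
    for row in rows:
        timestamp = float(row['created_at'])
        # Assumption: interpret timestamps as UTC; the code above uses local time
        created = datetime.fromtimestamp(timestamp, tz=timezone.utc).date()
        if start <= created <= end:
            matches.append((timestamp, row['word']))
    matches.sort()  # tuples compare by timestamp first
    return [word for _, word in matches]


# Made-up three-row sample standing in for sample_data.csv
sample = io.StringIO(
    "id,created_at,word\n"
    "1,1404172800,world\n"
    "2,1403568000,hello\n"
    "3,1309380645,ignored\n"
)
words = words_in_range(csv.DictReader(sample), date(2014, 6, 22), date(2014, 7, 22))
print(' '.join(words))
```

Keeping the timestamp next to the word means the sort order carries over to the words, which a separate list of dates does not give you.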

Edit: If you want to add each word to a list, use this

Add this outside the loop

sentence_list = []

Change

words = row['word']

to

word = row['word']

Then change

for word in words:
    print(word)

to

sentence_list.append(word)
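The reason this matters: looping over a Python string visits it one character at a time, which is where the single letters come from. A small illustration, using one value from the `word` column of the sample data:

```python
word = "transitional"  # a single 'word' cell, as in the sample data

# Looping over a str yields its characters one by one
chars = [ch for ch in word]

# Appending the str itself keeps the word whole
sentence_list = []
sentence_list.append(word)

print(chars[:3])
print(sentence_list)
```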

If you want to use a string instead, add this outside the loop

sentence = ""

Then, where you would print the word, just add it to the sentence instead

# adding a Word to the sentence
sentence = "{} {}".format(sentence, word)

Finally, add this at the bottom, outside the loop

print(sentence)
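One thing to be aware of: building the string with `format` starting from an empty `sentence` leaves a leading space. If you already have the words in a list, `str.join` is a possible alternative that avoids it. A short sketch, with three words taken from the sample data:

```python
sentence_list = ["transitional", "flexibility", "workforce"]  # words from the sample data

# Repeated format() starting from "" leaves a leading space
sentence = ""
for word in sentence_list:
    sentence = "{} {}".format(sentence, word)

# join() puts a single space between words and nothing at the ends
joined = " ".join(sentence_list)

print(repr(sentence))
print(repr(joined))
```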