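I'm really stuck. My task is to filter the dates in a 5000-record CSV down to a specific date range, sort them in ascending order, and then take the fields from a different column to build a sentence. I've managed to filter the dates and sort them, but now I don't know how to get the word that corresponds to each row. Here is the code: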
```python
#!/usr/bin/python3
import csv
import time

def finder():
    with open('sample_data.csv', encoding="utf8") as csvfile:
        reader = csv.DictReader(csvfile)
        r = []  # This will hold our ID numbers for rows
        c = []  # This will hold our initial dates that are filtered out from the main csv
        l = []  # This will hold our sorted dates from c
        w = []  # This will hold our words
        sentence = ''  # This will be our sentence

        # Filter out created_at dates we don't care about
        def filterDates():
            for row in reader:
                createdOn = float(row['created_at'])
                d = time.strftime('%Y-%m-%d', time.localtime(createdOn))  # Converts dates
                if d < '2014-06-22':
                    pass
                else:
                    c.append(d)
        filterDates()

        def sort(c):
            for i in c:
                if i > '2014-06-22' and i < '2014-07-22':
                    l.append(i)
                    l.sort(reverse=False)
                else:
                    pass
        sort(c)

        def findWords(l):
            for row in reader:
                words = row['word']
                for x in range(l):
                    print(words[0])
        findWords(l)

finder()
```
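I know this code is probably sloppy and all over the place. I took this on as a challenge for a job and thought I could do it easily, but apparently my Python isn't up to it. I haven't used Python's CSV module before. I'll say upfront that I no longer plan to apply for the job, but it will drive me crazy if I can't figure this out. I've spent hours trying different things; where I'm stuck is getting the rows with the right dates and pulling out their words.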
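Since the CSV module is new here, one behaviour is worth seeing in isolation: a `csv.DictReader` is an iterator, so once one loop (such as the one in `filterDates`) has consumed it, any later loop over the same reader (such as the one in `findWords`) gets no rows at all. The sketch below uses a small in-memory CSV with only three of the sample's columns and assumes the real file is comma-separated.

```python
import csv
import io

# A tiny in-memory stand-in for sample_data.csv (only three of its columns)
data = io.StringIO(
    "id,created_at,word\n"
    "1,1309380645,transitional\n"
    "2,1237178109,flexibility\n"
)

reader = csv.DictReader(data)

# Each row is a dict keyed by the header names
first_pass = [row['word'] for row in reader]

# The reader is an iterator: once exhausted, a second loop sees nothing
second_pass = [row['word'] for row in reader]

print(first_pass)   # ['transitional', 'flexibility']
print(second_pass)  # []

# Reading the rows into a list once lets several functions loop over them
data.seek(0)
rows = list(csv.DictReader(data))
print([row['id'] for row in rows])  # ['1', '2']
```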
All suggestions and help are appreciated! I need to figure this out for my own sanity.
Thanks, RDD
Data sample:
id created_at first_name last_name email gender company currency word drug_brand drug_name drug_company pill_color frequency token keywords
1 1309380645 Stephanie Franklin sfranklin0@sakura.ne.jp Female Latz IDR transitional SUNLEYA Age minimizing sun care AVOBENZONE, OCTINOXATE, OCTISALATE, OXYBENZONE C.F.E.B. Sisley Maroon Yearly ______T______h__e________ _______N__e__z_____p______e_____________d______i______a_____n__ _____h__i__v__e___-_____m___i____n__d__ _____________f ________c_______h__a__________s_.__ _Z________a_____l_____g________o__._ est risus auctor sed tristique in
2 1237178109 Michelle Fowler mfowler1@oracle.com Female Skipstorm EUR flexibility Medulla Arnica Medulla Arnica Uriel Pharmacy Inc. Yellow Once _____ morbi vestibulum velit id
3 1303585711 Betty Barnes bbarnes2@howstuffworks.com Female Skibox IDR workforce Rash Relief Zinc Oxide Dimethicone Touchless Care Concepts LLC Purple Monthly ___ ac est lacinia
4 1231175716 Jerry Rogers jrogers3@canalblog.com Male Cogibox IDR content-based up and up acid controller complete Famotidine, Calcium Carbonate, Magnesium Hydroxide Target Corporation Maroon Daily NIL augue a suscipit nulla elit
5 1236709011 Harry Garrett hgarrett4@mlb.com Male Yotz RUB coherent Vistaril HYDROXYZINE PAMOATE Pfizer Laboratories Div Pfizer Inc Orange Never �_nb_l_ _u___ __olop __ __oq_l _n _unp_p__u_ _od___ po_sn__ op p_s '__l_ _u__s_d_p_ _n_____suo_ '____ __s _olop _nsd_ ___o_ morbi ut odio cras
6 1400030214 Lori Martin lmartin5@apache.org Female Aivee EUR software Fluorouracil Fluorouracil Taro Pharmaceutical Industries Ltd. Pink Daily _ dui vel sem
7 1368791435 Joe Turner jturner6@elpais.com Male Mycat IRR tangible Sulfacetamide Sodium Sulfacetamide Sodium Paddock Laboratories, LLC Aquamarine Often 1;DROP TABLE users nulla facilisi cras non velit
8 1394919241 Ruth Bryant rbryant7@dell.com Female Browsecat IDR incremental Pollens - Trees, Mesquite, Prosopis juliflora Mesquite, Prosopis juliflora Jubilant HollisterStier LLC Aquamarine Weekly ___________ et magnis dis
9 1352948920 Cynthia Lopez clopez8@gov.uk Female Twitterbeat USD Up-sized Ideal Flawless Octinoxate, Titanium Dioxide Avon Products, Inc Red Daily (_�_�___ ___) purus eu magna
10 1319910259 Phillip Ross pross9@ehow.com Male Buzzshare VEF data-warehouse Serotonin Serotonin BioActive Nutritional Orange Weekly __ vel sem
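The `created_at` column holds Unix timestamps (seconds since 1970-01-01 UTC), e.g. 1309380645. A minimal sketch of turning one into a `YYYY-MM-DD` string and checking it against the date range is below. It assumes UTC; the code in this thread uses `time.localtime`, so results near midnight can differ by a day depending on the machine's time zone. Zero-padded ISO dates order the same way as the dates they represent, which is why comparing them as plain strings works. The helper names `to_date` and `in_range` are illustrative only.

```python
from datetime import datetime, timezone


def to_date(created_at):
    """Convert a created_at Unix timestamp (seconds) to a 'YYYY-MM-DD' string, in UTC."""
    ts = float(created_at)
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime('%Y-%m-%d')


def in_range(date_str, start='2014-06-22', end='2014-07-22'):
    """Inclusive range check; zero-padded ISO dates compare correctly as plain strings."""
    return start <= date_str <= end


print(to_date('1309380645'))            # 2011-06-29
print(in_range(to_date('1400030214')))  # False (that row is 2014-05-14)
```

Note that the question's `sort` step uses strict `>`/`<` while the answer below uses `>=`/`<=`; `in_range` is inclusive at both ends.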
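OK, after some tweaking, and with a lot of help from Westley White, I got this working! I condensed it into one nested function and it's doing what it's supposed to. Here's the code: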
```python
#!/usr/bin/python3
import csv
import time

def finder():
    with open('sample_data.csv', 'r', encoding='latin-1') as csvfile:
        reader = csv.DictReader(csvfile)

        def dates(reader):
            # Set up variables
            date_range = []
            sentence = []
            # Initiate iteration through CSV
            for row in reader:
                createdOn = float(row['created_at'])
                words = str(row['word'])
                d = time.strftime('%Y-%m-%d', time.localtime(createdOn))  # Converts dates
                if d >= '2014-06-22' and d <= '2014-07-22':
                    date_range.append(d)
                    date_range.sort()
                    for word in words:
                        if d in date_range:
                            sentence.append(word)
            print(sentence)
        dates(reader)

finder()
```
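Only one problem left: when I append to `sentence`, it appends one character at a time. I don't know how to keep the letters together as the words from the CSV column without them all running together. Any ideas?
Thanks!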
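The cause is that `words = str(row['word'])` is a single string, and `for word in words:` iterates over a string character by character. A minimal illustration, with hard-coded values standing in for `row['word']` from each matching row:

```python
word = 'transitional'

# Iterating over a string yields one character at a time...
chars = []
for ch in word:
    chars.append(ch)
print(chars[:3])  # ['t', 'r', 'a']

# ...so append each whole field value instead of looping over it.
# These literals stand in for row['word'] from each matching row.
sentence = []
for value in ['transitional', 'flexibility', 'workforce']:
    sentence.append(value)
print(sentence)  # ['transitional', 'flexibility', 'workforce']
```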
Answer 0 (score: 2)
I'm not sure how your data is formatted, but here's my attempt:
```python
import csv
import time

def finder(start_date='2014-06-22', end_date='2014-07-22'):
    """
    :param start_date: Starting date
    :param end_date: Ending date
    """
    def filterDates(reader):
        datelist = []
        for row in reader:
            created_on = float(row['created_at'])
            d = time.strftime('%Y-%m-%d', time.localtime(created_on))  # Converts dates
            # Is between starting and ending dates
            if d >= start_date and d <= end_date:
                # Going to use the created_on value so we don't have to reformat it again
                datelist.append(created_on)
        return datelist

    def findWords(reader, datelist):
        for row in reader:
            if float(row['created_at']) in datelist:
                words = row['word']
                for word in words:
                    print(word)

    with open('sample_data.csv', encoding="utf8") as csvfile:
        # Read the rows into a list: a DictReader can only be iterated once,
        # and both functions below loop over the rows
        reader = list(csv.DictReader(csvfile))
        dates = filterDates(reader)
        dates.sort()  # list.sort() sorts in place and returns None
        findWords(reader, dates)

finder('2014-06-22', '2014-07-22')
```
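The answer above finds the matching timestamps first and then loops over the rows again to look up their words. A single-pass sketch that keeps each matching row's timestamp and word, sorts them in ascending date order, and joins the words into one sentence might look like this. It assumes a comma-separated file with `created_at` (Unix seconds) and `word` columns and interprets timestamps in UTC; `build_sentence` is a hypothetical helper name, not something from the thread.

```python
import csv
import io
from datetime import datetime, timezone


def build_sentence(csvfile, start='2014-06-22', end='2014-07-22'):
    """Filter rows by created_at date, sort them ascending, and join their 'word' values.

    Assumes 'created_at' holds Unix seconds and dates are compared in UTC.
    """
    matches = []
    for row in csv.DictReader(csvfile):
        ts = float(row['created_at'])
        day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime('%Y-%m-%d')
        if start <= day <= end:  # inclusive range on ISO date strings
            matches.append((ts, row['word']))
    matches.sort()  # tuples sort by timestamp first, i.e. ascending date
    return ' '.join(word for _, word in matches)


if __name__ == '__main__':
    sample = io.StringIO(
        "id,created_at,word\n"
        "1,1404172800,second\n"   # 2014-07-01 UTC
        "2,1403481600,first\n"    # 2014-06-23 UTC
        "3,1400030214,skipped\n"  # 2014-05-14 UTC, outside the range
    )
    print(build_sentence(sample))  # first second
```

To run it on the real file, something like `with open('sample_data.csv', encoding='latin-1', newline='') as f: print(build_sentence(f))` should work: the `csv` module documentation recommends opening files with `newline=''`, and `latin-1` is the encoding the asker ended up using. Sorting `(timestamp, word)` tuples orders the rows by time, so no separate list of dates is needed.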
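EDIT: If you want to collect the words in a list instead, add this outside the loop:
```python
sentence_list = []
```
Change `words = row['word']` to `word = row['word']`, then replace
```python
for word in words:
    print(word)
```
with
```python
sentence_list.append(word)
```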
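If you'd rather build a string, add this outside the loop:
```python
sentence = ""
```
Then, where you printed the word, add it to the sentence instead:
```python
# adding a word to the sentence
sentence = "{} {}".format(sentence, word)
```
Finally, add this at the bottom, outside the loop:
```python
print(sentence)
```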
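One thing to watch with the string version: because `sentence` starts as `""`, the first `format` call produces a leading space. Collecting the words in a list and calling `str.join` at the end avoids that and is the more common idiom. A small comparison with hard-coded words:

```python
words = ['first', 'second', 'third']

# Repeated formatting, starting from "", keeps a leading space
sentence = ""
for word in words:
    sentence = "{} {}".format(sentence, word)
print(repr(sentence))  # ' first second third'

# str.join puts the separator only between items
print(' '.join(words))  # first second third
```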