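Save the counts of found words

Time: 2018-04-16 14:33:13

Tags: python python-3.x

I've been working on this checkio task and I'm stuck. I'm given a string of text and a list of words. The task is to count how many times each word from the list occurs in the text and to build a dictionary in which each key is a word and each value is its count. The function must behave as follows.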

Given this call:

popular_words('''
When I was One
I had just begun
When I was Two
I was nearly new
''', ['i', 'was', 'three', 'near'])

I've done almost everything, but I don't know how to add to the dictionary a word that isn't in the text (for example 'near': 0).

The expected output is:

{
'i': 4,
'was': 3,
'three': 0,
'near': 0
}

This is what I tried:

result = {}
number = 0
list1 = []

words = '''

When I was One

I had just begun

When I was Two

I was nearly new

'''

check = ['i', 'was', 'three', 'near']
a = list(words.split())

for word in a:
    if word.lower() in check:
        wc = words.count(word)
        result[word] = wc


print(result)

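But it doesn't work :(

5 answers:

Answer 0 (score: 4)

Try this; it should be faster than iterating yourself. Counter has the added benefit of handling words that aren't found in the text: looking up a missing key returns 0.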

from collections import Counter
counts = Counter(words.lower().split())  # count every lowercased word once
result = {k: counts[k] for k in check}   # words missing from the text get 0
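To illustrate the point about missing words, here is a minimal, self-contained sketch using the text and word list from the question. The names `counts` and `result` are only illustrative.

```python
from collections import Counter

words = '''
When I was One
I had just begun
When I was Two
I was nearly new
'''
check = ['i', 'was', 'three', 'near']

# Build the Counter once from the lowercased, whitespace-split text.
counts = Counter(words.lower().split())

# Indexing a key that was never counted returns 0 and does not insert it.
result = {k: counts[k] for k in check}
print(result)  # {'i': 4, 'was': 3, 'three': 0, 'near': 0}
```

Note that 'nearly' does not count towards 'near', because whole tokens are compared.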

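Answer 1 (score: 3)

First, words.split() already gives you a list. There's no need to convert it to a list again.

Second, you should iterate over check (you only care about the words in check), not over a.

Third, you can use a dictionary comprehension to speed things up: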

a = words.lower().split()
results = {word:a.count(word) for word in check}

To walk through the code: words.split() is the same as list(words.split()), and the comprehension is a more Pythonic (and faster) way of writing the following:

results = {}
for word in check:
    results[word] = a.count(word)
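As a side note, the code in the question calls words.count(word) on the string itself, which counts substring matches, while a.count(word) on the split list only counts whole words. A small sketch with a made-up sentence (the variable names are just for illustration):

```python
text = "I was nearly new"
tokens = text.lower().split()

# str.count counts non-overlapping substring matches,
# so 'near' is found inside 'nearly'.
substring_hits = text.count("near")

# list.count counts elements equal to the argument,
# so only whole words match.
whole_word_hits = tokens.count("near")

print(substring_hits, whole_word_hits)  # 1 0
```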

Answer 2 (score: 2)

from collections import Counter

words = '''

When I was One

I had just begun

When I was Two

I was nearly new

'''

check = ['i', 'was', 'three', 'near']

words = Counter(words.split())
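newcounter = dict.fromkeys(check, 0)  # start every word in check at 0 so absent words stay in the result

for i in words:
    if i.lower() in check:
        newcounter[i.lower()] += words[i]  # add up differently cased spellings of the same word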


print(newcounter)

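Answer 3 (score: 1)

Your code isn't efficient: you loop over the whole text word by word, and then .count(word) scans the whole text again for each of those words. You may want to increment a counter (which happens to be your dictionary) instead, like this: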

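for word in words.split():
    key = word.lower()        # compare and store in lower case
    if key in check:
        if key in result:
            result[key] += 1  # seen before: increment
        else:
            result[key] = 1   # first occurrence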

After looping over the text, you can loop over the list of popular words and add each one to the dictionary if it isn't there yet:

for pop_word in check:
    if pop_word not in result:
        result[pop_word] = 0  # never seen in the text
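Put together, the two steps above might look like the following sketch. The function name `count_words` and its parameters are made up for illustration, and it assumes the words in `check` are already lower case, as they are in the question.

```python
def count_words(text, check):
    """Count how often each word in `check` occurs in `text`, ignoring case."""
    wanted = set(check)  # set membership tests are fast
    result = {}

    # Step 1: a single pass over the text, incrementing as we go.
    for word in text.split():
        key = word.lower()
        if key in wanted:
            if key in result:
                result[key] += 1
            else:
                result[key] = 1

    # Step 2: add the words that never appeared, with a count of 0.
    for pop_word in check:
        if pop_word not in result:
            result[pop_word] = 0
    return result


text = "When I was One\nI had just begun\nWhen I was Two\nI was nearly new"
print(count_words(text, ['i', 'was', 'three', 'near']))
# {'i': 4, 'was': 3, 'three': 0, 'near': 0}
```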

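Answer 4 (score: 0)

You should initialize the result dictionary from the given list of words of interest, then increment the count each time you find one of them: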

def popular_words(txt, wordlist):
    result = { w.lower():0 for w in wordlist }  # initialize count to 0
    for word in txt.split():
        w = word.lower()                        # only consider lower case
        if w in result:
            result[w] += 1                      # increase for each occurrence
    return result
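For reference, this is how that function behaves on the input from the question. The definition is repeated so the snippet runs by itself; the names `text` and `counts` are only illustrative.

```python
def popular_words(txt, wordlist):
    result = {w.lower(): 0 for w in wordlist}  # every requested word starts at 0
    for word in txt.split():
        w = word.lower()
        if w in result:
            result[w] += 1
    return result


text = '''
When I was One
I had just begun
When I was Two
I was nearly new
'''

counts = popular_words(text, ['i', 'was', 'three', 'near'])
print(counts)  # {'i': 4, 'was': 3, 'three': 0, 'near': 0}
```

One thing to keep in mind: split() only splits on whitespace, so a word with punctuation attached (such as "was,") would not match "was". The sample text has no punctuation, so that doesn't matter here.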