Counting lines from a string

Time: 2013-11-27 11:24:56

Tags: python count split

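I need to write a program that removes punctuation, certain specific words, and duplicates, and then returns the remaining words together with the lines they appear on. I also need to keep track of the duplicates. For example: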

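Python IDLE
Indexer: type in lines, finish with a . at start of line only
It was a brisk blowing wind
from the north, the north of my youth.
The wind was cold too, colder than
the winds of yesteryear.
.
The index is:
brisk 1
blow 1
wind 1,3,4
north 2
youth 2
cold 3
yesteryear 4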

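Problem: I need to keep track of the line numbers of the remaining words, including the lines where they are repeated. I have not been able to do this.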

from string import *

stopWords = [ "a", "i", "it", "am", "at", "on", "in", "to", "too", "very", \
              "of", "from", "here", "even", "the", "but", "and", "is", "my", \
              "them", "then", "this", "that", "than", "though", "so", "are" ]

endings = [ "es" , "ed" , "er", "ly"]

punctuation = [ ".", "," , ":" , ";" , "!" , "?" , "&" , "'" ]

unindexed_sentence = raw_input("type in lines, finish with a . at start of line only").lower()

#removing duplicates.
def unique_string(l):
    ulist = []
    ulist2 = []
    [ulist.append(x) for x in l if x not in ulist]
    [ulist2.append(x)]
    global ulist2

    return ulist
unindexed_sentence =' '.join(unique_string(unindexed_sentence.split()))

unindexed_sentence1 = split(unindexed_sentence,"\n")

list_unindexed = []



# splitting 
i = 0
while i<len(unindexed_sentence1):
    list_unindexed += [split(unindexed_sentence1[i])] 
    i+=1
countline = 0
i = 0
while i < len(list_unindexed):
    j = 0
    while j < len(list_unindexed[i]):
        if list_unindexed[i][j][0] in punctuation:
            list_unindexed[i][j] = list_unindexed[i][j][:0]
        if list_unindexed[i][j][-1] in punctuation:
            list_unindexed[i][j] = list_unindexed[i][j][:-1]
        if list_unindexed[i][j][-1] == "s":
            list_unindexed[i][j] = list_unindexed[i][j][:-1]
        if list_unindexed[i][j][-2:] in endings:
            list_unindexed[i][j] = list_unindexed[i][j][:-2]
        if list_unindexed[i][j][-3:] == "ing":
            list_unindexed[i][j] = list_unindexed[i][j][:-3]
        if list_unindexed[i][j] in stopWords:
            del list_unindexed[i][j]

        else:
            j += 1
    i += 1
    countline += 1

def new_line(n):
    split(n,"\n")
    count = 1
    if n[-1] == "\n":
        count += 1
    return count

string1 = str(list_unindexed)

string2 = str(string1)

string2 ='\n'.join(unique_string(string2.split()))   

print string2

1 answer:

Answer 0: (score: 0)

Is this your homework?

Here are some hints:

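  • Don't do: from string import *. You don't need it.
  • Use data.splitlines() to get a list of lines
  • Use enumerate() to get the index, e.g.: for i, line in enumerate(data.splitlines())
  • Use a dictionary to keep track of all the words. Each value can be a list or a set of line numbers
  • Don't remove duplicates up front. A dictionary or a set will take care of that for you.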
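
The hints above could be combined roughly as follows. This is only a minimal sketch, not a complete solution: the name build_index, the shortened STOP_WORDS set, and the sample text are assumptions made for illustration, and the suffix stripping ("es", "ed", "ing", and so on) from the question is deliberately left out.

```python
import string

# Hypothetical, shortened stop-word set used only for this illustration.
STOP_WORDS = {"a", "it", "was", "the", "of", "my", "from", "too", "than"}


def build_index(data):
    """Map each remaining word to the set of 1-based line numbers it occurs on."""
    index = {}
    # enumerate() supplies the line number; start counting at 1.
    for lineno, line in enumerate(data.splitlines(), 1):
        for raw in line.lower().split():
            # Strip punctuation from both ends of the word.
            word = raw.strip(string.punctuation)
            if not word or word in STOP_WORDS:
                continue
            # A set absorbs repeats of a word on the same line.
            index.setdefault(word, set()).add(lineno)
    return index


# Assumed sample input, entered as one multi-line string.
text = (
    "It was a brisk blowing wind\n"
    "from the north, the north of my youth.\n"
    "The wind was cold too, colder than\n"
    "the winds of yesteryear."
)

index = build_index(text)
for word in sorted(index):
    line_numbers = ",".join(str(n) for n in sorted(index[word]))
    print(word + " " + line_numbers)
```

Because each value is a set, a word that occurs twice on one line (such as "north" here) is recorded once for that line, so no separate duplicate-removal pass is needed. Without stemming, "wind" and "winds" remain separate entries; merging them would require the ending-stripping step from the question.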