Question

I have an Excel sheet that contains many software names, such as Visual Studio 2012, Visual Studio 2013, Visual Studio 2017, Adobe Reader English, Adobe Reader Deutsche, Power shell 4.0, Power shell 2.0, Power Shell 5.0.

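I want to keep only one representative name per software product. For example, in this case I would like my output to be Visual Studio 2013, Power shell 4.0, Adobe Reader English, with the remaining entries dropped. I am using Python NLP. I have removed all junk characters and version numbers, but I am not sure how to proceed from there.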

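Any ideas on how to build on this? After reducing two software names to text without digits or junk characters, I tried sequence matching on them, but the results were neither accurate nor efficient.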

import pandas as pd
from nltk.tokenize import wordpunct_tokenize

df = pd.read_csv('C:\\Users\\533471\\Desktop\\Book2.csv', encoding='Windows-1252')
saved_column = df.RowLabels[:]
second_column = df.RowLabels[:]

print(saved_column)

for eachcol in saved_column:
    eachword = eachcol.split()
    print(eachword)

    for secondcol in second_column:
        sentence = None
        wordo = None
        punct = None

        x = []
        copy = []
        secondword = secondcol.split()[:]

        #### proceed only if the first words are equal
        if eachword[0] == secondword[0]:
            print("true")
            sentence = eachword[:]
            sentence += secondword

            ####splitting according to punctuations.
            for token in sentence:
                word = wordpunct_tokenize(token)

                if wordo is None:
                    wordo = word
                else:
                    wordo += word

            ####Removing all the punctuations.
            punct = [item for item in wordo if item.isalpha()]
            t = punct[:]
            t.reverse()

            for p in punct:
                print(p)
                if len(x) > 0:
                    print(x, "Appended")
                    a = str(p)
                    x += [p]
                    if p == x[0]:
                        break
                else:
                    print("list is empty")

                    x += [p]

            x.pop()
            for z in t:
                print(z)
                if len(copy) > 0:
                    print(copy, "appended")

                    copy += [z]
                    if z == punct[0]:
                        break
                else:
                    print("list is empty")
                    copy += [z]

                print(copy)

        else:
            print("false")
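
One possible direction, offered as a rough sketch rather than a definitive answer: instead of comparing every pair of names, reduce each name to a normalized "base key" and keep one entry per key. The helper names `base_key` and `pick_one_per_product` are hypothetical, and two choices here are assumptions that are not stated in the question: the key is built from the first two alphabetic words of a name, and the first name seen in each group is the one that is kept.

```python
import re


def base_key(name, n_words=2):
    """Build a lowercase key from the first n_words alphabetic tokens.

    Digits and punctuation (version numbers such as "4.0") are ignored.
    n_words=2 is an assumption that happens to fit the sample data.
    """
    words = re.findall(r"[A-Za-z]+", name)
    return " ".join(w.lower() for w in words[:n_words])


def pick_one_per_product(names, n_words=2):
    """Keep the first name seen for each base key, preserving input order."""
    chosen = {}  # dicts keep insertion order in Python 3.7+
    for name in names:
        key = base_key(name, n_words)
        if key and key not in chosen:
            chosen[key] = name
    return list(chosen.values())


names = [
    "Visual Studio 2012",
    "Visual Studio 2013",
    "Visual Studio 2017",
    "Adobe Reader English",
    "Adobe Reader Deutsche",
    "Power shell 4.0",
    "Power shell 2.0",
    "Power Shell 5.0",
]

print(pick_one_per_product(names))
```

With the sample names this keeps one entry each for Visual Studio, Adobe Reader, and Power shell. The question does not say which version should win within a group, so the "first seen" rule is only a placeholder; a different rule (for example, a preferred-version lookup) could be substituted inside `pick_one_per_product`. Products whose names share their first two words would be merged by this key, so `n_words` may need tuning for real data.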

Python: getting the relevant software name

0 answers: