Question

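I'm trying to search a file for words. The words are stored in a separate list. The words that are found are stored in another list, which is returned at the end.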

The code is as follows:

def scanEducation(file):
    education = []
    qualities = ["python", "java", "sql", "mysql", "sqlite", "c#", "c++", "c", "javascript", "pascal",
                 "html", "css", "jquery", "linux", "windows"]
    with open("C:\Users\Vadim\Desktop\Python\New_cvs\\" + file, 'r') as file1:
        for line in file1:
            for word in line.split():
                matching = [s for s in qualities if word.lower() in s]
                if matching is not None:
                    education.append(matching)
    return education

First, it returns a list with lots of empty slots. Does that mean my comparison isn't working?

Result (scanning 4 files):

"C:\Program Files (x86)\Python2\python.exe" C:/Users/Vadim/PycharmProjects/TestFiles/ReadTXT.py
[[], [], [], [], [], [], [], [], [], ['java', 'javascript']]
[[], [], [], [], [], [], [], [], [], ['pascal']]
[[], [], [], [], [], [], [], [], [], ['linux']]
[[], [], [], [], [], [], [], [], [], [], ['c#']]

Process finished with exit code 0

The input file contains:

Name: Some Name
Phone: 1234567890
email: some@email.com
python,excel,linux

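Second problem: each file contains 3 different skills, but the function finds only 1 or 2. Is this also a bad comparison, or is there a different error here?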

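I want the result to be just a list of the skills found, with no empty slots, and I want it to find all the skills in the file, not just some of them.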

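Edit: the function does find all the skills with word.split(', '), but if I want it to be more general, what is a good way to find the skills when I don't know exactly what separates them?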

Answer 1

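You are getting empty lists because an empty list is not None, so the check matching is not None always passes. What you probably want is to change the condition to the following: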

if matching:
    # do your stuff
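A quick illustration of the point above, using only built-in behavior: an empty list is a real object, so it is never None, but it counts as false in a boolean test. That is why if matching: skips the empty results while if matching is not None: keeps them.

```python
matching = []

# An empty list is an object, not None, so this comparison is True for any list.
print(matching is not None)  # True

# Truthiness: empty containers are false, non-empty ones are true.
print(bool(matching))        # False
print(bool(["linux"]))       # True
```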

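You seem to be checking whether each word occurs as a substring of the strings in the qualities list. That's probably not what you want. If you want to check which words on the line appear in the qualities list, you may want to change the list comprehension to: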

words = line.split()
match = [word for word in words if word.lower() in qualities]
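To see how far apart the two tests are, here is what each returns for a short word such as c. The substring test from the question matches every quality that merely contains the letter, while the membership test suggested here matches only the exact entry (variable names are illustrative).

```python
qualities = ["python", "java", "sql", "mysql", "sqlite", "c#", "c++", "c",
             "javascript", "pascal", "html", "css", "jquery", "linux", "windows"]

word = "c"
# Question's test: is the word a *substring* of each quality?
substring_hits = [s for s in qualities if word.lower() in s]
# Suggested test: is the word *equal* to one of the qualities?
exact_hit = word.lower() in qualities

print(substring_hits)  # ['c#', 'c++', 'c', 'javascript', 'pascal', 'css']
print(exact_hit)       # True
```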

If you want to split on both commas and spaces, you may want to look into regular expressions. See "Split Strings with Multiple Delimiters?".
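A minimal sketch of splitting on several delimiters at once with re.split, applied to the last line of the sample input. The character class [,;\s]+ is an assumption about which separators may appear; extend it if the files use others.

```python
import re

qualities = {"python", "java", "sql", "mysql", "sqlite", "c#", "c++", "c",
             "javascript", "pascal", "html", "css", "jquery", "linux", "windows"}

line = "python,excel,linux\n"
# Split on any run of commas, semicolons or whitespace, then drop empty pieces
# (the trailing newline leaves an empty string at the end).
words = [w for w in re.split(r"[,;\s]+", line) if w]
matches = [w for w in words if w.lower() in qualities]

print(words)    # ['python', 'excel', 'linux']
print(matches)  # ['python', 'linux']
```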

Answer 2

The code should be written as follows (if I understood the desired output format correctly):

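import os

def scanEducation(file):
    education = []
    qualities = ["python", "java", "sql", "mysql", "sqlite", "c#", "c++", "c", "javascript", "pascal",
                 "html", "css", "jquery", "linux", "windows"]
    # Raw string so the backslashes in the Windows path are not read as escapes
    path = os.path.join(r"C:\Users\Vadim\Desktop\Python\New_cvs", file)
    with open(path, 'r') as file1:
        for line in file1:
            matching = []
            # Split each line on commas and normalize every piece before comparing
            for word in line.strip().split(","):
                word = word.strip().lower()
                if word in qualities:
                    matching.append(word)
            # Keep only lines that contained at least one skill
            if matching:
                education.append(matching)
    return education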
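To try the per-line grouping without the hard-coded Windows path, the same loop can be run on an in-memory file. In this sketch io.StringIO stands in for the real file and scan_lines is a hypothetical name; it returns one list of skills for each line that contained any.

```python
import io

QUALITIES = ["python", "java", "sql", "mysql", "sqlite", "c#", "c++", "c",
             "javascript", "pascal", "html", "css", "jquery", "linux", "windows"]

def scan_lines(file1):
    """Hypothetical helper: one list of matches per line, lines without skills skipped."""
    education = []
    for line in file1:
        matching = []
        for word in line.strip().split(","):
            word = word.strip().lower()
            if word in QUALITIES:
                matching.append(word)
        if matching:
            education.append(matching)
    return education

sample = io.StringIO("Name: Some Name\nPhone: 1234567890\n"
                     "email: some@email.com\npython,excel,linux\n")
print(scan_lines(sample))  # [['python', 'linux']]
```

Note that this still splits only on commas, so a line such as "python excel" would need a different separator.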

Answer 3

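First, you get a bunch of "empty slots" because your condition isn't defined correctly. If matching is an empty list, it is not None; that is, [] is not None evaluates to True. That's why you get all those "empty slots".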

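Second, the condition in your list comprehension isn't what you want either. Unless I'm misunderstanding your goal, the condition you're looking for is: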

[s for s in qualities if word.lower() == s]

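This checks the qualities list and returns a non-empty list only if the word equals one of the qualities. However, since this list always has length 1 (on a match) or 0 (otherwise), we can replace it with a boolean using Python's built-in any() function: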

if any(s == word.lower() for s in qualities):
    education.append(word)
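Because the generator inside any() only tests equality, it gives the same answer as the plain membership test word.lower() in qualities. The sketch below checks that on a few sample words; converting the list to a set is an addition not in the answer, and it turns each lookup into a hash lookup instead of a scan of the whole list.

```python
qualities_list = ["python", "java", "sql", "mysql", "sqlite", "c#", "c++", "c",
                  "javascript", "pascal", "html", "css", "jquery", "linux", "windows"]
qualities_set = set(qualities_list)

for word in ["Python", "excel", "C#", "Linux"]:
    via_any = any(s == word.lower() for s in qualities_list)
    via_in = word.lower() in qualities_set
    # Both forms agree; the set lookup avoids walking the whole list.
    assert via_any == via_in
    print(word, via_in)
```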

I hope this helps. Feel free to ask any follow-up questions, or let me know if I misunderstood your goal.

For reference, here is the modified source I used to check my work:

def scanEducation(file):
    education = []
    qualities = ["python", "java", "sql", "mysql", "sqlite", "c#", "c++", "c", "javascript", "pascal",
             "html", "css", "jquery", "linux", "windows"]
    with open(file, 'r') as file1:
        for line in file1:
            for word in line.split():
                if any(s == word.lower() for s in qualities):
                    education.append(word)
    return education

Answer 4

You can also use a regular expression, like this:

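import re

def scan_education(file_name):
    education = []
    qualities_list = ["python", "java", "sql", "mysql", "sqlite", "c#", "c++", "c", "javascript", "pascal",
                      "html", "css", "jquery", "linux", "windows"]
    # re.escape handles "+" and "#"; \b does not work next to them, so use
    # look-arounds that reject an adjacent word character, longest entries first
    pattern = r'(?<!\w)(?:%s)(?!\w)' % '|'.join(
        re.escape(q) for q in sorted(qualities_list, key=len, reverse=True))
    qualities = re.compile(pattern)
    with open(file_name, 'r') as f:
        for line in f:
            education += qualities.findall(line.lower())
    return list(set(education))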
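One pitfall when building such a pattern with \b: a word boundary only sits between a word character and a non-word character, so it never matches right after "+" or "#" when a space or the end of the line follows. The short demonstration below shows that failure, the look-around alternative, and how the alternation can silently fall back to the bare "c".

```python
import re

text = "i know c++ and c# well"

# \b after "++" fails because both "+" and the following space are non-word characters.
print(re.findall(r"\bc\+\+\b", text))            # []
# Look-arounds only require that no word character touches the match.
print(re.findall(r"(?<!\w)c\+\+(?!\w)", text))   # ['c++']
# With \b, "c#" cannot match, so the alternation falls back to "c" both times.
print(re.findall(r"\b(?:c\#|c)\b", text))        # ['c', 'c']
```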

Answer 5

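Here is a short example that uses sets and a little comprehension-style filtering to find the words common to a text file (here I just use a text string) and the list you provided. Intersecting two sets replaces the nested loops, so this is shorter and clearer than a loop-based version, and set lookups are fast.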

import string

try:
    with open('myfile.txt') as f:
        text = f.read()
except IOError:
    # Fall back to a sample string if the file can't be read
    text = "harry met sally; the boys went to the park.  my friend is purple?"

my_words = set(("harry", "george", "phil", "green", "purple", "blue"))

# Replace every non-letter with a space, so punctuation such as "," or ";"
# separates words instead of gluing them together
text = ''.join(x if x in string.ascii_letters else ' ' for x in text)

text = set(text.split())  # split on any whitespace

common_words = my_words & text  # my_words.intersection(text) also does the same

print(common_words)
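Applied to the asker's own input, the same set-intersection idea might look roughly like this. Lowercasing the text and tokenizing with re.findall(r"[\w#+]+", ...) are assumptions added here, so that "Python" matches "python" and entries such as "c#" and "c++" survive as whole tokens instead of losing their symbols.

```python
import re

qualities = {"python", "java", "sql", "mysql", "sqlite", "c#", "c++", "c",
             "javascript", "pascal", "html", "css", "jquery", "linux", "windows"}

text = """Name: Some Name
Phone: 1234567890
email: some@email.com
python,excel,linux
"""

# Keep letters, digits, underscores, '#' and '+' together as one token.
words = set(re.findall(r"[\w#+]+", text.lower()))
found = qualities & words
print(sorted(found))  # ['linux', 'python']
```

Sets discard order and duplicates, so sorted() is used only to make the printed result predictable.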

Search a file for words from a list
