How to create a list of words, as long as a word is not one of the words in a list of tuples

Time: 2018-08-28 07:22:41

Tags: python

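I have a few words:

wordlist = ['change', 'my', 'diaper', 'please']

I also have a list of tuples that I need to check against:

mylist = [('verb', 'change'), ('prep', 'my')]

What I want to do is create a list of all the words that are not in the list of tuples.

So the result for this example would be:

['diaper', 'please']

What I tried seems to create duplicates:

[word for tuple in mylist for word in wordlist if word not in tuple]

How do I generate the list of words that are not in the list of tuples, and do it as efficiently as possible?

No using sets.

Edit: I chose the answer based on the restriction above.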
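To see where the duplicates in the attempted comprehension come from, here is a minimal sketch using the question's own data. The loop variable is renamed to `tup` (an illustrative choice, to avoid shadowing the built-in `tuple`), and `attempt` is a hypothetical name.

```python
wordlist = ['change', 'my', 'diaper', 'please']
mylist = [('verb', 'change'), ('prep', 'my')]

# The outer loop runs once per tuple, so wordlist is scanned once for
# every tuple. Each word is only compared against the current tuple,
# not against all of them, so words get emitted more than once.
attempt = [word for tup in mylist for word in wordlist if word not in tup]

print(attempt)
# ['my', 'diaper', 'please', 'change', 'diaper', 'please']
```

With two tuples the word list is walked twice, which is why 'diaper' and 'please' show up twice and why 'my' and 'change' still slip through.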

4 answers:

Answer 0 (score: 2)

Here is a one-liner using a list comprehension:

[word for word in wordlist if word not in [ w[1] for w in mylist ]]

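The inner list, [ w[1] for w in mylist ], extracts the second element of each tuple.

The outer list, [word for word in wordlist if word not in [ w[1] for w in mylist ]], walks through the words and filters out those that appear in the list just extracted.

P.S. I assumed you only want to filter on the second element of each tuple.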
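As a variation on the one-liner that still avoids sets, the inner list can be built once before filtering, since written inline it is rebuilt for every word in wordlist. This is only a sketch; `tagged_words` and `result` are hypothetical names that do not appear in the answer.

```python
wordlist = ['change', 'my', 'diaper', 'please']
mylist = [('verb', 'change'), ('prep', 'my')]

# Build the list of second elements once, instead of once per word.
tagged_words = [w[1] for w in mylist]

# Keep only the words that are not among the tagged words.
result = [word for word in wordlist if word not in tagged_words]

print(result)
# ['diaper', 'please']
```

Each membership test is still a linear scan of `tagged_words`, so this saves the repeated list construction but not the per-word search.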

Answer 1 (score: 2)

Extract a set of known words from the list of tuples:

myList = [('verb', 'change'), ('prep', 'my')]
known_words = set(tup[1] for tup in myList)

Then filter the word list against it, as before:

wordlist = ['change', 'my', 'diaper', 'please']
out = [word for word in wordlist if word not in known_words]

print(out)
# ['diaper', 'please']

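Checking whether an item is in a set takes O(1) time on average, whereas checking a list or tuple takes O(n) time in its length, so using a set really is worth it here.

Also, if you don't care about the order of the words and want to remove duplicates, you can do this: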

unique_new_words = set(wordlist) - known_words
print(unique_new_words)
# {'diaper', 'please'}  (set ordering is not guaranteed)
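If duplicates should go but the original word order should stay, one possible sketch is to filter first and then deduplicate with `dict.fromkeys`, which preserves insertion order on Python 3.7+. The extra 'diaper' in the word list and the name `unique_in_order` are assumptions added for illustration.

```python
myList = [('verb', 'change'), ('prep', 'my')]
known_words = {tup[1] for tup in myList}

# Hypothetical input with a repeated word, to show the deduplication.
wordlist = ['change', 'my', 'diaper', 'please', 'diaper']

# dict keys are unique and keep insertion order (Python 3.7+),
# so this drops repeats while preserving first-seen order.
unique_in_order = list(
    dict.fromkeys(word for word in wordlist if word not in known_words)
)

print(unique_in_order)
# ['diaper', 'please']
```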

Answer 2 (score: 1)

Here is my version, which flattens your tuples with itertools.chain and compares the words against the resulting set (using a set speeds up the in operator).
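A minimal sketch of the flatten-then-compare idea might look like the following. It is an illustration of the approach, not necessarily the answer author's exact code, and `flat` and `result` are hypothetical names. Note that flattening puts the tags ('verb', 'prep') into the set as well, so a word equal to a tag would also be filtered out.

```python
from itertools import chain

wordlist = ['change', 'my', 'diaper', 'please']
mylist = [('verb', 'change'), ('prep', 'my')]

# chain.from_iterable yields every element of every tuple in turn:
# 'verb', 'change', 'prep', 'my'. Collecting them in a set makes the
# membership test below a constant-time lookup on average.
flat = set(chain.from_iterable(mylist))

result = [word for word in wordlist if word not in flat]

print(result)
# ['diaper', 'please']
```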

Answer 3 (score: 1)

I have assumed that tup[1] holds only a single element; if it does not, a small change is needed.

[word for word in wordlist if word not in [tup[1] for tup in mylist]]
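The answer does not spell out the small change. One possible reading, sketched here under the assumption that the second element may be either a single string or a list of strings, is to collect every word before filtering. The sample data with `['change', 'swap']` and the names `known` and `result` are hypothetical.

```python
wordlist = ['change', 'my', 'diaper', 'please']

# Hypothetical data: the second element is sometimes a list of words.
mylist = [('verb', ['change', 'swap']), ('prep', 'my')]

# Collect every word found in the second position, whether it is a
# single string or a collection of strings. No sets are used.
known = []
for tup in mylist:
    second = tup[1]
    if isinstance(second, str):
        known.append(second)
    else:
        known.extend(second)

result = [word for word in wordlist if word not in known]

print(result)
# ['diaper', 'please']
```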