Question

I am trying to remove all phrases that are not French. I tried using the langdetect library (without pandas, unfortunately).

CSV file:

message
Je suis fatiguée
The book is on the table
Il fait chaud aujourd'hui!
They are sicks
La vie est belle

Script:

import csv
from langdetect import detect

with open('ddd.csv', 'r') as file:
    fichier = csv.reader(file)

    for line in fichier:
        if line[0] != '':
            message = line[0]

            def detecteur_FR(message):
                #We need to turn the column into a list of lists.
                message_list = [comments for comments in message.split('\n')]
                for text in message_list:
                    if detect(text) == 'fr':
                        message_FR = text
                        return message_FR

            print(detecteur_FR(message))

My output:

None
Je suis fatiguée
None
Il fait chaud aujourd hui!
None
La vie est belle

What I want:

Je suis fatiguée
Il fait chaud aujourd hui!
La vie est belle

How can I remove the "None" lines?

Answer 1

You just need to add a check before printing:

result = detecteur_FR(message)
if result is not None:
    print(result)
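
The check is needed because a Python function that reaches its end without hitting a return statement returns None. Here is a minimal sketch of that behaviour. Note that first_french and fake_detect are hypothetical names made up for this illustration, and fake_detect is only a stand-in for langdetect's detect so the snippet runs without the library:

```python
def first_french(text, detect):
    """Return the first line of text that detect() labels 'fr'."""
    for line in text.split('\n'):
        if detect(line) == 'fr':
            return line
    # No explicit return here: falling off the end returns None.


def fake_detect(text):
    # Hypothetical stand-in for langdetect.detect (illustration only).
    return 'fr' if text in {'La vie est belle', 'Je suis fatiguée'} else 'en'


for message in ['La vie est belle', 'They are sicks']:
    result = first_french(message, fake_detect)
    # Only print when a French line was actually found.
    if result is not None:
        print(result)
```

With the real library you would pass langdetect's detect instead of fake_detect.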

Answer 2

You are redefining the function on every iteration of the loop.

Instead, define it once (at global scope) and only call it inside the loop:

import csv
from langdetect import detect

def detecteur_FR(message):
    # Return the first line of the cell detected as French.
    # If no line matches, the function implicitly returns None.
    for text in message.split('\n'):
        if detect(text) == 'fr':
            return text

with open('ddd.csv', 'r') as file:
    for line in csv.reader(file):
        if line[0] != '':
            result = detecteur_FR(line[0])
            # Skip rows where nothing French was found (result is None).
            if result:
                print(result)

Answer 3

Could you do a comparison before printing the message?
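
One way that comparison might look is to filter the rows first and print only what survives. This is just a sketch under assumptions: french_messages and fake_detect are hypothetical names, fake_detect is a crude stand-in for langdetect's detect, and the CSV is read from an in-memory string rather than ddd.csv so the snippet is self-contained:

```python
import csv
import io


def french_messages(rows, detect):
    """Yield the first cell of every row that detect() labels 'fr'."""
    for row in rows:
        # Compare before yielding: skip empty rows and non-French text.
        if row and row[0].strip() and detect(row[0]) == 'fr':
            yield row[0]


def fake_detect(text):
    # Hypothetical stand-in for langdetect.detect (illustration only).
    french_starts = ('je ', 'il ', 'la ')
    return 'fr' if text.lower().startswith(french_starts) else 'en'


SAMPLE = (
    "message\n"
    "Je suis fatiguée\n"
    "The book is on the table\n"
    "Il fait chaud aujourd'hui!\n"
    "They are sicks\n"
    "La vie est belle\n"
)

reader = csv.reader(io.StringIO(SAMPLE))
next(reader)  # skip the header row
kept = list(french_messages(reader, fake_detect))
for message in kept:
    print(message)
```

Because the function never yields a non-French row, there is no None left to print.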

Answer 4

I think you are getting None because you are not stripping the '\n' at the end of each line.

Try this:

import csv
from langdetect import detect

def detecteur_FR(message):
    # Split the cell into lines and print each one detected as French.
    # Printing here (instead of returning) means no None is ever printed.
    for text in message.split('\n'):
        if detect(text) == 'fr':
            print(text)

with open('ddd.csv', 'r') as file:
    fichier = csv.reader(file)

    for line in fichier:
        # line is a list of cells, so strip the first cell, not the list.
        if line and line[0].strip() != '':
            message = line[0]
            detecteur_FR(message)

Remove "None" from the output
