I am trying to remove every phrase that is not in French. I tried using the langdetect library (without pandas, unfortunately).
CSV file:
message
Je suis fatiguée
The book is on the table
Il fait chaud aujourd'hui!
They are sicks
La vie est belle
Script:
import csv
from langdetect import detect

with open('ddd.csv', 'r') as file:
    fichier = csv.reader(file)
    for line in fichier:
        if line[0] != '':
            message = line[0]

            def detecteur_FR(message):
                #We need to turn the column into a list of lists.
                message_list = [comments for comments in message.split('\n')]
                for text in message_list:
                    if detect(text) == 'fr':
                        message_FR = text
                        return message_FR

            print(detecteur_FR(message))
My output:
None
Je suis fatiguée
None
Il fait chaud aujourd hui!
None
La vie est belle
What I want:
Je suis fatiguée
Il fait chaud aujourd hui!
La vie est belle
How can I get rid of the None values?
Answer 0 (score: 5)
You just need to add a check before printing:
result = detecteur_FR(message)
if result is not None:
    print(result)
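The None values appear because a Python function that reaches its end without executing a return statement returns None implicitly. The sketch below illustrates this without langdetect: `first_french` and `looks_french` are hypothetical names, and `looks_french` is only a toy stand-in for `detect(text) == 'fr'`, not a real language detector.

```python
def first_french(texts, is_french):
    """Return the first text accepted by is_french."""
    for text in texts:
        if is_french(text):
            return text
    # No explicit return here, so Python returns None implicitly.


def looks_french(text):
    # Toy stand-in for `detect(text) == 'fr'` (assumption, illustration only).
    return text in {"Je suis fatiguée", "La vie est belle"}


messages = ["Je suis fatiguée", "The book is on the table"]
results = [first_french([m], looks_french) for m in messages]

# Checking `is not None` before printing drops the unwanted entries.
printable = [r for r in results if r is not None]
for r in printable:
    print(r)
```

Testing with `is not None` rather than plain truthiness keeps the intent explicit, although for non-empty strings both checks behave the same.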
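Answer 1 (score: 2)
You are redefining the function on every iteration of the loop.
Instead, define it once (at global scope) and only call it inside the loop: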
import csv
from langdetect import detect

def detecteur_FR(message):
    # Return the first line of the message detected as French (None otherwise).
    for text in message.split('\n'):
        if detect(text) == 'fr':
            return text

with open('ddd.csv', 'r') as file:
    for line in csv.reader(file):
        if line[0] != '':
            result = detecteur_FR(line[0])
            if result:
                print(result)
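If the French messages are needed as a list rather than just printed, the same idea can be written as a generator that yields only the matching rows. This is a minimal sketch under stated assumptions: `french_messages` and `toy_detect` are hypothetical names, the CSV is held in memory with `io.StringIO` instead of being read from ddd.csv, and `toy_detect` is a crude word-list stand-in so the example runs without langdetect. In real use one would presumably pass `langdetect.detect` as the `detect` argument.

```python
import csv
import io


def french_messages(rows, detect):
    """Yield the first column of each row whose detected language is 'fr'."""
    for row in rows:
        if not row or not row[0].strip():
            continue  # skip blank rows and empty cells
        if detect(row[0]) == 'fr':
            yield row[0]


def toy_detect(text):
    # Hypothetical stand-in for langdetect.detect, for illustration only.
    french_words = {"je", "suis", "il", "fait", "la", "vie", "est"}
    words = text.lower().split()
    return 'fr' if any(w in french_words for w in words) else 'en'


sample = io.StringIO(
    "message\n"
    "Je suis fatiguée\n"
    "The book is on the table\n"
    "Il fait chaud aujourd'hui!\n"
    "They are sicks\n"
    "La vie est belle\n"
)
reader = csv.reader(sample)
next(reader)  # skip the header row
kept = list(french_messages(reader, toy_detect))
for message in kept:
    print(message)
```

Because the generator never yields anything for non-French rows, there is no None to filter out afterwards.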
Answer 2 (score: 2)
Can you do a comparison before printing the message?
Answer 3 (score: 1)
I think you are getting None because you are not stripping the '\n' at the end of each line.
Try this:
import csv
from langdetect import detect

def detecteur_FR(message):
    # Print every line of the message that is detected as French.
    for text in message.split('\n'):
        if detect(text) == 'fr':
            print(text)

with open('ddd.csv', 'r') as file:
    fichier = csv.reader(file)
    for line in fichier:
        # csv.reader yields lists, so test the first cell, not the row itself.
        if line and line[0].strip() != '':
            detecteur_FR(line[0])
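For reference, csv.reader returns each row as a list of strings with the line terminator already removed, and a completely blank line comes back as an empty list. A row therefore has no strip() method, and indexing `row[0]` on a blank line raises IndexError. The small sketch below shows this with in-memory sample data (the sample text and variable names are assumptions for illustration):

```python
import csv
import io

# Two messages separated by a blank line, held in memory instead of a file.
sample = io.StringIO("Je suis fatiguée\n\nLa vie est belle\n")
rows = list(csv.reader(sample))

# Each row is a list; the blank line is an empty list, not [''].
for row in rows:
    print(repr(row))

# Guard with `row and ...` before touching row[0] to avoid IndexError.
non_blank = [row[0] for row in rows if row and row[0].strip()]
```

Checking `row` first relies on an empty list being falsy, so the `row[0]` access is only evaluated for rows that actually contain a cell.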