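How to read Chinese text and write Chinese characters to CSV - Python 3

Time: 2018-11-14 22:07:43

Tags: python

I have searched SO but could not find an answer to this particular problem. I am trying to read content from a .txt file of Chinese characters. When I try to write it to a .csv, the cell contents look like this: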

b'\xef\xbb\xbf\xe5'

as opposed to:

山西襄汾
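The b'...' text in the cell is what csv.writer produces when it receives a bytes object instead of a str: the csv module stringifies non-string fields with str(), and str() of a bytes object is its repr. The leading \xef\xbb\xbf is the UTF-8 byte-order mark (BOM) that the "utf-8-sig" codec prepends, and \xe5 is the first UTF-8 byte of 山. A minimal sketch reproducing this, assuming the field was encoded with "utf-8-sig" before being written (as the date.encode("utf-8-sig") call mentioned in the answer suggests; the variable names are illustrative):

```python
import csv
import io

# Hypothetical field value read from the .txt file.
title = "山西襄汾"

# Encoding with "utf-8-sig" prepends the UTF-8 byte-order mark (EF BB BF);
# b"\xe5" is the first of the three UTF-8 bytes of 山.
encoded = title.encode("utf-8-sig")
print(encoded[:4])  # b'\xef\xbb\xbf\xe5'

# csv.writer stringifies non-str fields with str(), so the bytes object is
# written as its repr (b'\xef\xbb...') rather than as Chinese characters.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerow([encoded])
print(buffer.getvalue())

# Passing the str itself writes the characters unchanged.
fixed = io.StringIO(newline="")
csv.writer(fixed).writerow([title])
print(fixed.getvalue())
```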

How can I get the latter format into the .csv? The relevant code snippet is as follows:

#!/bin/bash
################################################################################
#
# License:.....GNU General Public License v3.0
# Author:......CodeMonkey
# Date:........14 November 2018
# Title:.......GitMavenCleanInstall.sh
# Description: This script is designed to cd to a set Maven POM Project,
#   perform a git remote update and pull, and clean install the projects
#   whose files changed.
# Notice:......The project structure this script was originally set to target
#   is structured as a Maven POM Project that contains several sub-POM Projects.
#   The sub-POM Projects contain Maven Java Application projects. The targets
#   should be easy to change, and allow for others to target other structures.
#
################################################################################
#
# Change History: N/A
#
################################################################################

#Function to check if array has element
containsElement () {
    local e match="$1"
    shift
    for e; do [[ "$e" == "$match" ]] && return 0; done
    return 1
}

#Navigate to the POM Project
cd PATH/TO/POM/PROJECT || exit 1
#Remote update
git remote update -p
#Pull
git pull

#Get the current working branch
CURRENT_BRANCH="$(git branch | sed -n -e 's/^\* \(.*\)/\1/p')"
#Get the output of the command git diff
GIT_DIFF_OUTPUT="$(git diff --name-status HEAD@{1} "${CURRENT_BRANCH}")"

#Split the whole diff output (every line) into an array of words
read -r -d '' -a GIT_DIFF_OUTPUT_ARY <<< "$GIT_DIFF_OUTPUT"
#Declare empty array for root path
declare -a GIT_DIFF_OUTPUT_ARY_ROOT_PATH=()
FORWARD='/'
#Loop diff output array
for i in "${GIT_DIFF_OUTPUT_ARY[@]}"
do
    #Check that the string is not 1 Character
    if [[ ${#i} -ne 1 ]]
    then
        #Split the file path by /
        IFS='/' read -ra SPLIT <<< "$i"
        #Concatenate first path + / + second path
        path=${SPLIT[0]}$FORWARD${SPLIT[1]}
        #Call function to see if it already exists in the root path array
        containsElement "$path" "${GIT_DIFF_OUTPUT_ARY_ROOT_PATH[@]}"
        if [[ $? != 0 ]]
        then
            #Add the path since it was not found
            GIT_DIFF_OUTPUT_ARY_ROOT_PATH+=("$path")
        fi
    fi
done

#Loop root path array
for val in "${GIT_DIFF_OUTPUT_ARY_ROOT_PATH[@]}"
do
    #CD into root path (skip it if the directory does not exist)
    cd "$val" || continue
    #Maven call to clean install
    mvn -DskipTests=true --errors -T 8 -e clean install
    #CD back up before next project
    cd ../../
done

1 Answer:

Answer 0 (score: 0)

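First, as Peter Wood suggested, make sure outfilehandle is created on a file opened with encoding='utf-8'. Also pass newline='', which the csv module documentation recommends for files handed to csv.writer:

import csv

outfilehandle = csv.writer(open('outfile.csv', 'w', encoding='utf-8', newline=''))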

Then, instead of calling date.encode("utf-8-sig"), simply change lines 7-8 of your snippet to:

localrow.append(date)
localrow.append(title)
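Putting the answer's two changes together, a minimal end-to-end sketch might look like the following. The input layout (one tab-separated date and title per line) and the file names are hypothetical, since the question's original Python snippet is not shown. Reading with "utf-8-sig" is an added assumption: it strips a leading BOM like the one visible in the broken output, so the first field does not start with "\ufeff".

```python
import csv
import os
import tempfile

# Hypothetical paths in a scratch directory.
workdir = tempfile.mkdtemp()
txt_path = os.path.join(workdir, "input.txt")
csv_path = os.path.join(workdir, "outfile.csv")

# Hypothetical input: one "date<TAB>title" record per line, saved with a
# BOM (the \xef\xbb\xbf in the question's output hints at one).
with open(txt_path, "w", encoding="utf-8-sig") as f:
    f.write("2018-11-14\t山西襄汾\n")

# Decode while reading: "utf-8-sig" drops a leading BOM if present.
rows = []
with open(txt_path, encoding="utf-8-sig") as infile:
    for line in infile:
        date, title = line.rstrip("\n").split("\t")
        localrow = []
        localrow.append(date)   # keep str; no .encode() call
        localrow.append(title)
        rows.append(localrow)

# Encode while writing: the file object handles UTF-8, and newline=""
# lets csv.writer control the line endings itself.
with open(csv_path, "w", encoding="utf-8", newline="") as outfile:
    outfilehandle = csv.writer(outfile)
    outfilehandle.writerows(rows)

# Read the result back to check that the characters survived.
with open(csv_path, encoding="utf-8", newline="") as f:
    content = f.read()
print(content)
```

If the CSV is meant to be opened in Excel, writing it with encoding="utf-8-sig" instead of "utf-8" is a common choice, because the BOM helps Excel recognize the file as UTF-8. Either way, the fields passed to the writer should stay str.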

Reading the Python Unicode HOWTO and Processing Text Files in Python 3 may also help.