New to Python. I'm doing some text preprocessing and trying to export a csv file that includes a column in which each cell is a list of strings.
data['Lemmas']
0 [require]
1 [speak, gentleman, wk, ago]
2 [material, come, soft, plastic, st, use, pste,...
3 [send, email, confirmation]
type(data['Lemmas'][0])
list
When I read this csv back in, Pandas interprets the column as a Series of strings.
0 ['require']
1 ['speak', 'gentleman', 'wk', 'ago']
2 ['material', 'come', 'soft', 'plastic', 'st', ...
3 ['send', 'email', 'confirmation']
type(data_verbatims['Lemmas'][0])
str
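The round trip can be reproduced with a minimal sketch. It assumes a small two-row frame and an in-memory buffer in place of a real file; both are illustrative stand-ins, not the original data. It suggests that to_csv writes each list as its text representation, which read_csv then returns as a plain string.

```python
import io

import pandas as pd

# Illustrative stand-in for the real data: a column whose cells are lists.
data = pd.DataFrame({"Lemmas": [["require"], ["speak", "gentleman", "wk", "ago"]]})

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
data.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Reading the same text back gives strings, not lists.
data_verbatims = pd.read_csv(io.StringIO(csv_text))
first = data_verbatims["Lemmas"][0]
print(type(first).__name__, repr(first))
```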
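I've been able to work around this with some clumsy string manipulation, but there must be a better way to export/read this column correctly, or to convert it back to its original structure.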
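import string
import pandas as pd

lemmas = []
for words in data_verbatims['Lemmas']:
    # Strip every punctuation character (brackets, quotes, commas) from the string
    for char in words:
        if char in string.punctuation:
            words = words.replace(char, '')
    lemmas.append(words)
lemmas = pd.Series(lemmas)
# Split the cleaned string on whitespace to get a list of tokens back
lemmas = lemmas.apply(lambda x: x.split())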
Answer 0 (score: 1)
If I understand you correctly, we can use ast.literal_eval here:
Lemmas
0 ['require']
1 ['speak', 'gentleman', 'wk', 'ago']
2 ['material', 'come', 'soft', 'plastic', 'st']
3 ['send', 'email', 'confirmation']
type(df['Lemmas'][0])
#Out
str
from ast import literal_eval
df['Lemmas'] = df['Lemmas'].apply(literal_eval)
Lemmas
0 [require]
1 [speak, gentleman, wk, ago]
2 [material, come, soft, plastic, st]
3 [send, email, confirmation]
type(df['Lemmas'][0])
#Out
list
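As a possible variation on the above, the conversion can also happen while the file is being read, through the converters parameter of pd.read_csv. The sketch below assumes the column is named Lemmas and uses a hypothetical inline CSV string in place of the exported file.

```python
import io
from ast import literal_eval

import pandas as pd

# Hypothetical CSV text standing in for the exported file.
csv_text = """Lemmas
['require']
"['speak', 'gentleman', 'wk', 'ago']"
"""

# converters maps a column name to a function applied to each raw cell string.
df = pd.read_csv(io.StringIO(csv_text), converters={"Lemmas": literal_eval})

print(type(df["Lemmas"][0]).__name__)
print(df["Lemmas"][1])
```

literal_eval only evaluates Python literals (strings, numbers, lists, dicts and the like), so it is a safer choice than eval for text read from a file.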