I have a pandas DataFrame that looks like this:
DF1 =
sid path
1 '["rome","is","in","province","lazio"]'
1 "['rome', 'is', 'in', 'province', 'naples']"
1 ['N']
1 "['rome', 'is', 'in', 'province', 'in', 'campania']"
....
I want to remove all the unnecessary characters from the path column, so the result should look like this:
DF2 =
sid path
1 rome is in province lazio
1 rome is in province naples
1 N
1 rome is in province in campania
....
I tried replacing all the unnecessary characters:
DF1["path"].replace("[","").replace("]","").replace('"',"").replace(","," ").replace("'","")
But it did not work. I think this is because of the entry ['N'].
How can I do this? Any help is appreciated!
Answer 0 (score: 1)
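You can use ast.literal_eval to safely parse the entries that are string representations of lists. One way to handle the entries that are already real lists is to catch the ValueError that literal_eval raises for them.
Note that, where possible, you should try to sort out problems like this upstream, before the data reaches the DataFrame.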
from ast import literal_eval

import pandas as pd

df = pd.DataFrame({'sid': [1, 1, 1, 1],
                   'path': ['["rome","is","in","province","lazio"]',
                            "['rome', 'is', 'in', 'province', 'naples']",
                            ['N'],
                            "['rome', 'is', 'in', 'province', 'in', 'campania']"]})

def converter(x):
    # Strings that look like lists are parsed into real lists first
    try:
        return ' '.join(literal_eval(x))
    # literal_eval raises ValueError when x is already a list
    except ValueError:
        return ' '.join(x)

df['path'] = df['path'].apply(converter)
print(df)
path sid
0 rome is in province lazio 1
1 rome is in province naples 1
2 N 1
3 rome is in province in campania 1
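If you would rather not rely on the exception, a variation is to check the type explicitly. This is only a minimal sketch: it assumes every entry in path is either a real list or a string that literal_eval can parse, and to_sentence is a name made up for this example.

```python
from ast import literal_eval

import pandas as pd


def to_sentence(value):
    """Join a list, or the string form of a list, into one space-separated string."""
    # Real lists are used as they are; strings are parsed with literal_eval first.
    items = value if isinstance(value, list) else literal_eval(value)
    return ' '.join(items)


df = pd.DataFrame({'sid': [1, 1, 1],
                   'path': ['["rome","is","in","province","lazio"]',
                            "['rome', 'is', 'in', 'province', 'naples']",
                            ['N']]})
df['path'] = df['path'].apply(to_sentence)
print(df)
```

The explicit check makes the two cases visible in the code, whereas the try/except version also covers any other input for which literal_eval happens to raise ValueError.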
Answer 1 (score: 1)
Use ast.literal_eval & str.join, after casting every entry to a string with astype(str) so that real lists and list-like strings are handled the same way.
Demo:
import pandas as pd
import ast
df = pd.DataFrame({"path": ['["rome","is","in","province","lazio"]', "['rome', 'is', 'in', 'province', 'naples']", ['N']]})
df['path'] = df['path'].astype(str).apply(ast.literal_eval).apply(lambda x: " ".join(x))
print(df)
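For reference, the replace chain in the question most likely fails because Series.replace matches whole cell values by default, not substrings; substring replacement is what the .str accessor is for. A rough sketch of that route is below. It assumes the individual words never contain brackets, quotes or commas themselves, and the variable name cleaned is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'sid': [1, 1, 1],
                   'path': ['["rome","is","in","province","lazio"]',
                            "['rome', 'is', 'in', 'province', 'naples']",
                            ['N']]})

cleaned = (df['path']
           .astype(str)                                  # real lists become strings like "['N']"
           .str.replace(r"""[\[\]"']""", '', regex=True)  # drop brackets and both quote styles
           .str.replace(r",\s*", ' ', regex=True))        # a comma plus any spaces becomes one space
df['path'] = cleaned
print(df)
```

The literal_eval approach is the safer choice when the words may contain punctuation, since it parses the list instead of stripping characters.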