Question

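I am trying to get rid of all \n characters in an entire pandas DataFrame. I know Stack Overflow already has answers for this, but for some reason I cannot get the desired output. I have the following DataFrame: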

  title     text    date    authors
0   [ECB completes foreign reserves investment in ...   [\nThe European Central Bank (ECB) completed an ...     [13 June 2017]  ECB
1   [Measures to improve the efficiency of the ope...   [\nThe Governing Council of the ECB has decided ...     [\n 23 January 2003 \n ]    ECB
2   []  []  []  ECB
3   [ECB publishes the results of the Euro Money M...   [Today the European Central Bank (ECB) is publ...   [\n 28 September 2012 \n ]  ECB
4   []  []  []  ECB

This is the output I want:

title   text    date    authors
0   [ECB completes foreign reserves investment in...    [The European Central Bank (ECB) completed an ...   [13 June 2017]  ECB
1   [Measures to improve the efficiency of the ope...   [The Governing Council of the ECB has decided ...   [23 January 2003]   ECB
2   []  []  []  ECB
3   [ECB publishes the results of the Euro Money M...   [Today the European Central Bank (ECB) is publ...   [28 September 2012]     ECB
4   []  []  []  ECB

This is all the code I have tried:

Based on this Stack Overflow post I tried:

mydf=df.replace({r'\\n': ''}, regex=True)

mydf=df['date'].str.strip(r'\\n') #this turns every obs into NaN 

mydf=df.replace(to_replace=[r"\\n", "\n"], value=["",""], regex=True, inplace =True) #this gets rid of all data in dataframe for some reason

None of these worked.

Based on this post I tried (note that I skipped the answers I had already tried):

mydf=df.replace(r'\s', '', regex = True, inplace = True) #this deleted all data
Based on this post I tried:

mydf=df.replace('\\n',' ')
Based on a comment on this post I tried:

mydf=df['date'].replace(r'\s+|\\n', ' ', regex=True, inplace=True)

and

mydf=df.replace(r'\s+|\\n', ' ', regex=True, inplace=True)
Based on the answers in this post I tried:

mydf= df.replace({r'\s+$': '', r'^\s+': ''}, regex=True).replace(r'\n', ' ', regex=True)

mydf=df.replace({ r'\A\s+|\s+\Z': '', '\n' : ' '}, regex=True, inplace=True) # this again deleted whole df

I don't understand why the answers I found do not work in my case, since they were accepted and most of those questions seem very similar to mine.

Answer 1

Try:

df['date']=df['date'].str[0].str.replace(r"\n", "", regex=True)

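This assumes that every cell in the date column is a list with exactly one element. It also flattens the cell, so you get the string itself rather than a one-element list.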
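A minimal sketch of the single-element case, assuming a small stand-in for the question's data (the values are copied from the question's table). `regex=True` is passed explicitly because the default of `Series.str.replace` changed to `regex=False` in pandas 2.0, and the trailing `.str.strip()` is an addition here to drop the spaces left around the date:

```python
import pandas as pd

# Stand-in for the question's data: every date cell is a list, not a string.
df = pd.DataFrame({
    "date": [["13 June 2017"], ["\n 23 January 2003 \n "], []],
    "authors": ["ECB", "ECB", "ECB"],
})

# .str[0] takes the first element of each list (NaN for an empty list),
# then newlines are removed and surrounding whitespace is stripped.
flat = df["date"].str[0].str.replace(r"\n", "", regex=True).str.strip()
print(flat.tolist())
```

If the cells really are lists, as this answer assumes, `df.replace(..., regex=True)` has no string values to match, which would explain why several attempts changed nothing. Separately, any call with `inplace=True` returns `None`, so assigning its result to `mydf` makes it look as if the data was deleted.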

However, if date can contain more than one element and you want to merge them all into a single string after getting rid of every \n, try:

df['date']=df['date'].str.join('').str.replace(r"\n", "", regex=True)
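The multi-element case can be sketched the same way. The two-element cell below is a hypothetical sample, not data from the question, and `.str.strip()` is again an addition to trim the leftover spaces:

```python
import pandas as pd

# Hypothetical sample: the second cell holds two elements.
df = pd.DataFrame({
    "date": [["13 June 2017"], ["\n 23 January", " 2003 \n "], []],
})

# Concatenate the elements of each list into one string,
# then remove newlines and trim surrounding whitespace.
merged = df["date"].str.join("").str.replace(r"\n", "", regex=True).str.strip()
print(merged.tolist())
```

With this approach an empty list becomes an empty string rather than NaN.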

Otherwise, if you want to keep the list format and just strip \n from every element, try (&& is a temporary separator):

df['date']=df['date'].str.join(r'&&').str.replace(r"\n", "", regex=True).str.split(r'&&')
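Since the question asks about the whole DataFrame, one alternative to the temporary-separator trick is a plain Python helper applied to every column. This is only a sketch and not part of the original answer: `clean_cell` is a hypothetical helper name, and the sample data is a stand-in for the question's.

```python
import pandas as pd

def clean_cell(cell):
    """Remove newlines and surrounding whitespace from each string in a list cell."""
    if isinstance(cell, list):
        return [s.replace("\n", "").strip() if isinstance(s, str) else s
                for s in cell]
    return cell  # leave non-list cells (e.g. the authors column) untouched

df = pd.DataFrame({
    "text": [["\nThe European Central Bank (ECB) completed an ..."], []],
    "date": [["\n 23 January 2003 \n "], []],
    "authors": ["ECB", "ECB"],
})

# Apply the helper column by column; Series.map calls it once per cell.
cleaned = df.apply(lambda col: col.map(clean_cell))
print(cleaned["date"].tolist())
```

Empty lists stay empty with this helper, whereas joining and then splitting on a separator should turn `[]` into `['']`.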
