Remove extra spaces between columns

Date: 2018-01-15 08:29:34

Tags: python pandas

I get the following output:

  

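sports (6 spaces) mourinho keen to tie up long-term de gea deal

opinion (5 spaces) the reality of north korea as a nuclear power

When I write to a .txt file, how can I make these come out as sports (1 space) ... and opinion (1 space) ...?

Here is my code: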

the_frame = pdsql.read_sql_query("SELECT category, title FROM training;", conn)
pd.set_option('display.max_colwidth', -1)
print(the_frame)
the_frame = the_frame.replace(r'\s+', ' ', regex=True)  # tried to remove multiple spaces
base_filename = 'Values.txt'
with open(os.path.join(base_filename),'w') as outfile:
    df = pd.DataFrame(the_frame)
    df.to_string(outfile, index=False, header=False)

2 answers:

Answer 0 (score: 1)

I think your solution is fine; it only needs to be simplified:

I also tested it with multiple tabs, and it works as well.

the_frame = pdsql.read_sql_query("SELECT category, title FROM training;", conn)
the_frame = the_frame.replace(r'\s+', ' ', regex=True)
base_filename = 'Values.txt'
the_frame.to_csv(base_filename, index=False, header=False)

Sample:

the_frame = pd.DataFrame({
    'A': ['sports      mourinho keen to tie up long-term de gea deal',
          'opinion     the reality of north korea as a nuclear power'],
    'B': list(range(2))
})
print (the_frame)
                                                   A  B
0  sports      mourinho keen to tie up long-term ...  0
1  opinion     the reality of north korea as a nu...  1

the_frame = the_frame.replace(r'\s+', ' ', regex=True)
print (the_frame)
                                                   A  B
0  sports mourinho keen to tie up long-term de ge...  0
1  opinion the reality of north korea as a nuclea...  1
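For context, the padding in the question seems to come from `to_string` itself, which aligns each column to a common width, rather than from whitespace stored in the data. The sketch below uses made-up two-row data (an assumption, not the asker's table) to contrast the two writers:

```python
import io

import pandas as pd

# Hypothetical sample data; the real table comes from the SQL query.
frame = pd.DataFrame({
    "category": ["sports", "opinion"],
    "title": ["mourinho keen to tie up long-term de gea deal",
              "the real fire fury"],
})

# to_string pads each column to a common width, so shorter values
# are surrounded by runs of spaces.
padded = frame.to_string(index=False, header=False)

# to_csv writes the raw cell values separated by the delimiter
# (a comma by default), with no alignment padding.
buffer = io.StringIO()
frame.to_csv(buffer, index=False, header=False)
csv_text = buffer.getvalue()

print(padded)
print(csv_text)
```

Note that with the default separator the two columns are joined by a comma, not a space.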

EDIT: I think you need to join the two columns with a space and write the output to the file without the sep parameter.

the_frame = pd.DataFrame({'category': {0: 'sports', 1: 'sports', 2: 'opinion', 3: 'opinion', 4: 'opinion'}, 'title': {0: 'mourinho keen to tie up long-term de gea deal', 1: 'suarez fires barcelona nine clear in sociedad fightback', 2: 'the reality of north korea as a nuclear power', 3: 'the real fire fury', 4: 'opposition and dr mahathir'}} )
print (the_frame)
  category                                              title
0   sports      mourinho keen to tie up long-term de gea deal
1   sports  suarez fires barcelona nine clear in sociedad ...
2  opinion      the reality of north korea as a nuclear power
3  opinion                                 the real fire fury
4  opinion                         opposition and dr mahathir

the_frame = the_frame['category'] + ' ' + the_frame['title']
print (the_frame)
0    sports mourinho keen to tie up long-term de ge...
1    sports suarez fires barcelona nine clear in so...
2    opinion the reality of north korea as a nuclea...
3                           opinion the real fire fury
4                   opinion opposition and dr mahathir
dtype: object

base_filename = 'Values.txt'
the_frame.to_csv(base_filename, index=False, header=False)
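One caveat with writing the joined Series through `to_csv`: a line that contains a comma is wrapped in double quotes under the default quoting rules. If that matters, a possible alternative is to write the lines directly. This is a minimal sketch with hypothetical data (the second title and its comma are invented for illustration):

```python
import pandas as pd

# Hypothetical data; the second title contains a comma on purpose.
frame = pd.DataFrame({
    "category": ["sports", "opinion"],
    "title": ["mourinho keen to tie up long-term de gea deal",
              "opposition, and dr mahathir"],
})

# Join the two columns with exactly one space, as in the answer above.
lines = frame["category"] + " " + frame["title"]

# Build the file content by hand so no CSV quoting is applied.
text = "\n".join(lines) + "\n"

with open("Values.txt", "w", encoding="utf-8") as outfile:
    outfile.write(text)
```

Each row ends up on its own line, with a single space between category and title and no added quotes.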

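Answer 1 (score: 0)

You can try the following instead of

the_frame = the_frame.replace(r'\s+', ' ', regex=True)

# use the syntax below; .str is a Series accessor, so apply it to one column at a time
the_frame['title'] = the_frame['title'].str.replace(r'\s+', ' ', regex=True)  # collapses runs of whitespace into one space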
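To make the difference between the two calls concrete: `DataFrame.replace` with `regex=True` works on every string cell of the frame, while `.str.replace` exists only on a Series, so it has to be called on a single column. A small sketch with invented data:

```python
import pandas as pd

# Hypothetical data with runs of spaces and a tab inside the titles.
frame = pd.DataFrame({
    "category": ["sports", "opinion"],
    "title": ["mourinho   keen to\ttie up", "the  real fire   fury"],
})

# DataFrame.replace applies the pattern to all string cells at once.
whole = frame.replace(r"\s+", " ", regex=True)

# Series.str.replace applies the pattern to one column only.
one_col = frame["title"].str.replace(r"\s+", " ", regex=True)

print(whole)
print(one_col)
```

Both calls only change whitespace inside the cell values; they do not affect the alignment padding that `to_string` adds when printing or writing.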