Add a comma (vírgula) between words

Time: 2017-10-11 20:57:29

Tags: python string pandas csv text

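I have a text file with more than a thousand lines, and for a certain process I need to separate the words with commas. I would like help developing this algorithm in Python, since I am just getting started with the language.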

INPUT

input phrase of the file to exemplify

OUTPUT

input, phrase, of, the, file, to, exemplify

I was thinking of something like this:

import pandas as pd

sampletxt = pd.read_csv('teste.csv' , header = None)
output = sampletxt.replace(" ", ", ")

print(output)

5 answers:

Answer 0 (score: 3)

# read every line of the file; the with-block closes it afterwards
with open("Dogs.txt", "r") as dog_file:
    dogs = dog_file.readlines()
# strip away the surrounding spaces and newline characters
content = [x.strip() for x in dogs]
data = input("Enter a name: ")
# compare against the stripped names, not the raw lines in dogs
if data in content:
    print("Success")
else:
    print("Sorry that didn't work")

Answer 1 (score: 3)

Your line is probably just a string, so you can use:

line.replace(" ",", ")

Answer 2 (score: 1)

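In terms of complexity, you should replace the spaces with commas directly instead of iterating over the phrase multiple times.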

the_list = entrada.replace(' ', ', ')

Answer 3 (score: 1)

First, you need to read your input one line at a time. Then you just use str.replace():

sampletxt = "input phrase of the file to exemplify"
output = sampletxt.replace(" ", ", ")

And you're done.
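The two steps in this answer (read one line at a time, then call str.replace()) could be combined roughly as follows. This is only a sketch: the helper name `comma_separate_lines` is made up for illustration, and an in-memory `io.StringIO` object stands in for the real file so the snippet runs on its own.

```python
import io


def comma_separate_lines(lines):
    """Yield each line with every single space replaced by ', '."""
    for line in lines:
        # Remove the trailing newline first, then do the replacement.
        yield line.rstrip("\n").replace(" ", ", ")


# Hypothetical stand-in for open("teste.csv"); a real file object
# can be iterated line by line in exactly the same way.
sample_file = io.StringIO(
    "input phrase of the file to exemplify\n"
    "another line of words\n"
)

result = list(comma_separate_lines(sample_file))
for converted in result:
    print(converted)
```

With a real file, the same helper would be called inside a `with open(...) as f:` block, passing `f` instead of `sample_file`.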

Answer 4 (score: 1)

Based on the code sample you added, the question you are really asking is how to replace ' ' with ', ' in every row of a pandas DataFrame.

Here is one way to do it:

import pandas as pd

sampletxt = pd.read_csv('teste.csv' , header = None)
output = sampletxt.replace(r'\s+', ', ', regex=True)
print(output)

Example:

In [24]: l
Out[24]: 
['input phrase of the file to exemplify',
 'input phrase of the file to exemplify 2',
 'input phrase of the file to exemplify 4']

In [25]: sampletxt = pd.DataFrame(l)

In [26]: sampletxt
Out[26]: 
                                         0
0    input phrase of the file to exemplify
1  input phrase of the file to exemplify 2
2  input phrase of the file to exemplify 4

In [27]: output = sampletxt.replace(r'\s+', ', ', regex=True)

In [28]: output 
Out[28]: 
                                                0
0     input, phrase, of, the, file, to, exemplify
1  input, phrase, of, the, file, to, exemplify, 2
2  input, phrase, of, the, file, to, exemplify, 4

OLD answer

You can also use re.sub(..), like this:

In [3]: import re

In [4]: st = "input phrase of the file to exemplify"

In [5]: re.sub(' ',', ', st)
Out[5]: 'input, phrase, of, the, file, to, exemplify'

Timing re.sub(...) against str.replace(..):

In [6]: timeit re.sub(' ',', ', st)
100000 loops, best of 3: 1.74 µs per loop

In [7]: timeit st.replace(' ',', ')
1000000 loops, best of 3: 257 ns per loop
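Outside IPython, a similar comparison can be made with the standard `timeit` module. The sketch below is only illustrative: the helper names `with_re_sub` and `with_str_replace` and the repetition count are arbitrary choices, and the absolute timings will differ from machine to machine.

```python
import re
import timeit

st = "input phrase of the file to exemplify"


def with_re_sub():
    # Regex-based replacement of each single space.
    return re.sub(" ", ", ", st)


def with_str_replace():
    # Plain string replacement of each single space.
    return st.replace(" ", ", ")


# Both approaches should agree before their speed is compared.
same_result = with_re_sub() == with_str_replace()

# number=10000 is an arbitrary repetition count chosen for this sketch.
re_time = timeit.timeit(with_re_sub, number=10000)
replace_time = timeit.timeit(with_str_replace, number=10000)

print("re.sub:      %.6f s for 10000 calls" % re_time)
print("str.replace: %.6f s for 10000 calls" % replace_time)
```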

If two words are separated by more than one space, the output of every answer based on str.replace(' ', ', ') will be wrong. For example:

In [15]: st
Out[15]: 'input phrase of the file to  exemplify'

In [16]: re.sub(' ',', ', st)
Out[16]: 'input, phrase, of, the, file, to, , exemplify'

In [17]: st.replace(' ',', ')
Out[17]: 'input, phrase, of, the, file, to, , exemplify'

To fix this, you need a regular expression that matches one or more whitespace characters, like this:

In [22]: st
Out[22]: 'input phrase of the file to  exemplify'

In [23]: re.sub(r'\s+', ', ', st)
Out[23]: 'input, phrase, of, the, file, to, exemplify'
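One caveat with the `\s+` pattern: it also matches leading and trailing whitespace (including the newline at the end of a line read from a file), so a stray ', ' can appear at either end. A possible alternative, not part of the original answer, is to split on whitespace and join with ', '. The sample string below is made up to show the difference.

```python
import re

# Sample line with leading spaces, a double space and a trailing newline.
st = "  input phrase of the file to  exemplify \n"

# The regex replaces every whitespace run, including those at the edges.
regex_result = re.sub(r"\s+", ", ", st)

# str.split() with no arguments splits on whitespace runs and discards
# leading and trailing whitespace, so only the words are joined.
join_result = ", ".join(st.split())

print(repr(regex_result))
print(repr(join_result))
```

Calling `st.strip()` before `re.sub` would have a similar effect if the regex form is preferred.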