Replace/strip certain text from data in Pandas?

Date: 2014-06-28 21:13:30

Tags: python python-2.7 replace pandas strip

I'm having an issue with Pandas not replacing certain bits of text correctly...

# Create blank column
csvdata["CTemp"] = ""
# Create a copy of the data in "CDPure"
dcol = csvdata.CDPure
# Fill "CTemp" with the data from "CDPure" and replace and/or remove certain parts
csvdata['CTemp'] = dcol.str.replace(" (AMI)", "").replace(" N/A", "Non")

But when I print it by running print csvdata[-50:].head(50), nothing has been replaced, as seen below:
         Pole     KI   DE    Score   STAT  CTemp
4429      NaN      NaN  NaN      42    NaN  Data N/A 
4430      NaN      NaN  NaN   23.43    NaN  Data (AMI)
4431      NaN      NaN  NaN    7.05    NaN  Data (AMI)
4432      NaN      NaN  NaN    9.78    NaN  Data 
4433      NaN      NaN  NaN  169.68    NaN  Data (AMI)
4434      NaN      NaN  NaN   26.29    NaN  Data N/A
4435      NaN      NaN  NaN   83.11    NaN  Data  N/A

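Note: The CSV is quite large, so I have to use pandas.set_option('display.max_columns', 250) to be able to print the above.

Does anyone know how to make it replace those parts correctly in pandas?

Edit: I have tried .str.replace("", "") and also .replace("", "")

Example CSV: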

No,CDPure,Blank
1,Data Test,
2,Test N/A,
3,Data N/A,
4,Test Data,
5,Bla,
5,Stack,
6,Over (AMI),
7,Flow (AMI),
8,Test (AMI),
9,Data,
10,Ryflex (AMI),

Example code:

# Import pandas
import pandas

# Open csv (I have to keep it all as dtype object otherwise I can't do the rest of my script)
csvdata = pandas.read_csv('test.csv', dtype=object)

# Create blank column
csvdata["CTemp"] = ""
# Create a copy of the data in "CDPure"
dcol = csvdata.CDPure
# Fill "CTemp" with the data from "CDPure" and replace and/or remove certain parts
csvdata['CTemp'] = dcol.str.replace(" (AMI)", "").str.replace(" N/A", " Non")

# Print
print csvdata.head(11)

Output:

    No        CDPure Blank         CTemp
0    1     Data Test   NaN     Data Test
1    2      Test N/A   NaN      Test Non
2    3      Data N/A   NaN      Data Non
3    4     Test Data   NaN     Test Data
4    5           Bla   NaN           Bla
5    5         Stack   NaN         Stack
6    6    Over (AMI)   NaN    Over (AMI)
7    7    Flow (AMI)   NaN    Flow (AMI)
8    8    Test (AMI)   NaN    Test (AMI)
9    9          Data   NaN          Data
10  10  Ryflex (AMI)   NaN  Ryflex (AMI)

1 Answer:

Answer 0: (score: 2)

str.replace interprets its argument as a regular expression, so you need to escape the parentheses by using dcol.str.replace(r" \(AMI\)", "").str.replace(" N/A", "Non").
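To see why the unescaped pattern matches nothing, here is a minimal sketch using only the standard re module (the sample strings are illustrative, not taken from the asker's CSV). In a regular expression, unescaped parentheses form a capture group, so " (AMI)" looks for the text " AMI" rather than for literal parentheses:

```python
import re

# Unescaped: "(AMI)" is a capture group, so the pattern searches for " AMI".
# "Over (AMI)" does not contain " AMI", so nothing is replaced.
unescaped = re.sub(" (AMI)", "", "Over (AMI)")

# The same unescaped pattern does match a string that really contains " AMI".
group_match = re.sub(" (AMI)", "", "Over AMI")

# Escaped: "\(" and "\)" match literal parentheses.
escaped = re.sub(r" \(AMI\)", "", "Over (AMI)")

print(repr(unescaped))    # 'Over (AMI)'
print(repr(group_match))  # 'Over'
print(repr(escaped))      # 'Over'
```

This matches the output in the question, where the " N/A" rows were replaced but the "(AMI)" rows were left untouched.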

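This doesn't seem to be well documented; the docs mention that split and replace "also take regular expressions", but they don't make it clear that these methods always interpret their argument as a regular expression.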
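Since the regex behavior is easy to miss, it may help to spell out the intent in the call itself. The sketch below assumes a pandas version where Series.str.replace accepts a regex keyword (added after this answer was written; as far as I know, the default later changed to literal matching in pandas 2.0). The small Series stands in for the CDPure column:

```python
import re
import pandas as pd

# Stand-in for csvdata.CDPure (illustrative values from the example CSV)
dcol = pd.Series(["Data Test", "Test N/A", "Over (AMI)", "Ryflex (AMI)"])

# Option 1: regex matching with the parentheses escaped by hand
escaped = (
    dcol.str.replace(r" \(AMI\)", "", regex=True)
        .str.replace(" N/A", " Non", regex=True)
)

# Option 2: let re.escape build the escaped pattern from the literal text
auto_escaped = dcol.str.replace(re.escape(" (AMI)"), "", regex=True)

# Option 3: literal matching, so no escaping is needed
literal = (
    dcol.str.replace(" (AMI)", "", regex=False)
        .str.replace(" N/A", " Non", regex=False)
)

print(escaped.tolist())
print(literal.tolist())
```

Passing regex explicitly keeps the behavior the same regardless of the installed version's default.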