Question

I'm new to Python & pandas and have a question. I have a Series of 45398 strings that I need to edit. I import them from an Excel file.

import pandas as pd
import numpy as np
import xlrd

file_location = "#mypath/leistungen_2017.xlsx"
workbook = xlrd.open_workbook(file_location)
sheet = workbook.sheet_by_index(0)

df = pd.read_excel("leistungen_2017.xlsx")

Here are the first few rows, as an example.

>>> df
   Leistungserbringer  Anzahl  Leistung                                           Code  Rechnungsnummer
0  Albert              1       15.0160 Vollständige Spirometrie und Resistanc...  1     8957
1  Albert              1       15.0200 CO-Diffusion, jede Methode                 1     8957
2  Albert              1       15.0285 Messung ausgeatmetes Stickstoffmonoxid...  1     8957
3  Albert              1       AMC-30864 Spirometriefilter mit Mundstück          1     8957
4  Albert              1       5889797 RELVAR ELLIPTA Inh Plv 92mcg/22mcg 30 Dos  1     8957
5  Albert              1       00.0010 Konsultation, erste 5 Min. (Grundkonsu...  1     8957

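In the fourth column (Leistung), each text is preceded by a code made up mostly of digits, and I want to remove that code across the whole Series.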

I tested this with a single string, and it works fine:

>>> str("15.0200 CO-Diffusion, jede Methode".split(' ', 1)[1:]).strip('[]')
"'CO-Diffusion, jede Methode'"

I tried to apply this to the whole Series:

for entry in df.Leistung:
    df.Leistung.replace({entry : str(entry.split(' ', 1)[1:]).strip('[]')},  inplace=True)

The result for df.Leistung should look like this:

0        Vollständige Spirometrie und Resistance (Plet...
1                             CO-Diffusion, jede Methode
2         Messung ausgeatmetes Stickstoffmonoxid ({eNO})
3                        Spirometriefilter mit Mundstück
4              RELVAR ELLIPTA Inh Plv 92mcg/22mcg 30 Dos
5         Konsultation, erste 5 Min. (Grundkonsultation)

Instead, I got something different. One row, for example, came out like this:

45384    'Dos\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\'"\\\\\\\\\...

I need to update the old Series with the new one in the same column. I hope this is understandable, and thanks in advance for any help.

Answer 1

You don't need a loop in pandas; these operations are vectorized. The replace function you're after lives in the .str namespace. So you need to do:

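# Drop the leading code (everything up to the first whitespace) and write the result back.
df['Leistung'] = df.Leistung.str.replace(r'^\S+\s+', '', regex=True)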
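
The choice of pattern matters here. As a minimal sketch on two sample strings taken from the question (the Series `s` and the variable names are illustrative assumptions, not part of the original post): an unanchored `\d+` removes every run of digits anywhere in the string, while a pattern anchored at the start removes only the leading code. Passing `regex=True` explicitly avoids depending on the default, which differs between pandas versions.

```python
import pandas as pd

# Hypothetical sample: two values copied from the question's "Leistung" column.
s = pd.Series([
    "15.0200 CO-Diffusion, jede Methode",
    "5889797 RELVAR ELLIPTA Inh Plv 92mcg/22mcg 30 Dos",
])

# Unanchored: strips ALL digit runs, including "92", "22" and "30" inside the text.
unanchored = s.str.replace(r"\d+", "", regex=True)

# Anchored: strips only the first whitespace-delimited token and the space after it.
anchored = s.str.replace(r"^\S+\s+", "", regex=True)

print(unanchored.tolist())
print(anchored.tolist())
```

The anchored form also handles codes that are not purely numeric, such as `AMC-30864`, because it matches any non-whitespace token at the start.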

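Calling split on a single string works, but the same call does not apply directly to a Series of strings in pandas; the Series needs the .str accessor instead.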
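
The single-string split from the question can be carried over to the whole column through the `.str` accessor. The following is a sketch under assumptions: the small DataFrame stands in for the real Excel data, and it assumes every value contains at least one space (a value without one would come out as NaN).

```python
import pandas as pd

# Hypothetical stand-in for the real data loaded with pd.read_excel.
df = pd.DataFrame({
    "Leistung": [
        "15.0200 CO-Diffusion, jede Methode",
        "AMC-30864 Spirometriefilter mit Mundstück",
        "5889797 RELVAR ELLIPTA Inh Plv 92mcg/22mcg 30 Dos",
    ]
})

# Split each string once at the first space, then take the second piece.
# .str[1] indexes into the per-row list produced by .str.split.
df["Leistung"] = df["Leistung"].str.split(" ", n=1).str[1]

print(df["Leistung"].tolist())
```

Assigning the result back to `df["Leistung"]` updates the column in place of the old values, with no Python-level loop and no `str()` conversion of lists, which is what introduced the stray quotes and backslashes in the original attempt.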
