Question

I'm new to Python and still learning the basics of DataFrames and text extraction.

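I have a column of strings, each of which may or may not contain "discount rate", possibly more than once. Whenever "discount rate" is present, I want to grab the first group of numbers after that phrase and put it, as a string, in a new column. The numbers don't always come immediately after the word "rate"; sometimes there are one or two words in between.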

I'm looking for a way to get this text for every instance of "discount rate".

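Currently, my code captures every number range that appears, but I only want the number ranges that follow "discount rate". Here is a snapshot of my code: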

import re

df["ext"] = ""
for i, row in df.iterrows():
    # .loc avoids the chained assignment df["ext"][i] = ..., which may write to a copy
    df.loc[i, "ext"] = str(set(re.findall(r'\d+\.\d+%', df.loc[i, 'txt']))).strip()

The output of this code gives me a set of strings (which I later split into multiple columns):

{'13.0%', '3.5%', '2.5%', '11.0%'}

For reference, the strings typically look like this:

...growth rates of 2.5% to 3.5% to xxx calendar year 2025 after-tax 
free cash flows. Xxx alsoperformed a discounted cash flow 
analysis of the xxx to calculate the present value of the after-tax xxxx that 
xxx forecasted would be generated during calendar years 2015(using only the 
fourth quarter of 2015) through 2025 and of the terminal value of the xxxx by 
applying perpetuity growth rates of 1.0% to 2.0% to the calendar year 2025 
after-tax free cash flows. The cash flows andterminal values were discounted 
to present value as of September 30,2015 using discount rates ranging from 
9.50% to 12.50%, which were based on an estimate of xxxs weighted average 
cost of capital. This analysis indicated thefollowing approximate implied per 
share equity value reference ranges for xxx as compared to the Merger 
Consideration....

Answer 1

I can only write code against the sample text you provided.

sample_text = '''...growth rates of 2.5% to 3.5% to xxx calendar year 2025 after-tax 
free cash flows. Xxx alsoperformed a discounted cash flow 
analysis of the xxx to calculate the present value of the after-tax xxxx that 
xxx forecasted would be generated during calendar years 2015(using only the 
fourth quarter of 2015) through 2025 and of the terminal value of the xxxx by 
applying perpetuity growth rates of 1.0% to 2.0% to the calendar year 2025 
after-tax free cash flows. The cash flows andterminal values were discounted 
to present value as of September 30,2015 using discount rates ranging from 
9.50% to 12.50%, which were based on an estimate of xxxs weighted average 
cost of capital. This analysis indicated thefollowing approximate implied per 
share equity value reference ranges for xxx as compared to the Merger 
Consideration....'''

split_sample_text = sample_text.split()

discount_ranges = []
# Stop one word early so split_sample_text[index + 1] always exists.
for index, word in enumerate(split_sample_text[:-1]):
    # Look for the two-word phrase "discount rates".
    if word == "discount" and split_sample_text[index + 1] == "rates":
        start_rate = None
        end_rate = None

        # Scan the words after the phrase for percentages.
        rest = split_sample_text[index + 2:]
        for index_, rate in enumerate(rest):
            if "%" in rate:
                try:
                    float(rate.rstrip("%,"))
                except ValueError:
                    continue
                if not start_rate:
                    start_rate = rate.rstrip(",")
                else:
                    end_rate = rate.rstrip(",")
                    break  # both ends of the range found

            elif rate == "discount" and rest[index_ + 1:index_ + 2] == ["rates"]:
                # Next "discount rates" reached; do not borrow its numbers.
                break

        if start_rate and end_rate:
            discount_ranges.append((start_rate, end_rate))

print(discount_ranges)

This gives us:

[('9.50%', '12.50%')]

If you paste the sample text three times, it will still extract the same discount range three times. Hope this helps! Cheers!
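
For comparison, here is a rough regex-based sketch of the same idea. It is only one possible approach, not a tested solution for your real data: it assumes the phrase appears as "discount rate" or "discount rates", and that a range is written as "X% to Y%". The names `KEYWORD`, `RATE_RANGE`, and `rates_after_keyword` are made up for this example. For each occurrence of the phrase it looks for the first percentage (or percentage range) before the next occurrence, so a word or two in between is fine.

```python
import re

# Assumed phrasing: "discount rate" or "discount rates", any case, any whitespace.
KEYWORD = re.compile(r"\bdiscount\s+rates?\b", re.IGNORECASE)
# A percentage such as "9.50%" or "8%", optionally followed by "to <percentage>".
RATE_RANGE = re.compile(r"(\d+(?:\.\d+)?%)(?:\s+to\s+(\d+(?:\.\d+)?%))?")


def rates_after_keyword(text):
    """Return one tuple of rates per keyword occurrence that has rates after it."""
    hits = list(KEYWORD.finditer(text))
    results = []
    for i, hit in enumerate(hits):
        # Search from the end of this keyword up to the start of the next one.
        stop = hits[i + 1].start() if i + 1 < len(hits) else len(text)
        found = RATE_RANGE.search(text, hit.end(), stop)
        if found:
            # Drop the empty second group when only a single rate was given.
            results.append(tuple(g for g in found.groups() if g))
    return results


# Abbreviated from the sample text in the question.
sample = (
    "applying perpetuity growth rates of 1.0% to 2.0% to the calendar year 2025 "
    "after-tax free cash flows. The cash flows were discounted to present value "
    "using discount rates ranging from\n9.50% to 12.50%, which were based on an "
    "estimate of the weighted average cost of capital."
)

print(rates_after_keyword(sample))           # [('9.50%', '12.50%')]
print(len(rates_after_keyword(sample * 3)))  # 3
```

Assuming a DataFrame `df` with a text column `txt` as in the question, something like `df["ext"] = df["txt"].apply(rates_after_keyword)` should fill the new column without an explicit `iterrows()` loop.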
