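I'm new to Python and still learning the basics of DataFrames and text extraction.
I have a column of strings that may or may not contain "discount rate", possibly more than once. When "discount rate" is present, I want to grab the first group of numbers after that phrase and put it into a new column as a string. The numbers don't always come immediately after the word "rate"; sometimes there are one or two words in between.
I'm looking for a way to capture the text for every instance of "discount rate".
Currently my code captures every numeric range that appears, but I only want the ranges that follow "discount rate". Here is a snippet of my code: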
import re

df["ext"] = ""
for i, row in df.iterrows():
    # Write through .loc rather than df["ext"][i] to avoid chained assignment.
    df.loc[i, "ext"] = str(set(re.findall(r'\d+\.\d+%', df.loc[i, 'txt']))).strip()
The output of this code gives me a set of strings (which I later split into multiple columns):
{'13.0%', '3.5%', '2.5%', '11.0%'}
For reference, the strings typically look like this:
...growth rates of 2.5% to 3.5% to xxx calendar year 2025 after-tax
free cash flows. Xxx alsoperformed a discounted cash flow
analysis of the xxx to calculate the present value of the after-tax xxxx that
xxx forecasted would be generated during calendar years 2015(using only the
fourth quarter of 2015) through 2025 and of the terminal value of the xxxx by
applying perpetuity growth rates of 1.0% to 2.0% to the calendar year 2025
after-tax free cash flows. The cash flows andterminal values were discounted
to present value as of September 30,2015 using discount rates ranging from
9.50% to 12.50%, which were based on an estimate of xxxs weighted average
cost of capital. This analysis indicated thefollowing approximate implied per
share equity value reference ranges for xxx as compared to the Merger
Consideration....
Answer 0 (score: 0)
I can only write code against the sample text you provided.
sample_text = '''...growth rates of 2.5% to 3.5% to xxx calendar year 2025 after-tax
free cash flows. Xxx alsoperformed a discounted cash flow
analysis of the xxx to calculate the present value of the after-tax xxxx that
xxx forecasted would be generated during calendar years 2015(using only the
fourth quarter of 2015) through 2025 and of the terminal value of the xxxx by
applying perpetuity growth rates of 1.0% to 2.0% to the calendar year 2025
after-tax free cash flows. The cash flows andterminal values were discounted
to present value as of September 30,2015 using discount rates ranging from
9.50% to 12.50%, which were based on an estimate of xxxs weighted average
cost of capital. This analysis indicated thefollowing approximate implied per
share equity value reference ranges for xxx as compared to the Merger
Consideration....'''
split_sample_text = sample_text.split()
discount_ranges = list()
# Skip the last word so that index + 1 is always a valid position.
for index, word in enumerate(split_sample_text[:-1]):
    # Look for the two-word phrase "discount rates".
    if word == "discount" and split_sample_text[index + 1] == "rates":
        start_rate = None
        end_rate = None
        remaining = split_sample_text[index + 2:]
        for index_, rate in enumerate(remaining):
            if "%" in rate:
                try:
                    # Raises ValueError if the token is not a number.
                    float(rate.rstrip("%,"))
                    if not start_rate:
                        start_rate = rate
                    elif not end_rate:
                        end_rate = rate.rstrip(',')
                except ValueError:
                    pass
            elif rate == "discount" and remaining[index_ + 1:index_ + 2] == ["rates"]:
                # Stop at the next "discount rates" so ranges are not mixed.
                break
        if start_rate and end_rate:
            discount_ranges.append((start_rate, end_rate))
print(discount_ranges)
This gives us:
[('9.50%', '12.50%')]
If you paste the sample text three times, it will still extract the same discount rates three times. Hope this helps! Cheers!
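A regular expression can do the same job more compactly. The block below is only a minimal sketch, not a definitive solution. It assumes the phrase is "discount rate" or "discount rates", that at most four plain words sit between the phrase and the first percentage, and that a range is written as "X% to Y%". The names `DISCOUNT_RATE_PATTERN` and `extract_discount_rates` are made up for illustration.

```python
import re

# Assumed shape: "discount rate(s)", a few plain words, then "X%" or "X% to Y%".
DISCOUNT_RATE_PATTERN = re.compile(
    r"discount\s+rates?"               # the anchor phrase
    r"(?:\s+[A-Za-z]+){0,4}?"          # up to four words in between (assumed limit)
    r"\s+(\d+(?:\.\d+)?%)"             # first percentage after the phrase
    r"(?:\s+to\s+(\d+(?:\.\d+)?%))?",  # optional upper bound of a range
    re.IGNORECASE,
)


def extract_discount_rates(text):
    """Return a list of (low, high) tuples; high is None for a single rate."""
    return [(m.group(1), m.group(2)) for m in DISCOUNT_RATE_PATTERN.finditer(text)]


if __name__ == "__main__":
    text = ("growth rates of 2.5% to 3.5% ... using discount rates ranging from\n"
            "9.50% to 12.50%, which were based on an estimate")
    print(extract_discount_rates(text))
```

Because `\s+` also matches line breaks, the pattern still works when the phrase and the numbers are on different lines, as in the sample text. Assuming the strings are in the `txt` column from the question, something like `df["ext"] = df["txt"].apply(extract_discount_rates)` should fill the new column. Wrap the result in `str(...)` if the column has to hold strings.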