Getting the text within sections

Date: 2018-07-21 02:49:43

Tags: r

Here is my text:

Title
1 First section
1.1 Introduction1
Hello. My name is John. I am an under graduate student. I live in the U.S. I am majoring in computer science. Blah blah blah.
1.2 Another Intro
My last name is Doe. Blah blah blah blah. Another random sentence.
2 next section name
2.1 Random Section name
Blah blah blah blah. Another random sentence. Another random sentence.
Another random sentence. 
2.2 Requirements
The requirements include:
1. blah blah
2. blah blah blah
3. another random sentence
3 Third section
Blah blah blah. Blah blah blah blah.
4 End

I want to create a data frame that looks like this:

Section Name            String

1 First section      
1.1 Introduction1       Hello. My name is John. I am an under graduate student. I live in the U.S. I am majoring in computer science. Blah blah blah.
1.2 Another Intro       My last name is Doe. Blah blah blah blah. Another random sentence.
2 next section name 
2.1 Random Section name Blah blah blah blah. Another random sentence. Another random sentence.
2.2 Requirements        The requirements include:
                        1. blah blah
                        2. blah blah blah
                        3. another random sentence
3 Third section         Blah blah blah. Blah blah blah blah.
4 End   

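So basically, I want to create a data frame with two columns: one holding the section number and name, and one holding everything that follows that section heading up to the next section number.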

1 answer:

Answer 0 (score: 0)

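The solution below is not fail-safe against every formatting variation or "odd" string. It also relies on a few workarounds to make your text easier to parse. You will probably need to adjust the regular expressions to match your input. Its speed can certainly be improved as well. It should, however, at least give you a starting point for solving the problem.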

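import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import make_interp_spline

x = np.array([0, 1, 2])
y = np.array([5, 4.31, 4.01])
plt.plot(x, y)  # original points, joined by straight lines

# scipy.interpolate.spline was removed in SciPy 1.3;
# make_interp_spline builds an equivalent quadratic (k=2) spline.
xnew = np.linspace(x.min(), x.max(), 300)
smooth = make_interp_spline(x, y, k=2)(xnew)
plt.plot(xnew, smooth)  # smoothed curve

plt.show()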
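
The regex-based parsing that the answer describes could be sketched roughly as follows. This is only an illustration under stated assumptions: it is written in Python rather than R, the names `HEADING` and `split_sections` are hypothetical, and it assumes that a heading is any line starting with a section number such as "1" or "2.1" followed by a space. Numbered list items such as "1. blah blah" are not treated as headings because the dot comes directly before the space. A body line that happens to start with a bare number and a space would be misread as a heading, so the pattern may need adjusting for real input.

```python
import re

TEXT = """Title
1 First section
1.1 Introduction1
Hello. My name is John. I am an under graduate student. I live in the U.S. I am majoring in computer science. Blah blah blah.
1.2 Another Intro
My last name is Doe. Blah blah blah blah. Another random sentence.
2 next section name
2.1 Random Section name
Blah blah blah blah. Another random sentence. Another random sentence.
Another random sentence.
2.2 Requirements
The requirements include:
1. blah blah
2. blah blah blah
3. another random sentence
3 Third section
Blah blah blah. Blah blah blah blah.
4 End"""

# Assumed heading shape: digits, optional ".digits" groups, whitespace, then a name.
# "1. blah blah" fails to match because "." is followed by a space, not a digit.
HEADING = re.compile(r"^\d+(?:\.\d+)*\s+\S")


def split_sections(text):
    """Return a list of (section_name, section_text) rows."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if HEADING.match(line):
            rows.append([line, []])  # start a new section
        elif rows:
            rows[-1][1].append(line)  # body line of the current section
        # lines before the first heading (here "Title") are ignored
    return [(name, "\n".join(body)) for name, body in rows]


rows = split_sections(TEXT)
for name, body in rows:
    print(repr(name), "->", repr(body))
```

Each row pairs a heading with the text up to the next heading; sections without body text, such as "1 First section", get an empty string. If pandas is available, the rows could be passed to `pandas.DataFrame(rows, columns=["Section Name", "String"])` to obtain the two-column data frame asked for in the question.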