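I have two columns: one contains text strings, and the other contains the times at which those strings are displayed. In the example below, text appears over time, and words disappear one by one as new text is added. Here is an example: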
Time (s) Text string
5 This example
7 This example
10 example
11 example is cool
15 is cool
16 cool
17
19 That example is
20 example is
21 is awesome
23 awesome
24
I want to extract the time at which each word disappears. Here is the result I want:
Disappeared time (s) Text
10 This
15 example
16 is
17 cool
20 That
21 example
23 is
24 awesome
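How can I write Python code to do this? I am a beginner in Python, so code examples and ideas for solving the problem would be helpful. Thank you very much in advance!
Answer 0 (score: 1)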
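Use:
set_index and str.get_dummies to build a DataFrame of 0/1 indicators, one column per word
where to convert the False positions to NaN
stack to reshape the result into a long Series
rename_axis, reset_index and drop to produce the final two columns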
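The code in this answer assumes a DataFrame named df with the columns 'Time (s)' and 'Text string'. A minimal sketch for building it from the sample data in the question follows; storing the blank rows at 17 and 24 as empty strings is an assumption, since the question does not say how they are represented.

```python
import pandas as pd

# Sample data copied from the question. The rows at 17 s and 24 s show no
# text; representing them as empty strings is an assumption.
df = pd.DataFrame({
    'Time (s)': [5, 7, 10, 11, 15, 16, 17, 19, 20, 21, 23, 24],
    'Text string': ['This example', 'This example', 'example',
                    'example is cool', 'is cool', 'cool', '',
                    'That example is', 'example is', 'is awesome',
                    'awesome', ''],
})
print(df)
```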
# Index by time, then split each string on spaces into one 0/1 column per word
df = df.set_index('Time (s)')['Text string'].str.get_dummies(' ')
print(df)
          That  This  awesome  cool  example  is
Time (s)
5            0     1        0     0        1   0
7            0     1        0     0        1   0
10           0     0        0     0        1   0
11           0     0        0     1        1   1
15           0     0        0     1        0   1
16           0     0        0     1        0   0
17           0     0        0     0        0   0
19           1     0        0     0        1   1
20           0     0        0     0        1   1
21           0     0        1     0        0   1
23           0     0        1     0        0   0
24           0     0        0     0        0   0
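In the indicator table above, a word disappears at the first row where its column drops from 1 to 0. As a rough illustration of that idea on a single column (the values for "example" are copied from the table; the variable names are only for this sketch):

```python
import pandas as pd

# Indicator column for the word "example", copied from the table above.
example = pd.Series(
    [1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0],
    index=pd.Index([5, 7, 10, 11, 15, 16, 17, 19, 20, 21, 23, 24],
                   name='Time (s)'),
    name='example',
)

# A word disappears where the previous row held 1 and the current row holds 0.
# fill_value=0 keeps the first row from ever being flagged.
dropped = example.shift(fill_value=0).eq(1) & example.eq(0)
print(example.index[dropped].tolist())  # [15, 21]
```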
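# Keep only the cells that changed compared with the previous row and are now 0,
# i.e. the word was visible one row earlier and has just vanished
df1 = (df.where(df.ne(df.shift().bfill()) & df.eq(0))
         .stack()                                          # long format: one row per (time, word)
         .rename_axis(('Disappeared time (s)', 'Text'))    # name the two index levels
         .reset_index()                                    # turn the index levels into columns
         .drop(0, axis=1))                                 # drop the leftover value column
print(df1)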
   Disappeared time (s)     Text
0                    10     This
1                    15  example
2                    16       is
3                    17     cool
4                    20     That
5                    21  example
6                    23       is
7                    24  awesome
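Since the question also asks for the idea behind the solution, here is a plain-Python sketch of the same logic without pandas: compare each row's words with the previous row's words and report those that are missing. This is an alternative approach, not the method used in the answer above; the function name disappearance_times and the (time, text) tuple layout are hypothetical choices for this illustration.

```python
# Sample rows from the question as (time, text) pairs; blank rows are
# represented as empty strings (an assumption).
rows = [
    (5, 'This example'), (7, 'This example'), (10, 'example'),
    (11, 'example is cool'), (15, 'is cool'), (16, 'cool'), (17, ''),
    (19, 'That example is'), (20, 'example is'), (21, 'is awesome'),
    (23, 'awesome'), (24, ''),
]

def disappearance_times(rows):
    """Return (time, word) pairs for words shown in the previous row but not in the current one."""
    result = []
    previous = []
    for time, text in rows:
        current = text.split()
        # Walk the previous row's words in display order.
        for word in previous:
            if word not in current:
                result.append((time, word))
        previous = current
    return result

for time, word in disappearance_times(rows):
    print(time, word)
```

For the sample data this prints the same eight (time, word) pairs as the desired result in the question. Like the pandas version, it treats a word that disappears and later reappears as two separate events.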