How to apply my function to all rows of a pyspark dataframe

Time: 2019-10-26 20:56:55

Tags: python function dataframe pyspark

Hello, I have this code to split an mp3 audio file, and it works when I pass the parameters as in the example below:

from pydub import AudioSegment

start_time = '00:00:50' #Format 'hh:mm:ss'
end_time = '00:01:48'   #Format 'hh:mm:ss'
filename = 'C://Users//home//Desktop//testPython//audio//test.mp3'

and this function:

def splitTimes(time):
    splitted_time = time.split(sep = ':')
    toSec = (int(splitted_time[0])*3600)+(int(splitted_time[1])*60)+int(splitted_time[2])
    toMillisec = toSec * 1000
    print(toMillisec)
    return toMillisec
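
For example, calling it on the start time above converts 50 seconds into milliseconds:

splitTimes(start_time)   # prints and returns 50000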

I wrote the following code to apply it:

sound = AudioSegment.from_file(filename)

print('Splitting Audio...')
firstpart = sound[splitTimes(start_time):splitTimes(end_time)]
firstpart.export("C://Users//User-7//Desktop//testPython//audio//splitted.mp3", format="mp3")
print('Done...')

and it works correctly. I have a pyspark dataframe that contains:

+--------+--------+--------+
|   start|     end|FileName|
+--------+--------+--------+
|00:00:11|00:00:23|       2|
|00:00:54|00:01:16|       3|
|00:02:12|00:02:24|     4_m|
|00:02:28|00:02:41|     4_p|
+--------+--------+--------+

My question: how can I apply this function to every row of this dataframe, where:

start_time = start 
end_time = end
splitted = FileName (the name of the split output file)
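
For illustration, here is a minimal sketch of one way to do this, assuming the dataframe shown above is called df and is small enough to collect to the driver (pydub does the splitting locally in any case); the output folder below is a hypothetical example path:

from pydub import AudioSegment

sound = AudioSegment.from_file(filename)
output_dir = 'C://Users//home//Desktop//testPython//audio//'  # assumed output folder

for row in df.collect():                    # bring the rows back to the driver
    start_ms = splitTimes(row['start'])     # 'start' column -> start_time
    end_ms = splitTimes(row['end'])         # 'end' column -> end_time
    part = sound[start_ms:end_ms]           # slice the audio in milliseconds
    part.export(output_dir + row['FileName'] + '.mp3', format="mp3")

This keeps the actual splitting on the driver; distributing it (for example with foreach or a pandas UDF) would only help if the source mp3 were accessible from every executor.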

0 Answers:

There are no answers.