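I have several txt files whose file names indicate the topic each one contains. I need to read the files with glob and then build a DataFrame with 2 columns: 1 - the content and 2 - the topic name (taken from the file name).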
My code:
import glob as gb
import pandas as pd

# filename sample - 51132_1.txt
for name in gb.glob('./*_1*'):
    f1 = open(name, "r")
    rl = f1.readlines()
    topicName = name.split('_1')[0]
    # print(topicName)
    df = pd.DataFrame({'content': rl})
    df['topicName'] = topicName
print(df)
This gives output that is not what I am looking for:
                                             content topicName
0                                                      .\54468
1                                                      .\54468
2  In article <sheafferC63zt0.Brs@netcom.com shea...   .\54468
3                                                      .\54468
4                                                      .\54468
5                                                      .\54468
6  It had to happen: the old allegation of the "d...   .\54468
How can I achieve this?
Answer 0: (score: 1)
Something like this:
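import pandas as pd
import glob as gb

def process_file(file):
    # Read the whole file as one string so that each file becomes a single row.
    with open(file, "r") as f:
        content = f.read()
    # Everything before "_1" in the path is used as the topic name.
    topic = file.split('_1')[0]
    return {"content": content, "topicname": topic}

data = [process_file(file) for file in gb.glob('./*_1*')]
df = pd.DataFrame(data)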
Answer 1: (score: 1)
Use os.path.basename to get the file name without the directory part, then split it with str.split.
Example:
import glob as gb
import os
import pandas as pd

res = []
for name in gb.glob('./*_1*'):
    with open(name, "r") as f1:
        # basename strips the leading "./" (or ".\" on Windows) before splitting
        res.append({'content': f1.read(), "topicname": os.path.basename(name).split('_1')[0]})
df = pd.DataFrame(res)
print(df)
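For anyone who wants to check the row-building step without real data, here is a minimal sketch of the same idea as a reusable function. The name collect_rows, the sorted() call and the encoding/errors arguments are my own assumptions, not part of either answer; the topic is still taken as the part of the base file name before "_1".

```python
import glob
import os


def collect_rows(pattern):
    """Return one {"content", "topicName"} dict per file matching pattern."""
    rows = []
    # sorted() makes the row order deterministic; glob gives no order guarantee.
    for path in sorted(glob.glob(pattern)):
        # Assumption: the files are text; undecodable bytes are replaced, not raised.
        with open(path, "r", encoding="utf-8", errors="replace") as f:
            content = f.read()  # whole file as one string, so one row per file
        # basename drops any directory prefix before the name is split
        topic = os.path.basename(path).split("_1")[0]
        rows.append({"content": content, "topicName": topic})
    return rows
```

Passing the result to pandas, for example pd.DataFrame(collect_rows('./*_1*')), should then give one row per file with the columns content and topicName.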