I am writing a Python script that searches a set of .txt files in a chosen folder for selected terms (single words, word pairs, or sentences) and prints the names of the .txt files that contain them. It currently works using the os module:

import os

dirname = '/Users/User/Documents/test/reports'
search_terms = ['Pressure']
search_terms = [x.lower() for x in search_terms]

for f in os.listdir(dirname):
    with open(os.path.join(dirname, f), "r", encoding="latin-1") as infile:
        text = infile.read()
    if all(term in text for term in search_terms):
        print(f)

The output is the names of the matching files, one per line.

I want to append these results as a column of strings to a Pandas DataFrame, but when I try to do so I get an error message.

How can I do this?
Answer 0 (score: 2)
In the code below, new lines are marked with '# new line * * *'.
Code from the question

import os
import pandas as pd  # new line * * *
import numpy as np  # new line * * *

dirname = '/Users/User/Documents/test/reports'
search_terms = ['Pressure']
search_terms = [x.lower() for x in search_terms]

# Create an empty dataframe to store file names  # new line * * *
df = pd.DataFrame()  # new line * * *

for f in os.listdir(dirname):
    with open(os.path.join(dirname, f), "r", encoding="latin-1") as infile:
        text = infile.read()
    if all(term in text for term in search_terms):
        print(f)
        # Store the value of f in a dataframe column  # new line * * *
        # (DataFrame.append was removed in pandas 2.0, so pd.concat is used here)
        df = pd.concat([df, pd.DataFrame({'file_names': [f]})], ignore_index=True)  # new line * * *
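Sample code

import pandas as pd

f = ['3003.txt', '3002.txt', '3006.txt', '3008.txt']
df = pd.DataFrame({'file_names': f})
# Add one more row (DataFrame.append was removed in pandas 2.0, so pd.concat is used here)
df = pd.concat([df, pd.DataFrame({'file_names': ['new_file.txt']})], ignore_index=True)
print(df)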
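
As a possible alternative to growing the DataFrame one row at a time, the matching names can be collected in a plain list and turned into a DataFrame once, after the loop. The sketch below is only an illustration: the function name find_matching_files is hypothetical, the .txt filter is added because the question talks about .txt files, and the file text is lower-cased on the assumption that the lower-cased search terms are meant to give a case-insensitive match.

```python
import os

import pandas as pd


def find_matching_files(dirname, search_terms):
    """Return a DataFrame of the .txt files in dirname that contain every term.

    Hypothetical helper, not part of the original answer.
    """
    terms = [t.lower() for t in search_terms]
    matches = []
    for name in sorted(os.listdir(dirname)):
        path = os.path.join(dirname, name)
        # Skip sub-directories and anything that is not a .txt file
        if not (os.path.isfile(path) and name.lower().endswith(".txt")):
            continue
        with open(path, "r", encoding="latin-1") as infile:
            # Assumption: lower-case the text so the match is case-insensitive
            text = infile.read().lower()
        if all(term in text for term in terms):
            matches.append(name)
    # Build the DataFrame once, from the collected list
    return pd.DataFrame({"file_names": matches})
```

It could be called as df = find_matching_files('/Users/User/Documents/test/reports', ['Pressure']). Building the frame once avoids copying the whole DataFrame on every match, which matters mainly when many files match.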