I have a dataframe, df, with a "Names" column like this:
Names
AL GHAITHA & AL MOOSA
AL ASEEL ELECTRONICS T
SUNRISE SUPERMARKET-QU
EMARAT-AL SAFIYAH(6735
LULU CENTRE LLC EFT TE
THE MAX
Code:
import string

remove_letters = ['AL ', 'THE ']

# my functions below:
def remove_start_words(df, col, letters):
    # Strip each prefix in `letters` from the start of every value in df[col]
    for l in letters:
        for i in df.index:
            x = df.at[i, col]
            if x.startswith(l):
                df.at[i, col] = x[len(l):]

def remove_strings(df, col):
    # Keep the first word, plus the second word (alphanumerics only) if it is longer than 2 characters
    for i in df.index:
        x = df.at[i, col].split(' ')
        if len(x) > 1:
            if len(x[1]) > 2:
                x[1] = ''.join(e for e in x[1] if e.isalnum())
                df.at[i, col] = ' '.join(x[0:2])
            else:
                df.at[i, col] = x[0]

def remove_end_digits(df, col):
    # Strip trailing digits from every value
    for i in df.index:
        df.at[i, col] = df.at[i, col].rstrip(string.digits)

# calling my functions
remove_start_words(df=df, col='Names', letters=remove_letters)
remove_strings(df=df, col='Names')
remove_end_digits(df=df, col='Names')
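The problem is that my dataframe has more than a million values in this column. Is my code poorly optimized? How can I make it faster?
Question 1: I understand that I am using two nested loops (one over remove_letters and one over all the column values), and that this is what makes it slow.
Is there a better way? Can I check whether a column value starts with any of the prefixes in the remove_letters list and strip it in one go?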
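One way to strip the prefixes in a single pass is to build one anchored regular expression from remove_letters and let pandas apply it to the whole column. This is a minimal sketch, assuming the prefixes keep their trailing space as in remove_letters; the name prefix_pattern is only for illustration.

```python
import re

import pandas as pd

# Sample data from the question
df = pd.DataFrame({"Names": [
    "AL GHAITHA & AL MOOSA",
    "AL ASEEL ELECTRONICS T",
    "SUNRISE SUPERMARKET-QU",
    "EMARAT-AL SAFIYAH(6735",
    "LULU CENTRE LLC EFT TE",
    "THE MAX",
]})

remove_letters = ["AL ", "THE "]

# One anchored alternation, ^(?:AL |THE ); re.escape keeps any regex
# metacharacters in the prefixes literal.
prefix_pattern = "^(?:" + "|".join(map(re.escape, remove_letters)) + ")"

# A single vectorized call over the whole column; because of the ^ anchor,
# at most one prefix is removed per value.
df["Names"] = df["Names"].str.replace(prefix_pattern, "", regex=True)
print(df["Names"].tolist())
```

Because only the start of the string can match, "AL THE MAX" becomes "THE MAX" and "AL THEMAX" becomes "THEMAX", which fits the requirement that only one prefix be stripped.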
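Questions 2 and 3: the goal of the function remove_strings is to keep only the first two words of each name. For example, for ASEEL ELECTRONICS T the output would be ASEEL ELECTRONICS.
Are there faster versions of remove_strings and remove_end_digits?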
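A minimal sketch of vectorized versions of remove_strings and remove_end_digits using pandas .str methods. The helper names first_two_words and strip_end_digits are hypothetical, and the character filter uses the regex [\W_]+, which approximates str.isalnum() rather than matching it exactly. The sample values are the question's names after the prefixes have already been removed.

```python
import string

import pandas as pd

df = pd.DataFrame({"Names": [
    "GHAITHA & AL MOOSA",
    "ASEEL ELECTRONICS T",
    "EMARAT-AL SAFIYAH(6735",
    "LULU CENTRE LLC EFT TE",
    "MAX",
]})


def first_two_words(s: pd.Series) -> pd.Series:
    """Keep word 1, plus word 2 (alphanumerics only) if word 2 is longer than 2 chars."""
    words = s.str.split(" ")
    first = words.str[0]
    second = words.str[1]  # NaN when a value has only one word
    # Drop non-word characters and underscores from the second word
    second_clean = second.str.replace(r"[\W_]+", "", regex=True)
    keep_second = second.str.len() > 2  # NaN > 2 is False
    return first.where(~keep_second, first + " " + second_clean.fillna(""))


def strip_end_digits(s: pd.Series) -> pd.Series:
    """Drop trailing digits from every value."""
    return s.str.rstrip(string.digits)


df["Names"] = strip_end_digits(first_two_words(df["Names"]))
print(df["Names"].tolist())
```

Each step is one column-wide call instead of a per-row df.at loop, which is where most of the original overhead comes from.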
Main question: can these three functions be combined into one?
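A minimal sketch of folding all three steps into one function that runs once per value. The function name clean_name is hypothetical; it follows the rules of the question's three functions, except that it stops after the first matching prefix. Writing the results back with a list comprehension, rather than three loops of per-cell df.at assignments, usually removes much of the per-row overhead.

```python
import string

import pandas as pd

# Prefixes from the question's remove_letters list
REMOVE_LETTERS = ("AL ", "THE ")


def clean_name(name: str) -> str:
    """Hypothetical one-pass version of the three cleaning functions."""
    # 1. remove_start_words: strip at most one leading prefix
    for prefix in REMOVE_LETTERS:
        if name.startswith(prefix):
            name = name[len(prefix):]
            break
    # 2. remove_strings: keep the first word, plus the second word
    #    (alphanumerics only) when it is longer than 2 characters
    words = name.split(" ")
    if len(words) > 1:
        if len(words[1]) > 2:
            second = "".join(ch for ch in words[1] if ch.isalnum())
            name = words[0] + " " + second
        else:
            name = words[0]
    # 3. remove_end_digits: drop trailing digits
    return name.rstrip(string.digits)


df = pd.DataFrame({"Names": [
    "AL GHAITHA & AL MOOSA",
    "AL ASEEL ELECTRONICS T",
    "SUNRISE SUPERMARKET-QU",
    "EMARAT-AL SAFIYAH(6735",
    "LULU CENTRE LLC EFT TE",
    "THE MAX",
]})

# One Python-level pass over the values instead of three df.at loops
df["Names"] = [clean_name(n) for n in df["Names"]]
print(df["Names"].tolist())
```

Because it reproduces the question's rules, "SUNRISE SUPERMARKET-QU" becomes "SUNRISE SUPERMARKETQU" rather than the expected "SUNRISE SUPERMARKET"; matching that row would need a different rule for hyphenated words.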
Expected dataframe:
Names
GHAITHA
ASEEL ELECTRONICS
SUNRISE SUPERMARKET
EMARAT-AL SAFIYAH
LULU CENTRE
MAX
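Note: the function remove_start_words should check whether the name starts with any of the listed prefixes and, if so, remove it. For example, "AL THEMAX" should become "THEMAX", not "MAX" (which would mean removing both AL and THE).
Thanks.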
Answer 0 (score: 0)
You can use the following replacement approach:
<!-- category image -->
<div class="row">
<div class="col-md-6">
<div class="form-group">
<label for="category">Category Image</label>
<br/>
<div id="updCategoryPreview"></div>
<input type="file" class="img" id="upd-category" name="upd-category">
</div>
</div>
</div>
<!-- banner image -->
<div class="row">
<div class="col-md-6">
<div class="form-group">
<label for="banner">Category Banner</label>
<br>
<div id="updBannerPreview"></div>
<input type="file" class="img" id="upd-banner" name="upd-banner">
</div>
</div>
</div>
Answer 1 (score: 0)
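Since you said you only want to remove words at the start of the string, you can use a regular expression:
import re
import pandas as pd
file_path = 'file3.xlsx'
df = pd.read_excel(file_path)
words_to_remove = ["THE", "AL"]
# Group the alternation so ^ applies to every word, and require trailing whitespace
regular_expression = r'^(?:' + '|'.join(map(re.escape, words_to_remove)) + r')\s+'
df['Names'] = df['Names'].str.replace(regular_expression, '', regex=True)
In this case the regular_expression variable contains ^(?:THE|AL)\s+, which means THE or AL at the start of the string, followed by whitespace.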
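A quick check of why the alternation needs a group: in an ungrouped pattern such as ^THE|AL, the ^ anchor binds only to THE, so AL is removed wherever it appears. The variable names naive and grouped are only for illustration; the test strings come from the question.

```python
import re

words_to_remove = ["THE", "AL"]

naive = "^" + "|".join(words_to_remove)  # "^THE|AL"
grouped = r"^(?:" + "|".join(map(re.escape, words_to_remove)) + r")\s+"

print(re.sub(naive, "", "EMARAT-AL SAFIYAH"))    # EMARAT- SAFIYAH
print(re.sub(grouped, "", "EMARAT-AL SAFIYAH"))  # EMARAT-AL SAFIYAH
print(re.sub(grouped, "", "AL THEMAX"))          # THEMAX
```

The grouped pattern also consumes the space after the prefix, so no leading space is left behind.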
Answer 2 (score: 0)
A few minutes of searching on Google turned up this:
CREATE TRIGGER VIP_Monitor
ON [ReportServer].[dbo].[Catalog]
AFTER INSERT, UPDATE
AS
DECLARE
@TestPath NVARCHAR(MAX),
@TestDataSource NVARCHAR(MAX),
@WrongPath NVARCHAR(MAX)
SET @TestPath = '/VIP-Area/'
SET @TestDataSource = 'dsDWH_VIP'
IF @TestDataSource = (SELECT Cat1.[Name] AS [DatasourceName]
FROM [ReportServer].[dbo].[Catalog] AS Cat1
LEFT JOIN [ReportServer].[dbo].[DataSource] AS DS1 ON Cat1.ItemID = DS1.Link
LEFT JOIN [ReportServer].[dbo].[Catalog] AS Cat2 ON DS1.ItemID = Cat2.ItemID
WHERE Cat1.[ItemID] = 'B5DE8D20-894E-4D38-8340-164A0DE61F0F')
IF @TestPath != (SELECT LEFT(Cat1.[Path], 10) AS [DatasourceName]
FROM [ReportServer].[dbo].[Catalog] AS Cat1
LEFT JOIN [ReportServer].[dbo].[DataSource] AS DS1 ON Cat1.[ItemID] = DS1.Link
LEFT JOIN [ReportServer].[dbo].[Catalog] AS Cat2 ON DS1.[ItemID] = Cat2.[ItemID]
WHERE Cat1.[ItemID] = 'B5DE8D20-894E-4D38-8340-164A0DE61F0F')
SET @WrongPath = (SELECT LEFT(Cat1.[Path], 10) AS [DatasourceName]
FROM [ReportServer].[dbo].[Catalog] AS Cat1
LEFT JOIN [ReportServer].[dbo].[DataSource] AS DS1 ON Cat1.[ItemID] = DS1.Link
LEFT JOIN [ReportServer].[dbo].[Catalog] AS Cat2 ON DS1.[ItemID] = Cat2.[ItemID]
WHERE Cat1.[ItemID] = 'B5DE8D20-894E-4D38-8340-164A0DE61F0F')
DELETE FROM [ReportServer].[dbo].[Catalog] AS Cat1
WHERE Cat1.[Name] = ### Inserted Report Name ? ###
EXEC msdb.dbo.sp_send_dbmail
@profile_name = 'Admin',
@recipients = 'test@test.de',
@body = 'The VIP-Report was built in ' + @WrongPath ,
@subject = 'Warning: VIP-Report in false Folder';
That should do the trick.