I want to go through a folder and check which time zone each file in it belongs to. For this I have a CSV file:
ip1 ip2 timezone
0 16777215 0
16777216 16777471 +10:00
16777472 16778239 +08:00
16778240 16779263 +11:00
16779264 16781311 +08:00
16781312 16785407 +09:00
...
When a given ip_number lies between ip1 and ip2, the matching time zone is in the third column.
import glob
import os

import pandas as pd

df = pd.read_csv('IP2LOCATION-LITE-DB11.csv', parse_dates=True)
path = "Testordner"
# collect the names of all CSV files in the folder
os.chdir(path)
result = glob.glob('*.csv')
os.chdir("..")
for i in result:
    df2 = pd.read_csv("twiceaweek/" + i, parse_dates=True)
    # the file name has four dot-separated parts; the last one is the extension
    w1, x1, y1, z1 = i.split('.')
    w = int(w1)
    x = int(x1)
    y = int(y1)
    ip_number = 16777216*w + 65536*x + 256*y + 1
I don't know how to place a number between ip1 and ip2, or how to match each file's ip_number against those ranges to get my time zone. Do you have any ideas?
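The arithmetic behind ip_number can be checked on a single file name. This is a minimal sketch: the helper name `ip_number_from_filename` and the file name `1.0.0.csv` are made up for illustration, and it assumes the file names follow the `w.x.y.csv` pattern implied by the split above.

```python
import ipaddress


def ip_number_from_filename(filename: str) -> int:
    """Turn a name like '1.0.0.csv' into the integer form of 1.0.0.1."""
    w, x, y, _ext = filename.split('.')
    # same weights as in the question: 256**3, 256**2, 256, and a last octet of 1
    return 16777216 * int(w) + 65536 * int(x) + 256 * int(y) + 1


ip_number = ip_number_from_filename('1.0.0.csv')
# cross-check against the standard library's conversion of 1.0.0.1
assert ip_number == int(ipaddress.IPv4Address('1.0.0.1'))
print(ip_number)  # 16777217
```

With the sample rows above, 16777217 lies between 16777216 and 16777471, so the expected time zone for that file would be +10:00.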
Answer 0 (score: 0)
You want pd.cut:
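# bin edges: every range start, plus one past the last range end
# (this assumes the ranges are contiguous, as in the sample rows)
thresholds = list(df['ip1']) + [df['ip2'].iloc[-1] + 1]
# test data: the midpoint of every range
ips = df[['ip1', 'ip2']].mean(axis=1).astype(int)
# bucketing: right=False gives half-open bins [ip1, next ip1),
# so an IP equal to ip1 lands in its own range, not the previous one
buckets = pd.cut(ips, thresholds,
                 right=False,
                 labels=False)
# get the labels:
df['timezone'].values[buckets]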
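To apply the same bucketing to real IP numbers instead of the test midpoints, something like the following could work. It is only a sketch: the sample rows from the question stand in for the full CSV, the `ip_numbers` values are made up, and it assumes the ranges are contiguous. IPs outside every range come back from pd.cut as NaN and are mapped to an empty string here.

```python
import io

import numpy as np
import pandas as pd

# sample rows from the question, standing in for IP2LOCATION-LITE-DB11.csv
sample = io.StringIO(
    "ip1,ip2,timezone\n"
    "0,16777215,0\n"
    "16777216,16777471,+10:00\n"
    "16777472,16778239,+08:00\n"
    "16778240,16779263,+11:00\n"
    "16779264,16781311,+08:00\n"
    "16781312,16785407,+09:00\n"
)
df = pd.read_csv(sample, dtype={'timezone': str})

# bin edges: every range start, plus one past the last range end
edges = df['ip1'].tolist() + [int(df['ip2'].iloc[-1]) + 1]

# hypothetical IP numbers; the last one is outside every range
ip_numbers = pd.Series([16777217, 16778241, 16785407, 99999999])

# half-open bins [ip1, next ip1); out-of-range values become NaN
codes = pd.cut(ip_numbers, edges, right=False, labels=False).to_numpy()

tz_values = df['timezone'].to_numpy()
timezones = [tz_values[int(c)] if not np.isnan(c) else '' for c in codes]
print(timezones)  # ['+10:00', '+11:00', '+09:00', '']
```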
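Answer 1 (score: 0)
You can use merge_asof. It finds the last row whose key is less than or equal to the searched value, which is what you need here. So, to look up the time zone once you have the IP number, use: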
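# match ip_number to the last row whose ip1 is <= ip_number
tmp = pd.merge_asof(pd.DataFrame([ip_number], columns=['ip']), df, left_on=['ip'],
                    right_on=['ip1'])
# keep the match only if ip_number does not exceed the range's upper bound (ip2 is inclusive)
tmp = tmp[tmp.ip2 >= ip_number]
if len(tmp) > 0:
    tz = tmp.at[0, 'timezone']
else:
    tz = ''  # not found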
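With many files, the same merge_asof lookup can be done for all IP numbers in a single call. A minimal sketch, assuming the sample rows from the question stand in for the full CSV; the `queries` frame and its values are made up for illustration. Note that merge_asof expects both key columns to be sorted in ascending order.

```python
import io

import pandas as pd

# sample rows from the question, standing in for IP2LOCATION-LITE-DB11.csv
sample = io.StringIO(
    "ip1,ip2,timezone\n"
    "0,16777215,0\n"
    "16777216,16777471,+10:00\n"
    "16777472,16778239,+08:00\n"
    "16778240,16779263,+11:00\n"
    "16779264,16781311,+08:00\n"
    "16781312,16785407,+09:00\n"
)
df = pd.read_csv(sample, dtype={'timezone': str})

# hypothetical IP numbers, one per file; the last one is outside every range
queries = pd.DataFrame({'ip': [16777217, 16778241, 16785407, 99999999]})

# both sides must be sorted on their key column
merged = pd.merge_asof(queries.sort_values('ip'), df.sort_values('ip1'),
                       left_on='ip', right_on='ip1')

# blank out matches where the IP lies beyond the matched range's upper bound
merged['timezone'] = merged['timezone'].where(merged['ip'] <= merged['ip2'], '')
print(merged[['ip', 'timezone']])
```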
Alternatively, you can use searchsorted:
# index of the first range whose upper bound ip2 is >= ip_number
ix = df['ip2'].searchsorted([ip_number], 'left')[0]
if ix == len(df) or df.at[ix, 'ip1'] > ip_number:
    tz = ''  # not found
else:
    tz = df.at[ix, 'timezone']
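Put together with the file loop from the question, the searchsorted lookup might be wrapped like this. It is a sketch under assumptions: `lookup_timezone` is a hypothetical helper, the file names are invented and assumed to follow the `w.x.y.csv` pattern, and the sample rows stand in for the full CSV, which must be sorted by ip2.

```python
import io

import pandas as pd

# sample rows from the question, standing in for IP2LOCATION-LITE-DB11.csv
sample = io.StringIO(
    "ip1,ip2,timezone\n"
    "0,16777215,0\n"
    "16777216,16777471,+10:00\n"
    "16777472,16778239,+08:00\n"
    "16778240,16779263,+11:00\n"
    "16779264,16781311,+08:00\n"
    "16781312,16785407,+09:00\n"
)
df = pd.read_csv(sample, dtype={'timezone': str})


def lookup_timezone(table: pd.DataFrame, ip_number: int) -> str:
    """Return the time zone of the range containing ip_number, or '' if none."""
    # first row whose inclusive upper bound ip2 is >= ip_number
    ix = table['ip2'].searchsorted(ip_number, side='left')
    if ix == len(table) or table['ip1'].iloc[ix] > ip_number:
        return ''  # not found
    return table['timezone'].iloc[ix]


# hypothetical file names in the "w.x.y.csv" pattern
filenames = ['1.0.0.csv', '1.0.4.csv', '1.0.31.csv']
timezones = {}
for name in filenames:
    w, x, y, _ext = name.split('.')
    ip_number = 16777216 * int(w) + 65536 * int(x) + 256 * int(y) + 1
    timezones[name] = lookup_timezone(df, ip_number)

print(timezones)
# {'1.0.0.csv': '+10:00', '1.0.4.csv': '+11:00', '1.0.31.csv': '+09:00'}
```

Using side='left' on ip2 keeps an IP that equals a range's upper bound inside that range.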