I am trying to write a function that finds missing dates in a DataFrame.
Here is my situation (the data is sorted by customer, then by date; the date format is M/D/Y):
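The function should read the "From Date" and "To Date" columns and check whether the dates are continuous for each customer. It should then add a column ("Results") and show the outcome there.
The function must iterate over each customer.
(Comments added)
Please see my expected output. I am also adding the index and some explanation: index [1] shows Missing because continuity is broken. You can see this by comparing To Date [0] with From Date [2], which are not the same. On the other hand, To Date [2] equals From Date [4], which is why "Results" shows Not Missing at [3].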
          From Date   To Date
Customer
A         1/10/2017   2/9/2017
A         NaN         NaN
A         3/10/2017   4/9/2017
A         NaN         NaN
A         4/9/2017    5/9/2017
B         2/10/2017   3/9/2017
B         NaN         NaN
B         3/9/2017    4/9/2017
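For anyone who wants to reproduce this, the rows above could be built roughly as follows. This is only a sketch: it assumes Customer is an ordinary column and that the index is the default 0..7, which matches the bracketed index numbers used in the explanation above.

```python
import numpy as np
import pandas as pd

# Sample rows from the table above; empty rows are NaN in both date columns.
# Assumption: "Customer" is a regular column and the index is the default 0..7.
df = pd.DataFrame({
    "Customer": ["A", "A", "A", "A", "A", "B", "B", "B"],
    "From Date": ["1/10/2017", np.nan, "3/10/2017", np.nan, "4/9/2017",
                  "2/10/2017", np.nan, "3/9/2017"],
    "To Date": ["2/9/2017", np.nan, "4/9/2017", np.nan, "5/9/2017",
                "3/9/2017", np.nan, "4/9/2017"],
})
print(df)
```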
Any help would be greatly appreciated.
Answer 0 (score: 0)
Use pd.DataFrame.groupby together with pd.to_datetime:
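import pandas as pd

# df is the question's DataFrame, with Customer as a column and a default integer index
df['From Date'] = pd.to_datetime(df['From Date'], format="%m/%d/%Y")
df['To Date'] = pd.to_datetime(df['To Date'], format="%m/%d/%Y")
dfs = []
for k, d in df.groupby('Customer'):
    # Previous "To Date" for every non-empty row except the first one in the group
    dt = d.dropna()['To Date'].shift(1)[1:]
    res = []
    for i in range(dt.shape[0]):
        if (d['From Date'][dt.index] == dt).iloc[i]:
            res.append('Not Missing')
        else:
            res.append('Missing')
    # Switch to object dtype so the datetime Series can hold the string labels
    dt = dt.astype(object)
    for i in range(dt.shape[0]):
        dt.iloc[i] = res[i]
    # Move each label up one row, onto the empty row between the two compared rows
    dt.index -= 1
    dfs.append(pd.concat([d, dt], axis=1))
result = pd.concat(dfs)
print(result)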
  Customer  From Date    To Date      To Date
0        A 2017-01-10 2017-02-09          NaN
1        A        NaT        NaT      Missing
2        A 2017-03-10 2017-04-09          NaN
3        A        NaT        NaT  Not Missing
4        A 2017-04-09 2017-05-09          NaN
5        B 2017-02-10 2017-03-09          NaN
6        B        NaT        NaT  Not Missing
7        B 2017-03-09 2017-04-09          NaN
Finally:
result.columns = ['Customer', 'From Date', 'To Date', 'Results']
print(result)
  Customer  From Date    To Date      Results
0        A 2017-01-10 2017-02-09          NaN
1        A        NaT        NaT      Missing
2        A 2017-03-10 2017-04-09          NaN
3        A        NaT        NaT  Not Missing
4        A 2017-04-09 2017-05-09          NaN
5        B 2017-02-10 2017-03-09          NaN
6        B        NaT        NaT  Not Missing
7        B 2017-03-09 2017-04-09          NaN
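As an aside, the same labels can probably be produced without the explicit Python loops. The following is only a sketch and not part of the original answer: flag_gaps is a hypothetical helper name, and it assumes a default integer index in which each empty row sits exactly one position above the row it is compared against.

```python
import numpy as np
import pandas as pd

def flag_gaps(df):
    """Hypothetical helper: label each empty row as 'Missing' or 'Not Missing'."""
    out = df.copy()
    out["From Date"] = pd.to_datetime(out["From Date"], format="%m/%d/%Y")
    out["To Date"] = pd.to_datetime(out["To Date"], format="%m/%d/%Y")
    # Keep only rows that carry dates, then fetch the previous "To Date" per customer
    valid = out.dropna(subset=["From Date", "To Date"])
    prev_to = valid.groupby("Customer")["To Date"].shift(1)
    prev_to = prev_to[prev_to.notna()]
    same = valid.loc[prev_to.index, "From Date"] == prev_to
    labels = np.where(same, "Not Missing", "Missing")
    # Assumption: the empty row is at index - 1 of the row being compared
    out["Results"] = pd.Series(labels, index=prev_to.index - 1)
    return out

sample = pd.DataFrame({
    "Customer": ["A", "A", "A", "A", "A", "B", "B", "B"],
    "From Date": ["1/10/2017", np.nan, "3/10/2017", np.nan, "4/9/2017",
                  "2/10/2017", np.nan, "3/9/2017"],
    "To Date": ["2/9/2017", np.nan, "4/9/2017", np.nan, "5/9/2017",
                "3/9/2017", np.nan, "4/9/2017"],
})
flagged = flag_gaps(sample)
print(flagged)
```

Grouping before the shift keeps one customer's last To Date from being compared with the next customer's first From Date.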
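Explanation:
pd.to_datetime: converts the date-like strings into real datetime data, so that pandas can do calculations on them (for example, the diff between two days). Because it operates on a Series, it has to be applied to each required column rather than to the whole DataFrame.
df.groupby: groupby returns a dict-like object keyed by the given criterion. Since the whole calculation is done per Customer, use df.groupby('Customer').
dt = d.dropna()['To Date'].shift(1)[1:]: d is the subset of the DataFrame that holds a single Customer's data. shift(1) shifts the data down by one cell, which makes it easy to compare To Date with From Date.
d['From Date'][dt.index] == dt: gives the boolean result of the comparison between To Date and From Date.
dt.iloc[i] = res[i]: once you have the list of Missing and Not Missing labels, assign them back to dt to create the Results column.
dfs.append(pd.concat([d, dt], axis=1)): joins the newly created Results column to the original d, then appends the combined frame to the list.
result = pd.concat(dfs): dfs now contains one sub-DataFrame per Customer. Concatenate them into one large DataFrame.
result.columns = ['Customer', 'From Date', 'To Date', 'Results']: reassigns the column names.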
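To make the shift(1) step more concrete, here is a small sketch using customer A's three non-empty rows (index 0, 2 and 4). The variable names are only illustrative and do not appear in the code above.

```python
import pandas as pd

# Customer A's non-empty rows, keeping their original index labels
from_date = pd.to_datetime(
    pd.Series(["1/10/2017", "3/10/2017", "4/9/2017"], index=[0, 2, 4]),
    format="%m/%d/%Y",
)
to_date = pd.to_datetime(
    pd.Series(["2/9/2017", "4/9/2017", "5/9/2017"], index=[0, 2, 4]),
    format="%m/%d/%Y",
)

# shift(1) moves each To Date down one row; [1:] drops the first row,
# which has no previous To Date to compare against
prev_to = to_date.shift(1)[1:]

# True where a row starts on the day the previous row ended
matches = from_date[prev_to.index] == prev_to
print(matches)
```

Row 2 comes out False (2017-03-10 against 2017-02-09) and row 4 comes out True (2017-04-09 against 2017-04-09), which is where Missing at index 1 and Not Missing at index 3 come from.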