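I am working on a problem where I have to show the number of fraudulent transactions that happen in each hour. I am referring to https://www.kaggle.com/gpsaikrishna/credit-card-fraud-detection-smote-deep-learning for the data representation only.
I tried placing the data in my working directory, but I still get an OSError. My system's date and time are set to update automatically, and daylight saving time is not in effect. The formats are Time: 24-hour, HH:mm and Date: DD-MM-YYYY.
Below is the code from the Kaggle kernel that I am trying to run on my system: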
def convert_totime(seconds):
    return datetime.datetime.fromtimestamp(seconds);
timeAnalysis = data[['Time', 'Amount', 'Class']].copy()
timeAnalysis['datetime'] = timeAnalysis.Time.apply(convert_totime)
timeDelta = datetime.datetime.utcnow() - datetime.datetime.now()
# As the max time is 172792 seconds and 172792 / (60*60) is about 48 hrs so we only have data for 2 days so only
# plotting data against hours make sense
timeAnalysis['hour of the day'] = timeAnalysis.datetime + timeDelta
timeAnalysis['hour of the day'] = timeAnalysis['hour of the day'].dt.hour
timeAnalysisGrouped = timeAnalysis.groupby(['Class', 'hour of the day'])['Amount'].count()
I get the following error message:
OSError Traceback (most recent call last)
<ipython-input-77-002a1f9a93fc> in <module>()
3
4 timeAnalysis = data[['Time', 'Amount', 'Class']].copy()
----> 5 timeAnalysis['datetime'] = timeAnalysis.Time.apply(convert_totime)
6 timeDelta = datetime.datetime.utcnow() - datetime.datetime.now()
7 # As the max time is 172792 seconds and 172792 / (60*60) is about 48 hrs so we only have data for 2 days so only
~\AppData\Local\conda\conda\envs\env\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
3190 else:
3191 values = self.astype(object).values
-> 3192 mapped = lib.map_infer(values, f, convert=convert_dtype)
3193
3194 if len(mapped) and isinstance(mapped[0], Series):
pandas/_libs/src\inference.pyx in pandas._libs.lib.map_infer()
<ipython-input-77-002a1f9a93fc> in convert_totime(seconds)
1 def convert_totime(seconds):
----> 2 return datetime.datetime.fromtimestamp(seconds);
3
4 timeAnalysis = data[['Time', 'Amount', 'Class']].copy()
5 timeAnalysis['datetime'] = timeAnalysis.Time.apply(convert_totime)
OSError: [Errno 22] Invalid argument
I expect to get a chart of the number of fraudulent transactions by using the following code:
plt.figure(figsize = (10, 6))
fraudTransactions = timeAnalysisGrouped[1].copy()
fraudTransactions.name = 'Number of transactions'
fraudTransactions.plot.bar(title = 'Number of fraud credit card transactions per hour', legend = True)
Answer 0: (score: 0)
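Have you already tried .map instead of .apply? DataFrame.apply passes a whole Series object to the function rather than the individual elements, so it is typically used with "vectorized" functions. Your function only works on a single value, not on a Series, which may be the problem with the code. (Series.apply, as used in the question, does call the function once per element, as the traceback shows, so switching to .map is mainly worth trying as a quick check.)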
So you can try:
timeAnalysis.Time.map(convert_totime)
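That may solve your main problem. In addition, you could check whether you can use pandas' native data types directly, which is the most convenient way to do the type conversion. If you have not tried it yet, you can use the following built-in numpy logic to check whether the datetime values are parsed correctly: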
# Recent pandas versions require an explicit unit; numeric values are read as nanoseconds here
timeAnalysis.Time.astype('datetime64[ns]')
Alternatively, you can use pandas.to_datetime inside your function, like this:
def convert_totime(series):
    return pd.to_datetime(series)
If the values to convert are integers (to be interpreted as seconds), you need to pass unit='s' to to_datetime.
If the values are strings, you may need to pass an appropriate format string via format=. More information on this function can be found here: Description of pandas.to_datetime
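To illustrate the format= case, here is a minimal sketch. The sample strings are made up for illustration (they are not from the dataset) and simply follow the DD-MM-YYYY and HH:mm layout mentioned in the question.

```python
import pandas as pd

# Hypothetical sample strings in "DD-MM-YYYY HH:mm" form (not from the dataset).
raw = pd.Series(["01-09-2013 00:00", "01-09-2013 13:45", "02-09-2013 23:59"])

# An explicit format string removes any day/month ambiguity during parsing.
parsed = pd.to_datetime(raw, format="%d-%m-%Y %H:%M")

# The .dt accessor then gives the hour of each parsed timestamp.
hours = parsed.dt.hour.tolist()
print(hours)
```

Without format=, pandas has to guess whether 01-09 means 1 September or 9 January, so spelling the format out is the safer choice for day-first dates.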
Answer 1: (score: 0)
def convert_totime(series):
    return pd.to_datetime(series, unit='s')
timeAnalysis.Time.apply(convert_totime)
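For completeness, here is a sketch of the whole hour-of-day computation built on pd.to_datetime with unit='s'. It avoids datetime.fromtimestamp altogether, which has been reported to raise OSError: [Errno 22] on Windows for timestamps close to zero (the Time column starts at 0); I have not verified that this is the cause in the asker's environment. The small DataFrame is made-up sample data standing in for the Kaggle file, and Time is assumed to hold seconds elapsed since the first transaction.

```python
import pandas as pd

# Made-up sample rows standing in for the Kaggle data (assumption).
data = pd.DataFrame({
    "Time": [0.0, 3700.0, 90000.0, 172792.0, 4000.0],
    "Amount": [10.0, 20.0, 5.0, 7.5, 1.0],
    "Class": [0, 1, 1, 0, 1],
})

timeAnalysis = data[["Time", "Amount", "Class"]].copy()

# Vectorized conversion: seconds counted from 1970-01-01, no local-timezone lookup.
timeAnalysis["datetime"] = pd.to_datetime(timeAnalysis["Time"], unit="s")
timeAnalysis["hour of the day"] = timeAnalysis["datetime"].dt.hour

# Count transactions per class and hour, then keep the fraud class (Class == 1).
timeAnalysisGrouped = timeAnalysis.groupby(["Class", "hour of the day"])["Amount"].count()
fraudTransactions = timeAnalysisGrouped.loc[1].copy()
fraudTransactions.name = "Number of transactions"
print(fraudTransactions)
```

Because the converted values are already offsets from midnight of the epoch day, no utcnow() - now() correction is needed here; the resulting Series can be passed to plot.bar exactly as in the question.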