I have a UWP (C#) app running in production on a remote machine (Windows 10), but it crashes periodically. My client says this happens roughly every 9 hours.
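I have several .wer files from the last crashes, but no minidump. Apart from the WER file, the crash path referenced in the Event Viewer entry is blank. See the edits below for how I got a minidump and what I found.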
The exception is an access violation (0xc0000005) in ntdll.dll at exception offset 0x0004df23.
I have the full source code of the app, and it runs under the debugger for long periods without crashing.
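If I use DLL Export Viewer and load the exact version of ntdll.dll (copied from the remote machine), I can see that relative address 0x0004dc60 is EtwNotificationRegister and 0x0004e260 is LdrGetDllPath.
Does this mean the crash happens somewhere inside EtwNotificationRegister (which in turn is called by something in our code, hard to trace without a stack or minidump)? I'm not sure the DLL layout allows me to place my address this way.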
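Edit 2, per @Raymond: No. There are almost certainly other, unexported functions between EtwNotificationRegister and LdrGetDllPath. On build 17763.475, offset 4df23 is in RtlpWaitOnCriticalSection, so you are probably using a critical section that is uninitialized or has already been deleted.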
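The naive lookup described above (attributing a crash offset to the nearest exported symbol below it) can be sketched as follows, using the two RVAs from the DLL Export Viewer output. The class and method names are illustrative only. As Edit 2 points out, the answer is only meaningful if no unexported functions sit between exports, which is not true for ntdll.dll.

```csharp
using System;
using System.Collections.Generic;

public static class ExportLookup
{
    // Returns the exported symbol with the highest RVA that is <= offset, plus the
    // distance from that symbol's start. This ignores unexported functions entirely,
    // so it can name the wrong function.
    public static (string Name, uint Delta)? NearestPrecedingExport(
        IEnumerable<(string Name, uint Rva)> exports, uint offset)
    {
        (string Name, uint Rva)? best = null;
        foreach (var e in exports)
        {
            if (e.Rva <= offset && (best is null || e.Rva > best.Value.Rva))
                best = e;
        }
        if (best is null)
            return null;
        return (best.Value.Name, offset - best.Value.Rva);
    }
}
```

With EtwNotificationRegister at 0x4dc60 and LdrGetDllPath at 0x4e260, offset 0x4df23 resolves to EtwNotificationRegister + 0x2C3, which is exactly the misleading attribution; the real function (RtlpWaitOnCriticalSection) only shows up with symbols.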
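Is there any way to extract more details about this crash? I have remote access to the machine running the app, but the crash doesn't seem to be triggered by any particular event (e.g. we can't press a button and make it crash).
Now with a minidump.
I am also running the program under the debugger locally. I have a remote debugger attached to the remote process, but I can't seem to break in or inspect threads, and I'm not sure why. After redeploying with symbols the debugger reports no problems, but it skips all breakpoints :(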
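Our own (rather naive) local log file, originally meant only for local debugging, is written with StreamWriter.WriteLine immediately followed by StreamWriter.Flush (wrapped in a try/catch, because this isn't thread-safe). On the remote machine the log simply ends at a normal event; nothing is logged after it.
We handle App_UnhandledException and write to this log, so I would expect a stack trace to appear there.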
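The log writer described above can be made safe for concurrent callers with a lock instead of a try/catch that swallows collisions. A minimal sketch, assuming a path the app is allowed to write to (in a UWP app that would normally be under ApplicationData.Current.LocalFolder); SimpleFileLog is a hypothetical name. Note that even a correct logger would likely stay silent here: an access violation raised in native code on a thread-pool thread with no managed frames typically terminates the process before any managed handler such as App_UnhandledException gets to run.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical thread-safe replacement for a WriteLine + Flush log.
public sealed class SimpleFileLog : IDisposable
{
    private readonly object gate = new object();
    private readonly StreamWriter writer;

    public SimpleFileLog(string path)
    {
        var stream = new FileStream(path, FileMode.Append, FileAccess.Write, FileShare.Read);
        writer = new StreamWriter(stream, new UTF8Encoding(false))
        {
            // Push every line to the OS immediately so the last lines survive a process crash.
            AutoFlush = true
        };
    }

    public void WriteLine(string message)
    {
        // Serialize writers so concurrent callbacks cannot interleave or corrupt the stream.
        lock (gate)
        {
            writer.WriteLine($"{DateTime.UtcNow:O} [{Environment.CurrentManagedThreadId}] {message}");
        }
    }

    public void Dispose()
    {
        lock (gate)
        {
            writer.Dispose();
        }
    }
}
```

The thread id in each line makes it easier to see whether Received callbacks really overlap.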
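In "Unexplained crashes related to ntdll.dll" it is suggested that a crash in ntdll.dll is a canary in the coal mine, i.e. a symptom of a problem that originates elsewhere.
Edit 1: I have configured automatic crash dumps following https://www.meziantou.net/2018/06/04/tip-automatically-create-a-crash-dump-file-on-error, so if I can get it to crash again, maybe I'll get a dump file next time?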
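The linked tip relies on the Windows Error Reporting LocalDumps registry key. Below is a sketch of writing the per-executable values from an elevated desktop helper (a sandboxed UWP app cannot write to HKLM itself). LocalDumpsConfig is a hypothetical name, and the folder and dump count are example values; DumpType 2 requests a full dump, which is larger than a minidump but includes heap memory.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;
using Microsoft.Win32;

public static class LocalDumpsConfig
{
    // WER reads per-executable overrides from a subkey named after the .exe.
    public const string BaseKey = @"SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps";

    public static string KeyPathFor(string exeName) => BaseKey + @"\" + exeName;

    // DumpType: 0 = custom, 1 = minidump, 2 = full dump. DumpCount caps how many dumps are kept.
    public static IReadOnlyList<(string Name, object Value, RegistryValueKind Kind)> Values(
        string dumpFolder, int dumpCount = 10, int dumpType = 2)
    {
        return new List<(string Name, object Value, RegistryValueKind Kind)>
        {
            ("DumpFolder", dumpFolder, RegistryValueKind.ExpandString),
            ("DumpCount", dumpCount, RegistryValueKind.DWord),
            ("DumpType", dumpType, RegistryValueKind.DWord),
        };
    }

    // Must run elevated on the target Windows machine, since it writes under HKLM.
    public static void Apply(string exeName, string dumpFolder)
    {
        if (!RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
            throw new PlatformNotSupportedException("LocalDumps is a Windows Error Reporting setting.");
        using (RegistryKey key = Registry.LocalMachine.CreateSubKey(KeyPathFor(exeName)))
        {
            foreach (var (name, value, kind) in Values(dumpFolder))
                key.SetValue(name, value, kind);
        }
    }
}
```

For this app the call would be along the lines of `LocalDumpsConfig.Apply("XXXXXX.YYYYYY.exe", @"C:\CrashDumps");`, using the OriginalFilename from the WER report.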
Here are the details from the WER file:
Version=1
EventType=MoAppCrash
EventTime=132017523132123596
ReportType=2
Consent=1
UploadTime=132017523137590717
ReportStatus=268435456
ReportIdentifier=8d467f04-4bdd-4f9e-bf26-b42d143ece1a
IntegratorReportIdentifier=b60f9ca0-4126-4262-a886-98d3844892d3
Wow64Host=34404
NsAppName=praid:App
OriginalFilename=XXXXXX.YYYYYY.exe
AppSessionGuid=00001514-0001-0004-9fe2-6df11905d501
TargetAppId=U:XXXXXX.YYYYYY_1.0.201.0_x64__b0abmt6f49vqj!App
TargetAppVer=1.0.201.0_x64_!2018//01//24:08:17:16!1194d!XXXXXX.YYYYYY.exe
BootId=4294967295
TargetAsId=1298
UserImpactVector=271582000
IsFatal=1
EtwNonCollectReason=4
Response.BucketId=2ee79f27e2e81a541d6200d746866340
Response.BucketTable=5
Response.LegacyBucketId=2117255699418735424
Response.type=4
Sig[0].Name=Package Full Name
Sig[0].Value=XXXXXX.YYYYYY_1.0.201.0_x64__b0abmt6f49vqj
Sig[1].Name=Application Name
Sig[1].Value=praid:App
Sig[2].Name=Application Version
Sig[2].Value=1.0.0.0
Sig[3].Name=Application Timestamp
Sig[3].Value=5a68410c
Sig[4].Name=Fault Module Name
Sig[4].Value=ntdll.dll
Sig[5].Name=Fault Module Version
Sig[5].Value=10.0.17763.475
Sig[6].Name=Fault Module Timestamp
Sig[6].Value=3230aa04
Sig[7].Name=Exception Code
Sig[7].Value=c0000005
Sig[8].Name=Exception Offset
Sig[8].Value=000000000004df23
DynamicSig[1].Name=OS Version
DynamicSig[1].Value=10.0.17763.2.0.0.256.48
DynamicSig[2].Name=Locale ID
DynamicSig[2].Value=5129
DynamicSig[22].Name=Additional Information 1
DynamicSig[22].Value=95b1
DynamicSig[23].Name=Additional Information 2
DynamicSig[23].Value=95b15a88b673e33a5f48839974790b1c
DynamicSig[24].Name=Additional Information 3
DynamicSig[24].Value=283d
DynamicSig[25].Name=Additional Information 4
DynamicSig[25].Value=283dea7b6b6112710c1e3f76ed84d993
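Edit 3: Screenshot of the minidump from last night's crash. The WER entry in the event log looks the same, so this appears to be the same issue. I'll see whether I can load symbols etc.
Edit 4: Tried managed debugging. The Threads view shows the thread at the exception point, but there is no call stack information.
Edit 5: Native debugging of the minidump. Looks like we have a winner. @Raymond was right: it's RtlpWaitOnCriticalSection, called from BluetoothLEAdvertisementWatcher::AdvertisementReceivedCallbackWorker.
Native call stack as text: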
Not Flagged > 8748 0 Worker Thread Win64 Thread Windows.Devices.Bluetooth.dll!(void)
ntdll.dll!RtlpWaitOnCriticalSection()
ntdll.dll!RtlpEnterCriticalSectionContended()
ntdll.dll!RtlEnterCriticalSection()
Windows.Devices.Bluetooth.dll!(void)()
Windows.Devices.Bluetooth.dll!wil::ResultFromException<(void)>()
Windows.Devices.Bluetooth.dll!Windows::Devices::Bluetooth::Advertisement::BluetoothLEAdvertisementWatcher::AdvertisementReceivedCallbackWorker(void)
Windows.Devices.Bluetooth.dll!Windows::Devices::Bluetooth::Advertisement::BluetoothLEAdvertisementWatcher::AdvertisementReceivedThreadpoolWorkCallbackStatic(struct _TP_CALLBACK_INSTANCE *,void *,struct _TP_WORK *)
ntdll.dll!TppWorkpExecuteCallback()
ntdll.dll!TppWorkerThread()
kernel32.dll!BaseThreadInitThunk()
ntdll.dll!RtlUserThreadStart()
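Edit 6: OK, what now? How do I fix this? Is my reading of the stack right, that an exception was thrown inside the callback? If so, can I put a managed try/catch in the BLE advertisement callback handler, and should that fix it (or at least catch it so I can debug further)?
Edit 7: The code. This is the code we use to instantiate the wrapper and subscribe to the events. BluetoothLEAdvertisementWatcherWrapper is a thin class: it just wraps the underlying BluetoothLEAdvertisementWatcher so that it can implement an interface, passing all events through and exposing the properties. We did this so we can have a different version that creates fake events for testing.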
bluetoothAdvertisementWatcher = new BluetoothLEAdvertisementWatcherWrapper();
bluetoothAdvertisementWatcher.SignalStrengthFilter.SamplingInterval = TimeSpan.Zero;
bluetoothAdvertisementWatcher.ScanningMode = BluetoothLEScanningMode.Active;
bluetoothAdvertisementWatcher.Received += Watcher_Received;
bluetoothAdvertisementWatcher.Stopped += Watcher_Stopped;
bluetoothAdvertisementWatcher.Start();
Here is the code for the wrapper, just to show that it isn't doing anything complicated:
public class BluetoothLEAdvertisementWatcherWrapper : IBluetoothAdvertismentWatcher, IDisposable
{
    private BluetoothLEAdvertisementWatcher bluetoothWatcher;
    public BluetoothLEAdvertisementWatcherWrapper()
    {
        bluetoothWatcher = new BluetoothLEAdvertisementWatcher();
    }
    public BluetoothSignalStrengthFilter SignalStrengthFilter => bluetoothWatcher.SignalStrengthFilter;
    public BluetoothLEScanningMode ScanningMode
    {
        get
        {
            return bluetoothWatcher.ScanningMode;
        }
        set
        {
            bluetoothWatcher.ScanningMode = value;
        }
    }
    public event TypedEventHandler<BluetoothLEAdvertisementWatcher, BluetoothLEAdvertisementReceivedEventArgs> Received
    {
        add
        {
            bluetoothWatcher.Received += value;
        }
        remove
        {
            bluetoothWatcher.Received -= value;
        }
    }
    public event TypedEventHandler<BluetoothLEAdvertisementWatcher, BluetoothLEAdvertisementWatcherStoppedEventArgs> Stopped
    {
        add
        {
            bluetoothWatcher.Stopped += value;
        }
        remove
        {
            bluetoothWatcher.Stopped -= value;
        }
    }
    public BluetoothLEAdvertisementWatcherStatus Status => bluetoothWatcher.Status;
    public Action<IPacketFrame, short> YieldAdvertisingPacket { get => throw new NotImplementedException(); set => throw new NotImplementedException(); }
    public void Start()
    {
        bluetoothWatcher.Start();
    }
    public void Stop()
    {
        bluetoothWatcher.Stop();
    }
    public void Dispose()
    {
        if (bluetoothWatcher != null)
        {
            if (bluetoothWatcher.Status == BluetoothLEAdvertisementWatcherStatus.Started)
            {
                bluetoothWatcher.Stop();
            }
            bluetoothWatcher = null;
        }
    }
}
Here is the code for the Watcher_Received event handler:
private void Watcher_Received(BluetoothLEAdvertisementWatcher sender, BluetoothLEAdvertisementReceivedEventArgs args)
{
    try
    {
        //we won't queue packets until registered
        if (!ApplicationContext.Current.Details.ReceiverId.HasValue)
            return;
        IPacketFrame frame;
        PacketFrameParseResult result = ParseFrame(args, out frame);
        if (result == PacketFrameParseResult.Success)
        {
            ApplicationContext.Current.Details.BluetoothPacketCount++;
        }
        short rssi = args.RawSignalStrengthInDBm;
        string message = FormatPacketForDisplay(args, args.AdvertisementType, rssi, frame, result);
        if (BluetoothPacketReceived != null)
        {
            BluetoothPacketReceived.Invoke(this, new BluetoothPacketReceivedEventArgs(message, result, frame, rssi));
        }
    }
    catch (Exception ex)
    {
        if (ex.InnerException is Exceptions.PacketFrameParseException && (ex.InnerException as Exceptions.PacketFrameParseException).Result == PacketFrameParseResult.InvalidData)
        {
            // noop
        }
        else
        {
            Logger.Log(LogLevel.Warning, "BLE listener caught bluetooth packet error: {0}", ex);
            if (BluetoothPacketError != null)
            {
                BluetoothPacketError.Invoke(this, new BluetoothPacketErrorEventArgs(ex));
            }
        }
    }
}
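As you can see, the entire managed callback is wrapped in a try/catch and never rethrows, so I'm not sure what else I can do to stop a native exception from crashing the app.
Current thinking, based on RtlpEnterCriticalSectionContended: is this a parallel event handler, where the native side raises a new event on the same thread while the previous handler is still executing from the previous activity?
So is this a race condition on the critical section that is causing the crash?
Edit 8: To test this theory, I replaced the body of Received with a read + push onto a concurrent queue, so that the managed code leaves the event handler as quickly as possible. A separate thread then reads from the concurrent queue and does the app-side processing. At first I thought this had fixed the problem, since the app ran (actively listening) for about 15 hours, but this morning it crashed again with the same symptoms.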
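The hand-off described in Edit 8 can also be written with BlockingCollection<T>, which uses a ConcurrentQueue<T> underneath by default and lets the consumer block while the queue is empty instead of polling. A sketch under stated assumptions: PacketItem and PacketPipeline are hypothetical names (PacketItem mirrors a subset of ReceivedBluetoothAdvertismentPacketItem), and the bound of 10,000 items with drop-on-full is an arbitrary choice.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical stand-in for ReceivedBluetoothAdvertismentPacketItem.
public sealed class PacketItem
{
    public DateTime Timestamp { get; set; }
    public byte[] Buffer { get; set; } = Array.Empty<byte>();
    public short Rssi { get; set; }
}

public sealed class PacketPipeline : IDisposable
{
    // Bounded so a stalled consumer cannot grow memory without limit.
    private readonly BlockingCollection<PacketItem> queue =
        new BlockingCollection<PacketItem>(boundedCapacity: 10_000);
    private readonly Task consumer;

    public PacketPipeline(Action<PacketItem> process)
    {
        // One long-running consumer. GetConsumingEnumerable blocks while the queue is empty
        // and finishes once CompleteAdding has been called and the queue is drained.
        consumer = Task.Factory.StartNew(() =>
        {
            foreach (var item in queue.GetConsumingEnumerable())
                process(item);
        }, TaskCreationOptions.LongRunning);
    }

    // Call from the Received handler after copying the data out of the event args.
    // TryAdd drops the packet rather than blocking the callback thread when the queue is full.
    public bool Enqueue(PacketItem item) => queue.TryAdd(item);

    public void Dispose()
    {
        queue.CompleteAdding(); // Enqueue throws InvalidOperationException after this point
        consumer.Wait();        // rethrows (as AggregateException) anything thrown by process
        queue.Dispose();
    }
}
```

As Edit 8 reports, getting out of the callback quickly did not by itself stop the crash, so this is tidier plumbing rather than a fix.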
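Edit 8: Following suggestions in the comments, we are trying to make sure that after a stop we don't dispose of or GC the watcher before Received has finished. To do this we use a TaskCompletionSource as a promise and subscribe to the Stopped event, so that we can await the completion source's task, which only gets its result once the Stopped event fires.
We also take a lock (Monitor.Enter) in both StopAsync and Received so that the two can't run in parallel. This seems to slow down the system's event handling, which makes sense if BLE packets arrive in parallel. The updated code is below: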
if ((DateTime.Now - this.LastStartedTimestamp).TotalSeconds > 60)
{
    if (this.LastStopReason != BluetoothWatcherStopReason.DeviceCharacteristicWorker)
    {
        Logger.Log(LogLevel.Debug, "Stopping bluetooth watcher...");
        // restart the watcher once it has been running for more than 60 seconds
        await this.StopAsync(BluetoothWatcherStopReason.AutomaticRestart);
        //start again if automatic restart
        Logger.Log(LogLevel.Debug, "Starting bluetooth watcher...");
        this.Start(this.testMode);
        Logger.Log(LogLevel.Debug, "Started bluetooth watcher");
        this.LastStartedTimestamp = DateTime.Now;
    }
}
private void Watcher_Stopped(BluetoothLEAdvertisementWatcher sender, BluetoothLEAdvertisementWatcherStoppedEventArgs args)
{
    string error = args.Error.ToString();
    Logger.Log(LogLevel.Warning, string.Format("BLE listening stopped because {0}...", error));
    LastError = args.Error;
    if (BluetoothWatcherStopped != null)
    {
        BluetoothWatcherStopped.Invoke(sender, args);
    }
}
public class ReceivedBluetoothAdvertismentPacketItem
{
    public DateTime Timestamp { get; set; }
    public BluetoothLEAdvertisementType Type { get; set; }
    public byte[] Buffer { get; set; }
    public short Rssi { get; set; }
}
ConcurrentQueue<ReceivedBluetoothAdvertismentPacketItem> BluetoothPacketsReceivedQueue = new ConcurrentQueue<ReceivedBluetoothAdvertismentPacketItem>();
private void Watcher_Received(BluetoothLEAdvertisementWatcher sender, BluetoothLEAdvertisementReceivedEventArgs args)
{
    bool lockWasTaken = false;
    try
    {
        //this prevents stop until we're exiting Received
        Monitor.Enter(BluetoothWatcherEventSynchronisation, ref lockWasTaken);
        if (!lockWasTaken)
        {
            return;
        }
        //we won't queue packets until registered
        if (!ApplicationContext.Current.ReceiverDetails.ReceiverId.HasValue)
            return;
        BluetoothLEAdvertisementType type = args.AdvertisementType;
        byte[] buffer = GetManufacturerData(args.Advertisement);
        short rssi = args.RawSignalStrengthInDBm;
        BluetoothPacketsReceivedQueue.Enqueue(new ReceivedBluetoothAdvertismentPacketItem
        {
            Timestamp = DateTime.UtcNow,
            Type = type,
            Rssi = rssi,
            Buffer = buffer
        });
        ApplicationContext.Current.ReceiverDetails.UnprocessedQueueLength = BluetoothPacketsReceivedQueue.Count;
    }
    catch (Exception ex)
    {
        Logger.Log(LogLevel.Warning, "BLE listener caught bluetooth packet error: {0}", ex);
        if (BluetoothPacketError != null)
        {
            BluetoothPacketError.Invoke(this, new BluetoothPacketErrorEventArgs(ex));
        }
    }
    finally
    {
        if (lockWasTaken)
        {
            Monitor.Exit(BluetoothWatcherEventSynchronisation);
        }
    }
}
public BluetoothWatcherStopReason LastStopReason { get; private set; } = BluetoothWatcherStopReason.Unknown;
private object BluetoothWatcherEventSynchronisation = new object();
public Task<BluetoothWatcherStopReason> StopAsync(BluetoothWatcherStopReason reason)
{
    var promise = new TaskCompletionSource<BluetoothWatcherStopReason>();
    if (bluetoothAdvertisementWatcher != null)
    {
        LastStopReason = reason;
        UpdateBluetoothStatusInReceiverModel(BluetoothLEAdvertisementWatcherStatus.Stopped); //actually stopping but we lie
        bool lockWasTaken = false;
        try
        {
            Monitor.Enter(BluetoothWatcherEventSynchronisation, ref lockWasTaken);
            {
                bluetoothAdvertisementWatcher.Received -= Watcher_Received;
                bluetoothAdvertisementWatcher.Stopped += (sender, args) =>
                {
                    // clean up
                    if (bluetoothAdvertisementWatcher != null)
                    {
                        bluetoothAdvertisementWatcher.Stopped -= Watcher_Stopped;
                        bluetoothAdvertisementWatcher = null;
                    }
                    //notify continuation
                    promise.SetResult(reason);
                };
                bluetoothAdvertisementWatcher.Stop();
            }
        }
        finally
        {
            if (lockWasTaken)
            {
                Monitor.Exit(BluetoothWatcherEventSynchronisation);
            }
        }
    }
    base.Stop();
    return promise.Task;
}
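The TaskCompletionSource pattern from Edit 8 can be tightened so that the one-shot Stopped handler is always removed, a duplicate Stopped event cannot throw, the wait cannot hang forever, and the continuation does not run inline on the thread that raised Stopped. In the StopAsync above, the anonymous handler is never unsubscribed, it drops the last reference (`bluetoothAdvertisementWatcher = null`) from inside the Stopped callback, and `promise` is never completed if the watcher was already null. The sketch below is a hypothetical rewrite against a hypothetical IStoppableWatcher interface (the real Stopped event is a TypedEventHandler); it is not verified to avoid the native crash.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical abstraction: only what the sketch needs from the watcher.
public interface IStoppableWatcher
{
    event EventHandler Stopped;
    void Stop();
}

public static class WatcherStopHelper
{
    // Stops the watcher and returns true once Stopped fires, or false if the timeout elapses.
    // The caller keeps its own strong reference to the watcher; nothing here nulls it.
    public static async Task<bool> StopAndWaitAsync(IStoppableWatcher watcher, TimeSpan timeout)
    {
        // Continuations run on the thread pool, not inline on the thread that raised Stopped.
        var stopped = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
        EventHandler handler = (s, e) => stopped.TrySetResult(true); // TrySet tolerates duplicate events
        watcher.Stopped += handler;
        try
        {
            watcher.Stop();
            var finished = await Task.WhenAny(stopped.Task, Task.Delay(timeout)).ConfigureAwait(false);
            return finished == stopped.Task;
        }
        finally
        {
            watcher.Stopped -= handler; // never leave the one-shot handler attached
        }
    }
}

// Test double in the spirit of the wrapper above: raises Stopped from another thread.
public sealed class FakeWatcher : IStoppableWatcher
{
    private readonly bool raiseStopped;
    public FakeWatcher(bool raiseStopped) { this.raiseStopped = raiseStopped; }
    public event EventHandler Stopped;
    public int SubscriberCount => Stopped?.GetInvocationList().Length ?? 0;
    public void Stop()
    {
        if (raiseStopped)
            Task.Run(() => Stopped?.Invoke(this, EventArgs.Empty));
    }
}
```

With the real watcher, the same instance could be kept for the app's lifetime and simply restarted with Start() after this returns, rather than replacing it with a new object on every cycle.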
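After these changes, the same crash still occurs in the native Windows.Devices.Bluetooth assembly (as above).
Edit 9: I removed the automatic periodic stop/start, and the app has now been stable for more than 36 hours without crashing, so something in that flow is causing the crash. We originally added it to work around a problem where the advertisement watcher stops after a while, so we'd like to keep it if this can be solved.
The if statement if ((DateTime.Now - this.LastStartedTimestamp).TotalSeconds > 60)
(and its block) is currently commented out.
I have opened a bug for the Universal Windows Platform here: https://wpdev.uservoice.com/forums/110705-universal-windows-platform/suggestions/37623343-bluetoothleadvertismentwatcher-advertismentreceiv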
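Since Edit 9 shows the fixed 60-second restart is what triggers the crash, one hypothetical alternative for the "watcher stops after a while" problem is to restart only when no advertisement has arrived for some time. A minimal sketch of that decision logic; SilenceWatchdog, the threshold, and the injected clock are all assumptions for illustration, and whether a rarer restart actually avoids the native crash is untested.

```csharp
using System;
using System.Threading;

// Hypothetical watchdog: reports when no advertisement has been seen for longer than a threshold.
// The clock is injected so the logic can be tested without waiting.
public sealed class SilenceWatchdog
{
    private readonly TimeSpan threshold;
    private readonly Func<DateTime> utcNow;
    private long lastSeenTicks;

    public SilenceWatchdog(TimeSpan threshold, Func<DateTime> utcNow)
    {
        this.threshold = threshold;
        this.utcNow = utcNow;
        lastSeenTicks = utcNow().Ticks;
    }

    // Call from the Received handler; Interlocked keeps it safe for concurrent callbacks.
    public void MarkReceived() => Interlocked.Exchange(ref lastSeenTicks, utcNow().Ticks);

    // Poll from a timer; true means the watcher has been silent for longer than the threshold.
    public bool ShouldRestart() =>
        utcNow().Ticks - Interlocked.Read(ref lastSeenTicks) > threshold.Ticks;
}
```

If a restart is still needed, it would presumably go through a stop that waits for the Stopped event before calling Start again.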