Random Crash and Restart

Steveyg

MOST VALUED CONTRIBUTOR
I've only ever cooked one CPU, was back in the 90s when you could hand solder a motherboard...... meaning there wasn't much space for safety mechanisms :ROFLMAO:

Interesting side note. I've never ever known a CPU to actually just die. I've got a drawer full of CPUs and every single one of them works. I overclocked my old 2700k to the edge, ran it 24/7 for about 8 years and it never put a foot wrong.

They are so much harder to kill than people realise :D
Had a 2500K running at 4.5GHz for nearly 10 years and it still works like new. My mate is still using that machine, I donated it to him when I moved up
 

NoddyPirate

Grand Master
@ubuysa - I think we are getting places!

After another 5 hours of trouble free operation this morning, I put the Undervolt back in. Within an hour or so the system crashed again. The latest Minidump appears to be exactly the same as the others to my untrained eye:


I will remove the Undervolt again now and see if I can go another 24 hours without trouble and revert to you again. Here is my revised plan:

NAME / CHARACTERISTICS / TIME / ACTIONS

Phase 1 / Utter Panic / A few Seconds / Immediately get on Ubuysa
Phase 2 / Relief when Ubuysa replies / Instantaneous / Feel much better
Phase 3 / Begin to identify most likely suspect / Likely to be Days / Sit and wait <------------------------ We are here
Phase 4A / Likely suspect turns out to not be the cause / Minutes / Return to Phase 1, else continue to 4B
Phase 4B / Reintroduce all previous suspects / Days / Sit and wait, else if crash, Return to Phase 1
Phase 5 / Final Identification / Momentary / Commit to buy Ubuysa gifts and shower him with nice forum comments

Additional questions for you:

Regarding the same event appearing with each crash - could it be possible that the Undervolt is borderline and only a small number of processes are critical enough that the Undervolt barely causes them/it to fail where others are able to struggle on? I am with you in that I can't understand how an Undervolt could give the same process error each time - it may yet prove to not be the Undervolt of course - we shall see.

Could you help me understand the Bits Client and what it is - how jobs could fail and get stuck there and the significance or implications of such failures? My Bits Client list remains empty since clearing it yesterday so I'm just wondering what caused it to pile up in the first place....
 

NoddyPirate

Grand Master
@Scott - Depending on what the Lord of the Crashes says to the above - I was thinking that if the Undervolt turned out to be the culprit later and might be borderline - an interesting place to start would be to run with it and add LLC and/or Increased VRM Phase Switching to the mix and see if it stabilises.

Thoughts on that? I imagine it could prove to do exactly this potentially - turn a borderline UV or OC into a stable one - as that's kinda the whole point!?

My board is currently set to AUTO LLC - which I think means it adapts to the conditions as much as it can - so it may well already be trying to compensate fully and unsuccessfully - but it sounds like you would have no issue with just setting it to the max manual setting available and see where it takes me - which seems like the place to begin? 🤔

(Don't worry Ubuysa! I am still playing softly softly catchee monkey! Just planning ahead is all! :D)
 

ubuysa

The BSOD Doctor
Regarding the same event appearing with each crash - could it be possible that the Undervolt is borderline and only a small number of processes are critical enough that the Undervolt barely causes them/it to fail where others are able to struggle on? I am with you in that I can't understand how an Undervolt could give the same process error each time - it may yet prove to not be the Undervolt of course - we shall see.
That would be my thinking - if it is the undervolt, which is looking more likely. I don't understand how the failures are all the same (though I've not checked the most recent dump yet).
Could you help me understand the Bits Client and what it is - how jobs could fail and get stuck there and the significance or implications of such failures? My Bits Client list remains empty since clearing it yesterday so I'm just wondering what caused it to pile up in the first place....
I'm no expert on BITS, I don't use it (at least I don't think I do!). I would guess that they pile up because the destination either isn't available or somehow can't receive the files?

BTW. I have a better plan for you....

1. Put your underpants on your head
2. Stick two pencils up your nose
3. Say "wibble" a lot

Edit: The recent dump is identical to all the others.
 

NoddyPirate

Grand Master
BTW. I have a better plan for you....

1. Put your underpants on your head
2. Stick two pencils up your nose
3. Say "wibble" a lot
Done!

 

ubuysa

The BSOD Doctor
Why the undervolt seems to make only UWP apps fail and only in a specific way, is in the realm of chaos theory I think. The physics of CPU processing (or even RAM processing) at the molecular level isn't anywhere near as logical as we would like to think. If the undervolt is right on the edge it's possible I suppose that the way these UWP apps use the CPU might be unique enough to cause a CPU glitch where other threads don't?

I reckon it's one of those things we may never fully understand. All you can do is find the undervolt that's stable and not worry overmuch about why.

It's taught me something new.
 

NoddyPirate

Grand Master
Why the undervolt seems to make only UWP apps fail and only in a specific way, is in the realm of chaos theory I think. The physics of CPU processing (or even RAM processing) at the molecular level isn't anywhere near as logical as we would like to think. If the undervolt is right on the edge it's possible I suppose that the way these UWP apps use the CPU might be unique enough to cause a CPU glitch where other threads don't?

I reckon it's one of those things we may never fully understand. All you can do is find the undervolt that's stable and not worry overmuch about why.

It's taught me something new.
If we can both avoid crashes and cause them to recur 'on demand' so to speak - which is where I hope we can get to - then it would be great if we can narrow it down to the undervolt alone. A way to go yet in that regard though.

More learning all around potentially - and will certainly help me understand more on the OC side - if I am lucky enough to have stumbled onto a borderline undervolt then it will be potentially really useful to me to be able to use the behaviour over the last day or two to better understand how such issues present and also what might fix or mitigate them.

Every cloud has a silver lining. :)

But also if it is the Undervolt, I am kind of disappointed that it wasn't my first assumption when I first spotted a restart - because I will have used up so much of your time searching for a needle in a hay stack, while the issue was actually caused by the farmer in the barn. 🙇‍♂️

(EDIT @Scott - actually this would be a great example of how an OC that successfully restarts and survives various testing and stress tests - can still fail much later on. I think by my increased UV seeming to work fine, I just didn’t link it to the crashes when they finally occurred....)
 

Scott

Behold The Ford Mondeo
Moderator
I can only say I've had similar before. I never dug any deeper than confirming that was the case for me and overcoming it by tweaking. I would hazard a guess that it's a shock load of some sort with a weaker signal strength being interrupted by a stronger signal. Almost like interference across the lanes of the CPU itself. That's obviously a complete guess, but it's not outwith the realms of possibility..... as that's actually how the SPECTRE hacks etc are done due to hardware proximity. When the signal strength isn't degraded there isn't an issue. Stress testing probably won't care much as it'll just log as an interrupt or fault, it won't actually hard crash as it's not trying to do anything. Different when it's the kernel not quite getting 4G reception.

Manually setting the LLC should help. Again, set at auto it'll need a certain amount of ramp to catch up. Max is fine for a quick check but I would aim to find the happy medium. If max looks OK with your peak voltages after some tinkering then it'll be right as rain.
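
To put rough numbers on the LLC point above, here is a minimal sketch of the usual steady-state load-line relationship: the voltage the CPU sees under load is roughly the requested voltage plus any offset, minus current times the effective load-line resistance. A stronger LLC setting behaves like a smaller resistance. Every figure below (requested voltage, undervolt offset, currents, resistances and the minimum stable voltage) is a made-up assumption for illustration, not a measurement from this system, and the model ignores the transient "ramp" behaviour described above.

```python
def load_voltage(vid: float, offset: float, current_a: float, loadline_mohm: float) -> float:
    """Steady-state voltage at the CPU: setpoint minus droop across the load line."""
    droop = current_a * loadline_mohm / 1000.0  # milliohms -> ohms
    return vid + offset - droop


def main() -> None:
    # Hypothetical values, chosen only to show the shape of the problem.
    vid = 1.250        # requested voltage (V)
    undervolt = -0.080  # undervolt offset (V)
    v_min = 1.080      # assumed minimum stable voltage at this clock (V)

    # A stronger LLC setting is modelled as a lower effective resistance.
    for label, loadline_mohm in (("weak LLC", 1.1), ("strong LLC", 0.4)):
        for amps in (20, 90):
            v = load_voltage(vid, undervolt, amps, loadline_mohm)
            verdict = "ok" if v >= v_min else "below assumed minimum"
            print(f"{label:10s} {amps:3d} A -> {v:.3f} V ({verdict})")


if __name__ == "__main__":
    main()
```

With these invented numbers the same undervolt clears the assumed minimum at light load on either setting, but only the stronger LLC keeps it above the line at the higher current. That is the "borderline" situation in a nutshell, and it is also why peak voltages are worth watching when LLC is set to max.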
 

NoddyPirate

Grand Master
Another full morning of trouble free operation. So it seems we are a step closer to confirming the CPU Undervolt as the primary cause. The RAM OC has been re-entered now and I'll see what happens before doing anything else.

A while to go yet, but just one step away now to finding a gift shop that's open for Ubuysa.

NAME / CHARACTERISTICS / TIME / ACTIONS

Phase 1 / Utter Panic / A few Seconds / Immediately get on Ubuysa - Complete
Phase 2 / Relief when Ubuysa replies / Instantaneous / Feel much better - Complete
Phase 3 / Begin to identify most likely suspect / Likely to be Days / Sit and wait - Complete
Phase 4A / Likely suspect turns out to not be the cause / Minutes / Return to Phase 1, else continue to 4B - Skipped
Phase 4B / Reintroduce all previous suspects / Days / Sit and wait, else if crash, Return to Phase 1 <--------------------------- We are here
Phase 5 / Final Identification / Momentary / Commit to buy Ubuysa gifts and shower him with nice forum comments
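
For anyone following along, Phase 4B boils down to "start from the known-good baseline, put the tweaks back one at a time, and stop at the first one that brings the crashes back". A rough sketch of that logic is below. The function name and the `is_stable` callback are hypothetical; in real life "is it stable" means running the machine for a day or so and watching for a restart, not calling a function.

```python
from typing import Callable, Iterable, List, Optional


def find_first_unstable(
    tweaks: Iterable[str],
    is_stable: Callable[[List[str]], bool],
) -> Optional[str]:
    """Re-apply tweaks one at a time on top of a stable baseline.

    Returns the first tweak whose addition makes the system unstable,
    or None if everything can be re-applied without trouble.
    """
    applied: List[str] = []
    for tweak in tweaks:
        applied.append(tweak)
        # Only one thing changed since the last good run, so a failure
        # here points at the tweak that was just added.
        if not is_stable(list(applied)):
            return tweak
    return None


if __name__ == "__main__":
    # Stand-in for "run for 24 hours and see": pretend the undervolt is the culprit.
    def pretend_soak_test(applied: List[str]) -> bool:
        return "CPU undervolt" not in applied

    print(find_first_unstable(["RAM OC", "CPU undervolt"], pretend_soak_test))
```

The point of changing one thing per run is that a crash can only be blamed on the most recent change, which is why the RAM OC goes back in on its own before anything else is touched.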
 

ubuysa

The BSOD Doctor
Another full morning of trouble free operation. So it seems we are a step closer to confirming the CPU Undervolt as the primary cause. The RAM OC has been re-entered now and I'll see what happens before doing anything else.

A while to go yet, but just one step away now to finding a gift shop that's open for Ubuysa.

NAME / CHARACTERISTICS / TIME / ACTIONS

Phase 1 / Utter Panic / A few Seconds / Immediately get on Ubuysa - Complete
Phase 2 / Relief when Ubuysa replies / Instantaneous / Feel much better - Complete
Phase 3 / Begin to identify most likely suspect / Likely to be Days / Sit and wait - Complete
Phase 4A / Likely suspect turns out to not be the cause / Minutes / Return to Phase 1, else continue to 4B - Skipped
Phase 4B / Reintroduce all previous suspects / Days / Sit and wait, else if crash, Return to Phase 1
Phase 5 / Final Identification / Momentary / Commit to buy Ubuysa gifts and shower him with nice forum comments
I think you might be overestimating my contribution a tad.

Although I said at the start that removing undervolts and overclocks is the first thing to try when dealing with BSODs, it's actually @Scott who has directed you to the (apparent) cause.

The beauty of these forums is that we all have something to contribute and none of us have fragile egos that need pampering. Collectively we can solve most issues that don't require the PCS test bench. Long may that continue. :)
 

SpyderTracks

We love you Ukraine
I think you might be overestimating my contribution a tad.

Although I said at the start that removing undervolts and overclocks is the first thing to try when dealing with BSODs, it's actually @Scott who has directed you to the (apparent) cause.

The beauty of these forums is that we all have something to contribute and none of us have fragile egos that need pampering. Collectively we can solve most issues that don't require the PCS test bench. Long may that continue. :)
My ego needs a lot of stroking!
 

NoddyPirate

Grand Master
I think you might be overestimating my contribution a tad.

Although I said at the start that removing undervolts and overclocks is the first thing to try when dealing with BSODs, it's actually @Scott who has directed you to the (apparent) cause.

The beauty of these forums is that we all have something to contribute and none of us have fragile egos that need pampering. Collectively we can solve most issues that don't require the PCS test bench. Long may that continue. :)

OK, OK, but this is all your own fault:

My first post had this in it:

My best guess is that the restart might have been related to my undervolt on my OC. Yesterday I increased the undervolt amount - and it is entirely possible that I had an unstable idle if the undervolt was pushing it.

We sort of kinda dismissed this early given the nature of the logs, and @Scott might have given us exactly the info we need to put us back on the scent, and prevented me from sending you around in circles for a few weeks, but I've mentioned a few times here before that the Mods bring nothing to the place. A shower of wasters all of them. And as for that @Scott fella - don't get me started. I talk to him only to give the poor man something to do. :rolleyes: He's even worse than that @SpyderTracks bloke.

As for reading Dump Logs? How hard can it be? It's just code and letters and numbers after all. Miles and miles and miles of code. Like maybe roughly 1 or 2 x 10^9 bits of data? It can't be that hard? Why you would think someone would send you thanks for trawling through that little bit of stuff is beyond me....

(Is that OK? Or have I gone a bit too far the other way? 🤔)
 