Shit happens.
This is perfectly normal, and whatever the size of the company you’re working for, it is the same everywhere: only the size of the fan and the size of the shitload change.
If you work for a 15-person ISP, it is safe to assume that while a broken network cable can be replaced as fast as anywhere else, a fault in a critical under-10k-euro network appliance can bring you to your knees (you have no money for real redundancy).
Working for big-big-cellular-phone-telco-$$$, you would assume that the same fault can be managed easily, that there is hardware redundancy everywhere, and that the real show-stoppers are much rarer. Well, you’re right, and this result obviously comes at a cost: more hardware, more planning, more tests, more procedures.
What escapes me is why BBCPT$$$ cannot manage its people the same way it manages its service infrastructure. Having smart people working for you is cool. Smart people keep your business running, like reliable hardware does. But while a bunch of geeks can pull the rabbit out of the hat most of the time, dumbasses can’t find the hat.
What’s the value of a 100k-euro DNS infrastructure when the people using it can’t run ipconfig /flushdns to clear their cache or check which name servers they’re pointing to (and consequently wake me up at 4 AM in the middle of a service migration)?
Don’t put monkeys behind your keyboards, or PEBKAC could be your next buzzword.