How to run the panic meeting

By Scot Herrick | Job Performance

Dec 03

The panic meeting is the meeting where something bad has happened and no one knows exactly what it is or what to do to fix it.

The “Oh, crap!” meeting.

For example, I left work one evening and everything was fine. I walked into work the next morning and there were all sorts of people on the phone. When I opened my email, I found 60+ messages that had come into my inbox overnight, starting with Europe. Meeting invites were in the inbox, half of them for meetings already in the past. There was another meeting invite for right now. I hopped on. Not my meeting, so I didn’t run it. On the line were about twenty people, all with tangible anxiety in their voices, not knowing what to do or even what, exactly, had happened.

Yeah. THAT kind of meeting.

When you don’t know what the problem really is and how widespread it is, there are certain things you need to do. Let’s take a look.

Define what is known and unknown about the problem

The hardest part about these meetings is discovering the problem and the scale of the problem. What, exactly, happened? There is no use looking for solutions at this point, because we don’t yet know what problem we’re trying to fix. Especially in this initial meeting, determining the actual problem and the scale of the damage is the first, overriding concern.

You need to get to facts. You should be asking for and hearing “This is what we are seeing the problem to be: blah blah blah.” The more facts you can establish early on, the better able you are as a group to rule in and rule out where the problem is in the environment. As a group, you start defining what is known about the problem. The problem is in Europe, but not Asia? That’s a fact. The problem started around noon local time? That’s a fact. The impact was only to Windows servers but not Unix servers? That’s a fact.

Fact, fact, fact until there are no facts left to determine.

What you need to do at this point is determine what is known and what is unknown. This is a list. Summarize the list.
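One way to keep that list honest during the call is to write it down in a structure everyone can see. The following is only a minimal sketch in Python, not a prescribed tool: the `ProblemLog` class and its method names are made up for illustration, and the open question in the example is hypothetical. The three known facts are the ones from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class ProblemLog:
    """Running list of what is known and unknown about the problem (hypothetical helper)."""
    known: list = field(default_factory=list)
    unknown: list = field(default_factory=list)

    def add_known(self, fact):
        self.known.append(fact)

    def add_unknown(self, question):
        self.unknown.append(question)

    def resolve(self, question, fact):
        """Move an open question to the known list once someone confirms the answer."""
        self.unknown.remove(question)
        self.known.append(fact)

    def summary(self):
        """Return the list as plain text, ready to read out or paste into an email."""
        lines = ["KNOWN:"]
        lines += [f"  - {fact}" for fact in self.known]
        lines.append("UNKNOWN:")
        lines += [f"  - {question}" for question in self.unknown]
        return "\n".join(lines)


log = ProblemLog()
log.add_known("Problem is in Europe but not Asia")
log.add_known("Problem started around noon local time")
log.add_known("Windows servers affected, Unix servers not")
log.add_unknown("What changed shortly before noon?")  # hypothetical open question
print(log.summary())
```

Reading the summary back at the end of the meeting gives everyone the same starting point for the next round.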

Define what each department or area will do next to determine the cause, extent, and possible solutions to the problem

This is a bit tricky. Your job as a leader here is to give people something within their control to determine about the problem. The control piece is important. Uncontrolled actions cause uncontrolled communications. Uncontrolled communications can start panics. Uncontrolled actions spin off more uncontrolled actions and encourage chaos. It’s Fear, Uncertainty, and Doubt all rolled up across the company.

Look at the capabilities of the people on the call and determine what they can do that is within their control to help define the problem and figure out the scale of the problem.

“George, use your XXX discovery tool to find out exactly when the reports of problems came in. Fred, check with these other divisions to see if the problem is happening there. Mary, ask AsiaPac management if there are problems there and put them on alert about our experiences here so they can report anything that shows up. Sue, check with the Help Desk to determine if they are seeing a pattern to the problems coming in. Ralph, start writing a communication about the problem that management can use for talking points with employees. John, call our vendors and get them lined up for a call to describe what we are seeing about this problem.”

See? Something specific to help nail down the definition and extent of the problem.

Give them minimal time to report back

Usually, a couple of hours. If it’s bad (think of an internal computer virus quickly spreading), maintain an open line and have people report back in as they finish their piece of the assignments. At the next meeting, get the results of the assignments to determine the problem.

Repeat this process until the problem definition and the scale of the problem are clear enough to form a working assumption about the problem.
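The assign, report back, repeat cycle is easier to run if each assignment has a named owner and a report-back time. Here is a rough sketch in Python of how that could be tracked. The `Assignment` class, the `outstanding` helper, the calendar date, and the reported result are all assumptions made for illustration; the two-hour deadline simply follows the "couple of hours" suggested above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Assignment:
    """One specific task given to one person on the call (hypothetical structure)."""
    owner: str
    task: str
    due: datetime
    result: Optional[str] = None

    @property
    def done(self) -> bool:
        return self.result is not None


def outstanding(assignments, now):
    """Return assignments with no result yet, overdue ones first, then by due time."""
    open_items = [a for a in assignments if not a.done]
    return sorted(open_items, key=lambda a: (a.due > now, a.due))


start = datetime(2024, 1, 1, 9, 0)   # arbitrary meeting start time
due = start + timedelta(hours=2)     # "a couple of hours" to report back

assignments = [
    Assignment("George", "Find out exactly when the problem reports came in", due),
    Assignment("Sue", "Check the Help Desk for a pattern in incoming problems", due),
    Assignment("John", "Line up the vendors for a call", due),
]

# George reports back early (the result text is invented for the example).
assignments[0].result = "First report logged around noon local time"

for a in outstanding(assignments, now=start + timedelta(hours=1)):
    print(f"{a.owner}: {a.task} (due {a.due:%H:%M})")
```

At the next meeting, whatever is still in the outstanding list is the first thing to ask about, and each reported result feeds the list of known facts.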

Determine the problem and take actions to resolve the problem

To be clear, you may not get to a root cause before you attempt to resolve the problem. Indeed, some of the attempts to solve the problem won’t work — but that is additional information to help you determine what the real problem is that you are experiencing.

Think about a submarine sinking after an attack: you don’t know if the problem is a hole in the bow, but you know you are sinking. So you try stuff while still determining the problem. Raise the bow planes. Oh, they don’t work. Blow some ballast. Go faster. Or slower. When you discover your bow planes don’t work, it adds to your knowledge of the problem. In the meantime, you continue to get reports from each of the departments to determine the problem.

At some point, you have enough information to figure out what the problem is and what you can do to resolve it.

In IT, there is (almost) always a back out plan. Oh, we migrated, but it is not working. You then execute your back out plan and the problem is resolved. The cause isn’t resolved, but the impact of the problem is resolved.
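The shape of a back out plan can be sketched in a few lines of Python. This is an illustration under assumptions, not how any particular migration tool works: `apply_with_back_out` and the three callables passed to it are hypothetical names, and the "migration" here just flips a value in a dictionary.

```python
def apply_with_back_out(apply_change, verify, back_out):
    """Apply a change, check it, and run the back out plan if the check fails.

    Returns True if the change stayed in place, False if it was backed out.
    """
    try:
        apply_change()
        ok = verify()
    except Exception:
        # A change or check that blows up counts as a failed change.
        ok = False
    if not ok:
        back_out()
    return ok


# Toy stand-ins for a real migration (all hypothetical).
state = {"site": "old"}


def migrate():
    state["site"] = "new"


def looks_healthy():
    return False  # pretend the post-migration check fails


def roll_back():
    state["site"] = "old"


kept = apply_with_back_out(migrate, looks_healthy, roll_back)
print(kept, state["site"])
```

Note what the sketch does not do: it restores the previous state, but it says nothing about why the check failed. The impact is gone; the cause is still there.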

Determine the root cause of the problem and fix it

At one of my former employers, there was a migration of a mainframe from one part of the country to another. When the new mainframe was opened up to production, it was discovered that when you loaded a new page on their website, your customer information was replaced by someone else’s customer information. Live. Want to move $10,000? No problem, if the money was in the other person’s account you were looking at on the screen.

So the back out plan was executed. Problem resolved. But the root cause was not. Fortunately, the company management recognized this and went and found the root cause. In this case, a firmware version on a router was causing the problem. The firmware was replaced with the latest version, and the next cutover went as smooth as silk.

But without knowing the root cause of the problem, you won’t truly fix the problem.

In what ways do your “panic” meetings work? Or not work?