NICKOLAS: Oh, my goodness, everybody. Thanks so much for being here. It is so, so good to be on stage in person again. I have missed this so much, and I think we all have. Like Penelope said, my name is Nick Means. I work at Sym, where we empower security and engineering teams to build workflows that fit the way they work. We're hiring, so please come say hi.
And thanks to everyone at Sym for holding down the fort so I could be here. I also host a podcast, Managing Up. If you're a leader, official title or not, this show is for you. I learn something literally every time I record, and I have stickers. Come get one. I want to start off with a brief content warning. This talk contains stories of plane crashes, one of them pretty vivid. If you're a nervous flyer, this might not be the talk for you.
Now, if you follow me on Twitter, you might have seen that I occasionally do a bit of plane spotting, especially when travelling. I love seeing planes that are new to me, like that one weird one at London City Airport. It has four engines, but only carries a hundred people. And the one hanging out at London Heathrow. But I also like identifying planes in the air, and one
of the first I learned to spot was the Boeing 737. It's the best-selling commercial aircraft of all time, so it's everywhere. Once you know the trick, it's incredibly easy to identify in the air. If a plane has no doors over its rear landing gear and it's too big to be a regional jet, it's a 737. You can see how
the gear swings out from the wheel-shaped cavities in the center of the fuselage. Why am I talking about plane spotting? Well, there's an interesting reason that the 737 doesn't have rear landing gear doors, and it has everything to do with the problems with the 737 MAX. We got the first hint that there was a problem with this plane with the crash of Lion Air flight 610 off the coast of Indonesia on October 29th, 2018. This is PK-LQP, the plane that would operate
that flight. Here it is at Boeing's delivery facility in August 2018, just before it left on its delivery flight to Indonesia. That means this plane was just over 2 months old on the day of the accident. Basically brand new. The captain was Bhavye Suneja, an experienced pilot with over 6,000 hours of flying experience, most of them in the cockpit of a 737.
The first officer was Harvino, just one name, which is a fairly common practice in Indonesia. He had almost as many hours as the captain. Flight 610 was a scheduled domestic service from Jakarta to Pangkal Pinang. The easiest way to tell the story of Lion Air 610 is through data. This chart is from the flight data recorder, taken from the accident report.
And it tells a really clear story, so let me walk you through it. First, to get you oriented: the timestamps are 1 minute and 19 seconds apart. For some reason, they divided this chart into 10 equal parts instead of using the 12 minutes that the flight roughly lasted. Now let me add more legible labels so we can navigate this chart. Let's start with the moment the plane gets airborne. A couple of things immediately indicate that the plane's got a problem.
First, as soon as the plane is in the air, the pilot's stick shaker starts shaking. This is what it sounds like. It literally shakes the yoke that the pilot is hanging on to. It is the most urgent warning indicator on the flight deck. It's there to get the pilot's attention when the plane is about to stall. In this context, a stall has nothing to do with the engines. It's about the wings and their ability to generate lift. When the plane is flying too slowly or climbing too steeply, there are
vortices disrupting the airflow over the wings. This is bad in an airplane, and when the plane senses it, it kicks on the stick shaker. But the plane has just lifted off the ground and it's climbing smoothly. Why does the plane think it's about to stall? Even stranger, why is it only the pilot's stick shaker going off? You would expect both to be shaking.
And the answer to both questions is a few lines down the chart. The angle of attack indicated on the pilot's instruments is 20 degrees greater than on the copilot's instruments. What is angle of attack? That's the angle at which the plane is moving through the air, and it doesn't always follow the nose. Approaching a stall, the plane might have its nose up while it's moving more or less parallel to the ground. Now, the plane senses angle of attack using a vane on
each side of the aircraft, with each vane feeding the instruments on that side of the flight deck. On flight 610, the pilot's vane was malfunctioning, reading 20 degrees steeper than the copilot's. This disagreement starts on the ground and lasts the entire flight. And so, when the plane does take off, it immediately thinks that it's stalling. About 2 minutes into the flight, Harvino asks air traffic control for clearance to a holding point. The controller asks for a
reason, to which Harvino responds, we have a flight control problem. But he doesn't declare an emergency. A minute later, having reached their hold point, the captain turns his attention to the flaps, the extensions of the wing that generate extra lift at lower speeds. He had been ignoring them because of everything else going on. But the plane is flying too fast to have the flaps extended, so he retracts them. Almost immediately the plane plunges 700 feet, uncommanded. Now, if you're a rollercoaster fan,
this is three times the main drop of a modern hypercoaster. Suneja finds himself pulling back on a suddenly very heavy yoke, and he has an idea of what's happened. The plane is out of trim. Sure enough, there's a nose-down trim reflected in the data. What is trim? Let's look at the tail of the 737. At the back of the plane is a mini wing, the horizontal stabilizer, and at the back of that is
the elevator. That's what responds when the pilot pulls or pushes on the yoke to make changes to the pitch. It would be exhausting to hold pressure on the yoke for the entire flight. That's where trim comes in. Looking at the front of the horizontal stabilizer, you can see this metal track. This entire mini wing can angle up or down, and this is what trim adjusts. It serves as a sort of cruise control for climbs
and descents. For some reason, the auto trim had made a dramatic adjustment to pitch the nose of the plane down, seen here. Suneja makes a trim adjustment to pull the nose back up. But the auto trim pushes the nose back down. There's an unwritten rule in aviation: when you make a change to the configuration of the plane and the plane does something you don't understand, you undo the change.
That's what he does. He doesn't understand why auto trim kicked in, but extends the flaps again and hopes that the auto trim will stop. And sure enough, it does. There are a few routine adjustments, but no more dramatic drops. But he's worried about the flaps. He's planning on completing this flight. So, he retracts the flaps again. And almost immediately, he's fighting auto trim again.
About this time Harvino asks air traffic control for a return to Jakarta. A few things here are odd. First, the captain is proceeding as if this is a normal flight. Second, the controller doesn't ask flight 610 if it wants to declare an emergency. That's a question a controller should ask you. Instead of getting other airplanes out of the way, the air traffic controller keeps giving them
turns around other traffic, which complicates the work of keeping the plane in the air. Suneja fought the plane for 6 minutes while Harvino scoured the flight manual for something, anything, that would explain what was going on and how to fix it. Each time the auto trim pushed the nose down, Suneja would counter with an equal burst of nose-up trim. It averaged out mostly even over those 6 minutes, and so did the altitude. It would
have been terrifying to be in the back of the plane as it went up and down, but he's keeping the plane more or less at 5,000 feet. Not sure what else to do, he gives the controls over to Harvino so he can look through the flight manual himself and see if he can find anything that will help. But he neglects to tell Harvino what he's been doing to keep the plane level. The auto trim continues to activate, and Harvino counters, but not nearly enough to counter
the auto trim. Less than 30 seconds later, on the cockpit voice recorder, he can be heard reciting from the Qur'an. Flight 610 plunged 5,000 feet in 15 seconds, killing all 189 souls on board.
An absolute tragedy. The world understandably wanted to know what happened. But the other airlines flying the 737 MAX needed to know what happened, and whether it could possibly happen to their brand new 737 MAXs as well. So, 8 days later, Boeing sent its first message to the plane's operators. It doesn't really say much, though. Basically, just that early information about Lion Air 610 indicated there was an uncommanded nose-down trim as the result of a malfunctioning angle of attack sensor.
One problem: there was no documented system on the 737 MAX by which a malfunctioning angle of attack sensor could cause nose-down trim. It didn't exist. This bulletin didn't clear that up at all. Boeing got so many questions that four days later they sent out another operator message. Now, this message contained the first public acknowledgment of the now-infamous Maneuvering Characteristics Augmentation System, MCAS. This was the system responsible for the error on flight 610.
If you have followed the story at all, you have probably heard of it. So, what is MCAS? To answer that, we need to know about the history of the 737. The 737 was launched by Boeing in 1967, 54 years ago. It's an old plane. Commercial aviation was still young, and Boeing wanted to market the plane beyond the bigger airports. It needed to fly into smaller
fields with no jet bridges. So it was built low to the ground to make it easy to load, and you could load baggage from the ground without a conveyor or a stepladder. As you may have figured out, this is why the 737 doesn't have rear landing gear doors. There just isn't room. It's too low to the ground. This became a problem as engine technology evolved. In the early 1980s, to upgrade the engines,
they had to find a way to fit bigger engines under the wings. This strange-looking engine inlet was the result. It was the only way to make it fit. They used the same trick in the '90s as well. The 737 Next Gen squeezes a slightly more efficient engine under the wing, using similar but not as dramatically shaped inlets. But it went from engineering challenge to real problem in 2011.
A year prior, Airbus had introduced the A320neo. The A320 is the 737's closest competitor, carrying the same 187 passengers along the same routes. The neo was the first major revision of the A320 since its launch. Its new engines had a much higher bypass ratio, which resulted in nearly a 20% fuel savings over the older
A320 and, critically, the 737 Next Gen. Boeing had long disregarded the threat from Airbus. Boeing was planning on designing a new plane from scratch for the 180-seat market, and thought that airlines would keep buying the 737 Next Gen until it was ready. Boeing's CEO got a rude awakening from the CEO of American Airlines. American was going to buy 400 planes to replace
the McDonnell Douglas MD-80s in its fleet. American's modern fleet was all Boeing. That was, until American's CEO called to let Boeing know that the first 200 planes would be a mix of A320s and A320neos from Airbus. It was a kick in the pants. But Boeing could compete for the other half of the order. American wanted the same fuel economy, on the same 5-year timeline that Airbus had promised. To fully
understand Boeing's response, we have to go back to 1997, when they acquired McDonnell Douglas. Boeing had been an engineering-driven company. McDonnell Douglas, on the other hand, was led by Harry Stonecipher, a graduate of Jack Welch's GE and its focus on maximizing shareholder value. He came into Boeing as President and CEO shortly after the merger and was quickly . They were much more focused on
margins and stock price. And the headquarters moved from Seattle to Chicago. This was done as a means of culture change, to give execs more insulation from the engineers who might push back on their decisions. When he resigned in 2003, Stonecipher had this to say in an interview: When people say I changed the culture of Boeing, that was the intent. It's run like a business rather
than a great engineering firm. Now, that sounds rather innocuous on the surface. But there was a growing win-at-all-costs environment inside of Boeing. His resignation was a result of this. He was forced out over stolen documents and government procurement issues on his watch. After that ethical lapse, the next CEO was cut from the same GE cloth. With the 787 suffering from manufacturing delays and battery
problems, the 737 was the only plane that Boeing was actually delivering in the early 2010s. Now, Boeing's stock price was down at this point because of the problems with the 787. American's threatened defection to Airbus might inspire other airlines to buy from Airbus as well. Boeing's CEO was not going to let this happen, no matter what it took. And so, 3 months later, the 737 MAX was born. Designing it was a frantic project, with engineers working at double their normal pace.
They made some aerodynamic improvements and spec'd the same new engine as the A320neo, matching it on fuel efficiency. American ordered 737 MAXs to fill the rest of its order, but with one condition. The MAX had to share a type rating with the 737 Next Gen, so pilots wouldn't need expensive simulator training to fly the MAX. Southwest
insisted on a $1 million rebate per plane if pilots needed simulator training to fly it. The new engines get their efficiency from a higher bypass ratio. There's more space around the core of the jet engine, allowing more air to flow through the engine and generating more thrust. But they are bigger. A lot bigger. A common type rating wasn't a challenge for Airbus, which could swap in the new engine
with few other major changes. But because of how low the 737 sits to the ground, it wasn't nearly so straightforward for Boeing. Even with the sculpted inlet, the new engine was too large to fit under the 737's wing. Boeing worked around it. Instead of fitting the engine below the wing, they moved it in front of the wing. It was a great workaround, but it had drawbacks.
Boeing discovered in early wind tunnel testing that, in a steep climbing banked turn, the pressure the pilot felt on the yoke didn't match earlier 737 models, because of the extra lift generated by the placement and the power of the engines. Now, this is a situation that a passenger plane would almost never be in, but the handling has to match the previous generation of the plane for a common type rating.
Some engineers looked for an aerodynamic fix, physical changes to the plane. But that would delay the project by months. Instead, they added a software solution taken from the KC-46 tanker: MCAS. On the KC-46, it is used to keep handling consistent. On the MAX, it would adjust the trim 0.6 degrees nose down in that one specific scenario, just enough
to mimic the control feel of prior 737s. It was at this point that Boeing made a critical decision. The FAA had long delegated non-critical parts of certification to employees of the manufacturer itself, so that it could focus on digging into critical safety systems. Because of how rarely MCAS was expected to kick in, a failure of it was classified as hazardous, but not catastrophic. Now, this is important, because
that let Boeing self-certify the feature. And it also let Boeing drive its operation from a single angle of attack sensor, rather than the redundancy required for systems with catastrophic failure potential. But Boeing's test pilots found another problem in the air. We talked about stalls, when the wing loses lift. In previous versions of the 737, low-speed stalls are dramatic, with lots of buffeting and vibration
and then a quick 30 degree nose-down drop once the stall happens. When the MAX stalled, because of the power and the placement of the engines, it was less dramatic. The nose only gently dropped by 10 degrees. This handling difference would absolutely keep the MAX from a common type rating. This is common in a plane. Boeing reached for MCAS again. Instead of 0.6 degrees per activation,
they would quickly add 2.5 degrees of nose-down trim to get the desired 30 degree drop. Again, the solution worked great, making the MAX behave almost like previous generations of the plane. Because MCAS had been declared non-critical, and this new activation only happened at low speeds when pilots had plenty of time to react, Boeing didn't update the analysis it had given to the FAA as part of the plane's certification
despite the major new use case of MCAS. The chief pilot was behind one of the other major decisions. He suggested that MCAS be intentionally omitted from the 737 flight manual, since it operated in the background and pilots would never interact with it. The FAA agreed with the logic and allowed the omission. But the FAA's agreement didn't take into account the new low-speed stall activation scenario, because Boeing hadn't updated the analysis.
And something seemingly nobody took into account was the possibility of MCAS activating repeatedly. The maximum adjustment of the trim is 4.7 degrees. It would take just two MCAS activations to get to maximum nose-down trim if the pilot didn't counter it. And MCAS would pause for only 5 seconds after a manual trim adjustment before kicking in again. Boeing was assuming that a pilot would be able to spot and address an erroneous activation in 3 seconds.
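As a rough check on that arithmetic, here is a minimal Python sketch. It assumes trim starts at neutral, that each uncountered activation adds its full increment, and that trim simply stops at the 4.7 degree limit. The function and constant names are made up for illustration and are not taken from Boeing's software.

```python
# Figures quoted in the talk.
MCAS_INCREMENT_DEG = 2.5      # nose-down trim added per activation
MAX_NOSE_DOWN_TRIM_DEG = 4.7  # maximum trim adjustment


def activations_to_limit(increment_deg: float = MCAS_INCREMENT_DEG,
                         limit_deg: float = MAX_NOSE_DOWN_TRIM_DEG) -> int:
    """Count uncountered activations until nose-down trim hits the limit.

    Simplifying assumption: trim starts at 0 and the pilot never trims back.
    """
    trim_deg = 0.0
    activations = 0
    while trim_deg < limit_deg:
        # Each activation adds its increment, clamped at the limit.
        trim_deg = min(trim_deg + increment_deg, limit_deg)
        activations += 1
    return activations


print(activations_to_limit())     # 2
print(activations_to_limit(0.6))  # 8
```

Under the same assumptions, the original 0.6 degree increment would have needed eight uncountered activations to reach the limit, which gives a sense of how much the larger increment reduced the margin.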
The reason is that an erroneous MCAS activation is remarkably similar to something pilots do drill for: runaway trim. The auto trim of the plane starts adjusting toward one extreme and doesn't stop. It goes continuously until it hits the limit. The trim wheels start spinning. They're located on the console between the pilot and copilot in the 737. They have white markings on them specifically so you can see
them spinning. They would spin about 40 times in a 2.5 degree MCAS activation, and even more in a runaway trim event. These wheels would be sitting here spinning. When this happens, recovery should be relatively straightforward. First, the pilot turns off the two stab trim cutout switches, disabling the electric trim altogether and reverting to complete manual control.
Second, the pilot pulls back on the yoke to get the plane flying level again. And third, the pilot uses the manual trim wheels to set the trim back for level flight and gradually releases the pressure on the yoke. We know the captain of flight 610 knew there was a trim problem, because he repeatedly countered the adjustments with manual nose-up trim. So why didn't they hit the stab trim cutout switches? Pilot and journalist William Langewiesche argues in his New York Times Magazine article that it comes down to pilot training.
There are more flight decks to fill than pilots to fill them. Lion Air had to scale up and train pilots in a hurry to keep its planes in the sky. The assertion was that pilots trained this quickly learned by rote rather than experience, and graduated knowing the steps to fly a plane, but not how to fly their way out of unfamiliar scenarios. The stunning 97% graduation rate of the academy
seems to back up that assertion about training. That's an absurdly high graduation rate for a pilot training program. As does the fact that Indonesia, with one of the greatest aviation sectors in the world, has a higher fatality rate than the global average. The situation didn't precisely match anything they had been trained on, because MCAS pulsed on and off instead of being continuous like a runaway trim event, and
they didn't have the base aviation knowledge and instincts to know what to do. The scenario was repeated on Ethiopian Airlines flight 302. This was after Boeing had published its bulletins on MCAS, so the pilots would have been aware of the situation. A faulty sensor again meant that the stick shaker was operating the whole flight, and MCAS kicked in as soon as the flaps were retracted. The Ethiopian pilots hit the cutout switches, but they had to turn the electric trim back on because the aerodynamic load
was so great they couldn't manually adjust the trim. They were going to turn it on, get the plane trimmed, and turn it off. But in the time they tried to do that, MCAS activated again, and they crashed shortly thereafter, killing everybody on board. The second crash led to the worldwide grounding of the 737 MAX, and to scenes like this, with planes parked at Boeing's plant in Washington. Why did this happen? Was it because Boeing added
MCAS to the 737 MAX? Or was it the FAA's fault for not more closely scrutinizing its certification? Or was it the MCAS failure? Or the culture change at Boeing? In her fantastic book, Thinking in Systems, Donella Meadows gives us great tools to pick apart a situation like this using systems thinking. Meadows introduces the idea of stocks and flows.
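As a rough illustration of that vocabulary, a single stock with one inflow and one outflow can be sketched in a few lines of Python. This is only a toy model with made-up numbers and a hypothetical function name. It is not taken from the talk or from Meadows's book.

```python
def simulate_stock(initial: float, inflow_per_step: float,
                   outflow_per_step: float, steps: int) -> list[float]:
    """Track the level of one stock over time.

    Each step the inflow adds to the stock and the outflow drains it.
    The level cannot drop below zero, like an empty bathtub.
    """
    levels = [initial]
    for _ in range(steps):
        next_level = levels[-1] + inflow_per_step - outflow_per_step
        levels.append(max(0.0, next_level))
    return levels


# The drain outpaces the spigot, so the level falls slowly, step by step.
print(simulate_stock(10.0, 2.0, 3.0, 4))  # [10.0, 9.0, 8.0, 7.0, 6.0]
```

The point of the sketch is that the stock changes only through its flows, which is why stocks tend to move slowly even when a flow changes abruptly.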
Stocks are the foundational units of a system, the things you can count and measure. They move slowly, taking time to change. Flows are what increase or decrease a stock. The classic analogy is water in a bathtub. The water is the stock, the spigot is a flow that increases the amount of the stock, and the drain is a flow that reduces the amount of that stock. And there are feedback loops as
well in a system that affect the rate of flows. Now, this modeling rabbit hole goes deep, so I will keep this simple. You will laugh about that in a minute. The stock is safety. It's hard to quantify, but the bathtub analogy makes sense here. If you think about it, there are things that make safety in the system go up and down. It's kind of an overall measurement. The clouds are the boundaries of the system. We're not going to think about where safety comes from or where it goes to, but about
what increases and decreases safety in the system. We can never get rid of everything that decreases it, but we can keep the safety coming in high enough to make up for the decrease. Let's start with pilot training. We know that the safety in the system increases the quality of pilot training, and that pilot training increases safety in the system. It's a positive feedback loop. But we also know that the increasing safety of air travel has been a major factor
in increasing the amount of travel, just the number of people flying every year. And more travel means that more pilots are needed. That demand for pilots negatively impacts training quality, resulting in a decrease in safety in the system. The increased amount of travel also increased the number of planes needed by airlines every year. This has a couple of effects. First, it has a positive influence on the rate of technological advance. With more planes sold, it's easier to recoup
the investments. And by and large, those advancements have a positive effect on safety. But it also created an economic environment ripe for consolidation, where there were opportunities for cost savings. This caused a dilution of engineering excellence, and the safety in the system made it feel safe to dial back engineering excellence just a bit in the name of profit. We know this caused a decrease in safety as well. And finally, the increase of planes needed in the
market increased competitive pressure. Boeing had to play catch-up with Airbus. They had to increase design speed, and felt safe doing this because of the safety in the system. But we know this reduced design quality and decreased the safety in the system. And it made it possible for Congress to tell the FAA to increase certification delegation. That meant missing the change to MCAS,
clearly decreasing safety. Like I said, simple. Now, if you'll notice, every feedback loop in this system is rooted in the safety already in the system. So, what does that mean? That safety caused the problems with the 737 MAX? Actually, the answer, in a way, is yes. In the intro to Thinking in Systems, Meadows says everyone or everything in a system
can act dutifully and rationally, yet all of these well-meaning actions too often add up to a perfectly terrible result. This is what happened here. Every actor in the system borrowed a little bit of safety to optimize something else. And it is a balance, if you think about it, because the only way to achieve perfect safety in aviation is not to fly at all. Now, most of the choices were reasonable given the actors' incentives and motivations. And yet the cumulative effect of their actions in the system
is tragedy. And this is the lesson for us. If we only pay attention to the events around us, to the actions of individuals, and not to the system as a whole, we'll struggle to understand why we're getting the results that we're getting. In practical terms, if your team is struggling to ship and you start trying to get individuals on the team to work harder without understanding what about the system is slowing them down, you're just going to burn everybody out.
Or if there's an incident and you focus on who did what without paying attention to the why, you're not likely to learn anything. You'll likely repeat it. And you don't need to be in a leadership role to have this orientation. Anyone can bring it to the table. All of life is a system, and learning to see and think in systems will pay huge dividends in achieving what you want. Let me end on a positive note about the 737 MAX. Systems are interested in their own survival.
During the grounding of the MAX, a few things happened. First, Congress and the FAA rethought delegation. Manufacturers can no longer self-certify a novel design feature, and the FAA has to approve the manufacturer's employees who do delegated certification work. This should increase the safety in the system. The crashes also had an impact on design speed and quality at Boeing, both directly and indirectly via
changes to FAA delegation and certification. The 737 MAX was grounded for one year and 8 months while Boeing revised it, taking the time to get it right, the time they probably should have taken in the initial design phase. Other problems were found and fixed in the course of recertification as well, clearly improving safety. The revised version of MCAS on the MAX can only activate once in a stall event, and only if the pilot's and copilot's angle of attack sensors are in agreement.
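The two conditions just described, one activation per stall event and agreement between the two angle of attack sensors, can be sketched as a small Python class. This is only an illustration of the gating logic as the talk describes it. The class name, the disagreement limit, and the stall warning angle are all hypothetical values chosen for the example, not figures from the talk or from Boeing's software.

```python
from dataclasses import dataclass

# Hypothetical values for illustration only.
MAX_SENSOR_DISAGREEMENT_DEG = 5.0  # how far apart the vanes may read
STALL_WARNING_AOA_DEG = 14.0       # angle of attack treated as near stall


@dataclass
class RevisedMcasSketch:
    """Toy model of the revised activation rules described in the talk."""
    activated_this_event: bool = False

    def should_activate(self, left_aoa_deg: float, right_aoa_deg: float) -> bool:
        # Rule 1: only one activation per stall event.
        if self.activated_this_event:
            return False
        # Rule 2: the pilot's and copilot's sensors must agree.
        if abs(left_aoa_deg - right_aoa_deg) > MAX_SENSOR_DISAGREEMENT_DEG:
            return False
        # Both sensors must also indicate a high angle of attack.
        if min(left_aoa_deg, right_aoa_deg) < STALL_WARNING_AOA_DEG:
            return False
        self.activated_this_event = True
        return True

    def end_stall_event(self) -> None:
        # Re-arm once the stall event is over.
        self.activated_this_event = False


mcas = RevisedMcasSketch()
print(mcas.should_activate(25.0, 5.0))   # False: sensors disagree by 20 degrees
print(mcas.should_activate(15.0, 15.5))  # True: sensors agree and AoA is high
print(mcas.should_activate(15.0, 15.5))  # False: already activated this event
```

With a 20 degree disagreement like the one recorded on flight 610, this sketch never activates, whereas the original single-sensor design had nothing to cross-check against.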
And pilots do now need simulator training to fly the 737 MAX, which has a positive impact on training quality, further increasing safety in the system. So, the system has balanced itself. It corrected in the wake of these tragedies. Now, odds are at least one person in this room is flying home on a 737 MAX, and I don't want to leave you anxious about that. So, if that's you, don't worry. The MAX has gotten so much scrutiny at this point, it might actually
be the safest plane in the air. Now, as you go, good luck learning to see and influence the systems operating all around you. It is your path to achieving real change. Thanks so much for coming. [ Applause ]