Critical Fallibilism and Theory of Constraints in One Analyzed Paragraph
I wrote a dense paragraph about Critical Fallibilism (CF), particularly ideas inspired by Eli Goldratt's Theory of Constraints (TOC). This article breaks it down and explains it. Here's the paragraph with small changes from the original:
People want complex solutions like correctly weighting and adding 50 factors, which are qualitatively different (e.g. price and deliciousness are hard to add), to figure out just the right credence that is further updated by any piece of evidence. Then they end up factoring in a bunch of local optima, that, in the bigger picture, have excess capacity, so those factors shouldn’t actually change the result. (The desire to always update the credence on any good or bad evidence is wrong! They don’t know you can have excess!) Besides quickly checking that many factors don’t ruin an option (giving pass/fail grades to things that are not near the borderline between pass and fail), you should focus your more detailed attention on just a few important factors, which is simpler than trying to use 50 factors.
You’ll learn the most if you analyze this paragraph yourself before reading my analysis. What’s it saying? What do the details mean, and which ones don’t you understand? What background knowledge is it relying on? What questions do you have? Which parts could you explain to someone else, and which couldn’t you? The more you think about it and write down (or speak out loud) your thoughts, the more effectively you’ll learn from this article.
The paragraph is dense, so I'll analyze it in small chunks.
If you'd like to watch a video about the paragraph first, click here.
Complex Solutions
People want complex solutions
TOC advocates simple solutions and silver bullets. It says to find key issues (constraints) and high leverage changes (what changes will have large downstream effects?). We should focus our limited attention on what matters most and make a few high-impact changes. If we do many small optimizations, or do overly complex optimizations, we’ll get a worse result. Doing many and/or complex changes takes a lot of work and attention, and there's a higher risk of something going wrong. It distracts us from the key issues so we’re more likely to have mistakes where they matter most. Doing extra work can actually lower our performance. (You can read more about inherent simplicity in The Choice by Goldratt.)
Complex solutions involve arrogance. They usually don’t work well because we can’t control so many factors very well. We can’t take so many things into account, individually, and make it all work. That’s too hard. We need to understand cause and effect, and then figure out the right solution that’s easier for us to manage. We need to let cause and effect do most of the work, based on a good initial change or changes, instead of trying to manage everything ourselves (which is where complexity comes from: from trying to control or affect lots of stuff directly). When we make a change, then, due to the logic of cause and effect, additional changes will happen without additional interventions by us. We should plan out how that will work and use it to our advantage instead of making many changes ourselves.
Also, people like to act or look sophisticated. Or they generally expect that good stuff will be complex or sophisticated. This is a problem. It’s generally better to get unimpressive basics right instead of trying to look clever or deal with fancy stuff. Trying to impress others can lead away from good solutions.
50 Weighted Factors
like correctly weighting and adding 50 factors
Factors are issues, characteristics, dimensions, traits, etc. They are things we take into account and care about like how late an order is, how much profit a product line makes, how heavy a product is, how safe a product is, how long a book is, how clever an idea sounds, how elegant a wording is, what an idea can be used for, etc. There is flexibility in how to look at factors. For cars, you could use one safety factor or you could split it into multiple safety categories (e.g. front impact car crash rating, side impact crash rating, and stopping distance when braking).
50 factors is too much complexity to deal with well using quantitative approaches (with CF's binary evaluations you can handle most factors with minimal attention, and therefore handle more explicit factors, but 50 is still high). Trying to take 50 factors into account is intellectual arrogance. And it ignores constraints. Most factors are local optima that don’t matter to throughput. Let me explain:
Constraints are bottlenecks or limiting factors. They’re the part of a system that limits the throughput (aka output or success at a quantitative goal). E.g., with a factory, the goal is to produce stuff. Throughput is the amount or rate of stuff produced: the output of the factory. (Looking at a company as a whole, throughput might be completed sales not production. Producing the wrong products, which sit unsold in a warehouse, is bad.) There are some parts of a factory that are limiting total production. There are many other parts of the factory which are not limiting production. You can improve the non-limiting parts without actually getting more stuff made.
For example, there might be a quick quality control check at the end which doesn't hold up production. Doing it faster won’t get any more products made; it’ll just increase the amount of time the quality inspectors wait between having products to inspect. That might let you fire one inspector to save money (though probably not, and firing people who didn't do anything wrong has downsides), but it wouldn’t increase the production of the factory.
A constraint could be a workstation with a machine that you only have one of, e.g. a heat treatment oven (that's one of the examples in The Goal by Goldratt). Your products need time in the oven which can process e.g. 10 products per hour. Leaving the oven idle actually lowers production at the factory. You can’t make up for lost time later because you don't have any extra ovens and can't fit more in the oven at once. So your factory can’t ever produce more than 10 products per hour unless you get another oven or an alternative production process. You could double your staffing, and buy more of every other tool or machine, and you’d still be limited to 10 products per hour. Increasing, improving or optimizing many other things becomes irrelevant – there is no benefit – when there’s a constraint elsewhere.
More generally, imagine creating a product as a sequence of steps. It could be a linear assembly line or involve a more complex structure where several processes feed into one step that combines parts. Regardless, there are many steps which all have to happen before a product is done. Simplifying slightly, one of those steps is the slowest – e.g. it can only make 10 products per hour. If you speed up other steps, the max throughput of the factory will still be 10 products per hour. Other steps were already going faster than the slowest step (plus a margin of error) – they already had excess speed – so making them faster doesn’t matter much. Many steps were already a lot faster than the slowest step – e.g. capable of doing 500 products per hour – so it's pointless to speed them up since they already spend a lot of time idle.
A local optima is something you can improve that seems good in some limited, local, narrow way. It seems like an improvement when you don’t look at the bigger picture. Speeding up a fast workstation that comes before (or after) a slow workstation is an example. It’s already fast enough. Faster is better in some sense. But in the bigger picture it’s not a useful change because the factory still can't produce more. So that’s a local optimization but not a global optimization.
To optimize the big picture (global optima), you need to find constraints and improve them. What is limiting throughput? What is the rest of the system waiting for? What has some kind of major downside? Improve that.
Technically, a local optima can be a global optima too – it can be good both locally and globally. The standard practice is to call something a “local” optima when we think it’s not globally important. The point is to distinguish something that is only a local optima from something that also matters to the big, global picture.
Optima means something is optimum or best. When we talk about local optima, we mean pursuing them or getting closer to them by optimizing (improving things). We don't actually reach perfection.
The paragraph chunk we’re currently analyzing also brings up weighting and adding factors. The issue with naive adding is that factors aren’t equally important. People recognize that and try to address it with weighting. So they say “This factor is twice as important as that factor, so I’ll multiply it by 2 to take that into account.” Or they decide everything should add up to 100% and then divide the 100% up into categories. It’s like when a school makes your final exam 40% of your grade, your quizzes 30%, your homework 20%, and your attendance 10%. They are weighting the factors instead of making them all equally important.
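As a minimal sketch of how this kind of weighted adding works (using the school-grading weights above and made-up example scores), in Python:

```python
# Weighted-sum grading, matching the example weights above.
weights = {"final_exam": 0.40, "quizzes": 0.30, "homework": 0.20, "attendance": 0.10}

# Made-up scores for one student, all on the same 0-100 scale.
scores = {"final_exam": 85, "quizzes": 90, "homework": 70, "attendance": 100}

# The final grade is each score multiplied by its importance weight, added up.
final_grade = sum(weights[k] * scores[k] for k in weights)
print(round(final_grade, 2))  # 85*0.4 + 90*0.3 + 70*0.2 + 100*0.1 = 85.0
```

This only works so smoothly because every factor is already a percentage on the same scale; the trouble starts when the factors aren't.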
You can read about the problems with adding and weighting factors in my article Multi-Factor Decision Making Math and in The Order of Things, an article by Malcolm Gladwell which criticizes the use of weighted factors to rank colleges or cars.
Qualitative Differences
[factors] which are qualitatively different (e.g. price and deliciousness are hard to add)
Suppose I’m buying food and I decide that price is worth 50%, mass 30% and deliciousness 20%. I have three factors and I have weightings. How do I add them? (These factors are not very realistic, but don’t worry about that. They’re meant to be close enough to real decision making to be understandable, but without worrying about the complexity of real food purchasing. Also, non-linear weightings could be used but that’s harder and more complicated, and people don't usually do it, and it wouldn’t fix some of the fundamental problems.)
This watermelon costs $6, has 1kg of mass, and looks pretty delicious. Although I decided on weightings, I still have no idea how to add those three factors. I can multiply by the weightings to get $3, 0.3kg and not very delicious. But how can dollars, kilograms and deliciousness be added together? They’re different units. In other words, they’re talking about different dimensions. They’re qualitatively different factors.
People commonly draw an important distinction between qualitative and quantitative differences. What does it mean?
Quantitatively different factors differ in quantity. They differ by amount or degree. E.g. $5 and $7 are quantitatively different. They are different amounts of the same thing. They can be added or subtracted without difficulty because they’re on the same spectrum.
Qualitatively different factors are different things, so you can’t straightforwardly add them. They differ in quality – meaning e.g. that they are different types of things or go in different categories, or they’re from different dimensions or use different units. Length, time, mass, price, deliciousness, cleverness, elegance and temperature are examples of different dimensions. Units are ways of measuring or quantifying within a dimension, e.g. inches and meters are both units that measure length. Some dimensions, like deliciousness, are hard to measure numerically. Some concepts, like beauty, refer to at least one dimension, but possibly more: you could differentiate multiple types of beauty.
An “apples to oranges” comparison involves a qualitative difference and is well known to be problematic, while an “apples to apples” comparison involves a quantitative difference and works better.
To add qualitatively different factors, basically what you have to do is convert them to things that you can add. E.g. you convert every factor to a score on a 0-100 scale for how good it is. Then, when they’re all amounts of goodness on the same scale, you can add them together (probably using weighting by importance).
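Here's a minimal sketch of that conversion process in Python, using the watermelon factors from earlier. The conversion functions and weights are made up for illustration; deciding what they should actually be is exactly the problem discussed next.

```python
# Made-up conversions from raw units to 0-100 "goodness" scores.
def price_goodness(dollars):
    return max(0, 100 - 10 * dollars)        # cheaper is better

def mass_goodness(kg):
    return min(100, 50 * kg)                 # more food is better

def deliciousness_goodness(rating):          # rating on a made-up 0-10 scale
    return 10 * rating

weights = {"price": 0.5, "mass": 0.3, "deliciousness": 0.2}

# The $6, 1 kg, fairly delicious watermelon:
scores = {
    "price": price_goodness(6),                     # 40
    "mass": mass_goodness(1),                       # 50
    "deliciousness": deliciousness_goodness(7),     # 70
}

total = sum(weights[k] * scores[k] for k in weights)
print(round(total, 2))  # 40*0.5 + 50*0.3 + 70*0.2 = 49.0
```

Change any of the made-up conversion functions or weights and the answer changes, which shows how much hidden judgment the method depends on.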
This process is hard and problematic – so much so that it basically doesn’t work. One problem is how to decide on the correct weights. Another is how to decide on the correct conversion to amounts of goodness. Another problem is that there are different types (dimensions) of goodness, so even after converting you're often still dealing with different dimensions. I wrote about this in Multi-Factor Decision Making Math. In order to solve this problem for my philosophy, Critical Fallibilism, I developed an alternative approach which doesn’t need to add qualitatively different factors.
Credences
to figure out just the right credence
A credence is a degree of belief. It’s a score (commonly on a 0 to 1 scale using decimal numbers like 0.3) for how good an idea is. It basically expresses your confidence in the idea. It can say how true, or how likely to be true, you think the idea is. Credences are advocated by Bayesians, who update them using Bayesian probability math. They're also advocated by others who update them with different math or non-mathematically.
Updating a credence means increasing or decreasing it to take into account new arguments or evidence. The goal of updating is to learn from new information and change your mind rationally.
Bayes’ formula itself is pretty simple and uncontroversial from a basic mathematical perspective. Controversies come up when trying to apply a Bayesian approach to more complex issues like decision making, probability theory or epistemology (the philosophy of knowledge). The issues about deciding on correct weights and doing conversions between dimensions are two of the problems that Bayesian decision making faces.
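For reference, here's the simple part as a small Python sketch; the numbers are made up, and the hard part (where such numbers come from and what they're applied to) is what's controversial.

```python
def bayes_update(prior, likelihood, evidence_prob):
    """Bayes' formula: P(H|E) = P(E|H) * P(H) / P(E)."""
    return likelihood * prior / evidence_prob

# Made-up example: prior credence 0.3 in a hypothesis; the evidence is
# more likely if the hypothesis is true (0.8) than it is overall (0.4).
posterior = bayes_update(prior=0.3, likelihood=0.8, evidence_prob=0.4)
print(round(posterior, 3))  # 0.6
```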
Another issue with Bayesianism is needing “priors” which are arbitrary assumptions made up in advance before any intelligent thinking happens (otherwise intelligence would have to be presupposed prior to Bayesianism, and Bayesianism would therefore fail as an explanation for intelligence). Over time, as you learn, the importance of the priors is intended to go down gradually as your credences get more accurate. But before infinitely much time and learning passes and you reach absolute perfection, priors still matter, often a lot. (There are common cases where the credences of ideas stay in the same ratio as the original priors did, even after many updates, so the updates aren’t doing their job effectively). So if one's priors are random or awful, it's a big problem. But if one tries to choose priors intelligently, then he needs to think intelligently about how to do that, which requires a way of thinking, which, to avoid circularity, can’t be the Bayesian system (or Bayesianism would have to be limited to, at best, a smaller and less fundamental part of thinking). You can’t need intelligent thought to decide on decent priors, and also need to already have decent priors in order to think intelligently, or you’d get stuck. If intelligence relies on the Bayesian system, including already having decent priors, then it can’t be used to choose decent priors.
Note that if intelligence isn't Bayesian, but rather something else that comes prior to doing any Bayesian thinking, then Bayesian approaches to developing artificial general intelligence won't work.
Also, the paragraph chunk didn’t merely say that they want to figure out a credence. They want just the right credence. In other words, they seek high accuracy. This conflicts with TOC’s approach of figuring out what’s important and optimizing what actually matters. Most details are just local optima that we don’t need to get just right, so we shouldn’t try to. We shouldn’t spend our attention worrying about minor details. We should find the most important issues and try to optimize those. For minor issues it's OK to aim for “good enough”.
Updating on Evidence
credence that is further updated by any piece of evidence.
Credences are “updated” when you get new information, evidence or arguments. In other words, you’re supposed to learn from evidence. You should take into account new evidence instead of ignoring it. That sounds good. But, in short, most evidence is a local optima. It’s not important. Updating your opinion on every tiny detail of new information is actually a misuse of attention and focus. It’s another case of focusing on the trees instead of the forest.
They think it’s rational and good to keep updating credences constantly, but it’s actually too much work. It’s a failure to prioritize what’s important and acknowledge resource limits like limited time, attention and energy. And in practice no one does it because it’s so impractical. (Actually people don’t really use credence calculations in their lives much at all, let alone redo the math dozens of times per day as they get new information. After many discussions, I’ve never gotten a real, complete, worked example from anyone showing the use of Bayesian math to make a normal decision like what to eat for dinner. Their literature focuses on limited special cases for their examples, like pulling colored marbles randomly from an opaque bag. But redoing the math many times per second is their ideal aspiration that they want to program an artificial intelligence to do.)
There’s arrogance here about thinking one can take into account many complex factors and get it right. Then there’s further arrogance in thinking one can keep updating over and over to get slightly improved answers. What if you make a mistake while updating? The more you keep tweaking things, the more opportunities you have for error.
We need good enough answers that use realistic amounts of effort. Bayesians see this as some kind of practical compromise and think computers, particularly with artificial intelligence software, will be superior to us because they can be given more computing power and thereby avoid this kind of compromise. But it’s not actually a compromise because non-constraints don’t need optimizing.
Increasing a local optima doesn’t increase throughput. It’s like speeding up a factory workstation that feeds into a slow workstation. Optimizations in most places don’t help with your goal. Of the 50 factors the Bayesians want to perfectly take into account, 45+ of them are good enough, with a large margin of error, and can be left alone. Small changes to them won't affect our results positively or negatively. It doesn’t matter if each of them is slightly better or worse because they’re local optima that don’t affect the global picture. Taking into account the exact value of all the local optima is not an ideal; it’s an error. Even if you ignore limits on human attention and focus, you still shouldn’t optimize local optima. Instead, you should remember what your overall goal is and figure out what will actually help with your goal.
If a local optima factor improves, there will be no increase to throughput. But any kind of weighted addition of factors method (like Bayesianism) will output a higher score (like a credence or an evaluation of how good an option is). Since throughput didn't increase, a higher score is an error, not a way of spending high effort to be precise. Also, increases that aren't beneficial are often harmful, e.g. extra stuff that isn't needed can increase clutter and get in the way. (A potential way to avoid this is by using non-linear factors that behave very differently than what Bayesians expect. E.g. if a factor has a score of 0 below 50, 1 from 50 to 500, and 0 above 500, that will behave like CF's binary factors and breakpoints, not like typical Bayesian analysis. If you also switch from addition to multiplication for combining factors, which solves a few problems and is already advocated by some academic papers about factor combining, then you're sort of at CF.)
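Here's a minimal sketch in Python of the kind of non-linear factor just described (using the made-up 0/50/500 thresholds from above), combined by multiplication:

```python
def binary_factor(value, low=50, high=500):
    """Scores 0 below the low breakpoint, 1 between the breakpoints,
    and 0 above the high breakpoint (the example thresholds above)."""
    return 1 if low <= value <= high else 0

# Combining by multiplication: any failing factor (0) makes the whole
# evaluation 0, and more excess capacity within the good range doesn't
# raise the score at all.
factor_values = [120, 300, 70]
overall = 1
for v in factor_values:
    overall *= binary_factor(v)
print(overall)  # 1: every factor is in its good range

print(binary_factor(120) == binary_factor(480))  # True: "more" isn't rewarded
```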
Excess Capacity
Then they end up factoring in a bunch of local optima, that, in the bigger picture, have excess capacity,
Excess capacity means something is already more than good enough. Like a factory workstation that is much more than fast enough to keep the next workstation busy, so it has to slow down production significantly to avoid creating too many extra widgets that have to be stored somewhere.
Another example is a concrete wall that’s more than strong enough. If people bump into the wall, it repels them. It holds up any load it’s supposed to with lots of leeway (you could pile up a lot of extra weight on top and the wall would still hold). You can push on it a lot harder than is reasonable without breaking through it. It’s more than good enough. Making it stronger isn’t useful.
For any working system, which is stable or viable, which has many factors, most factors have excess capacity. This is both a technical/mathematical point and a practical point.
The practical issue is: anything that works but doesn't have excess capacity is risky. It’s near the borderline of what works. If you have many risky factors, several things will probably go wrong. The system is unlikely to work well as a whole. You’d have to get super, unrealistically lucky for a bunch of significant risks to all work out in your favor. And most important systems operate over time, so you’d have to keep getting lucky over and over.
Working complex systems need excess capacity on most factors. A broken system can have no excess capacity anywhere. But if a system actually works, that means everything has enough capacity. For it to work repeatedly over time and be fairly stable, most factors must not be very risky, or else things would go wrong more. For a factor to avoid being risky, so that things rarely go wrong, it needs to be pretty far away from the borderline of being just barely strong enough, fast enough, good enough, or enough in whatever other way matters. That means the factor has significant extra capacity above the minimum.
In other words, if a system works well, you can infer that most parts are more than good enough to work – they have a margin of error beyond good enough which enables them to be pretty reliable. And since a margin of error is present, most small changes can and should be ignored (because they fall within the margin of error or involve excess capacity beyond the margin of error).
Excess capacity can also be below a maximum. Too much can be bad, so we want an amount that has excess distance from both too little and too much (when they're both relevant).
In general, we should design systems with margins of error (also called tolerances, robustness or resilience) so that we don’t have to concern ourselves with minor variations. And that basically means that credences shouldn't be updated based on minor variations. We need a thinking method that doesn't translate slight changes in quantities into (slight) changes in the final score. We don't want to incentivize small changes, or reward them, when they aren't beneficial. Only genuinely beneficial changes should improve total score. But having slightly more excess capacity on a factor doesn't actually merit a higher score. So we shouldn't add up weighted factors to form credences. That's the wrong approach because it doesn't take into account excess capacity or reaching breakpoints where a factor is good enough to work.
The concept of excess capacity also applies to bad values. We could call it excess failure. If we need 500 units of something to succeed at our goal, and we have 100 units, then we have 400 units of excess failure. Slightly increasing or decreasing the number of units we have will make no difference to the overall outcome (failure). Note that this depends on the type of goal, e.g. if we're preparing food, a small meal might be better than no meal, and a slightly larger meal might be an improvement even if it's too small.
To avoid giving higher scores for small increases in excess capacity (or small decreases in excess failure), non-linear scoring must be used which gives the same score for a large range of values of the factor, so that changes to excess capacity don't change the score. This moves a long way from typical Bayesian analysis toward CF's binary factor multiplication, since it means collapsing most of the spectrum into only two values (the good value for excess capacity and the bad value for (excess) failure) and only potentially using other values in transition areas.
A working system can have a few parts that are higher risk or which require attention. You could get lucky in just a few cases. Or you could give attention, optimization or troubleshooting effort to just a few issues when you use the system. Those few factors that don't have enough excess capacity are the factors which would be beneficial to improve.
That was my more practical, intuitive argument about excess capacity. A more technical argument is explained by Eli Goldratt in The Goal, which I'll explain next.
Balanced Plants
Imagine a balanced plant: a factory designed for 100% utilization of every workstation (in simple plants, this means all workstations have equal capacity of parts they can process per hour). Many people believe this is ideal. It would be perfectly efficient because nothing is wasted. E.g. the plant might have three workstations, A, B and C, and each one can process 10 parts per hour. They work in a production line: A works on raw materials, B works on the parts produced by A, and C works on the parts produced by B. Having the same processing capacity for each workstation makes the plant “balanced” and avoids excess capacity. Is that efficient? Will it work? Will it reliably produce 10 parts per hour?
A balanced plant is a bad idea because of variance, a.k.a. statistical fluctuations. In other words, errors happen. Things go wrong. Things don’t go perfectly according to plan. If a workstation processes 10 parts per hour on average, that means in some hours it will only process 7 parts, or even 0, and in other hours it will process 13 parts or even 20.
This hypothetical plant will be inefficient. The matchstick game in The Goal illustrates how it works and why it’s bad. The game uses dice to randomize production, bowls as workstations and matchsticks as parts. Matchsticks can accumulate unevenly in early bowls and then sometimes move through the system in bursts or waves instead of having smooth, even, consistent flow of parts. Overall, the system outputs significantly less than expected (e.g. less than 3.5 parts per turn given production of 1-6 using a six-sided die). Note that this analysis is for chains of dependencies when workstations feed into others, not for independent workstations.
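Here's a rough simulation sketch of this kind of game in Python. The details (five stations, twenty turns, empty starting bowls) are my own assumptions, not Goldratt's exact setup, but the effect is the same: dependent stations with equal average capacity produce less than that average.

```python
import random

def play_game(stations=5, turns=20):
    """One game: each turn, each station (in line order) rolls a die and
    passes along at most that many parts, limited by the parts actually
    sitting in front of it. The first station has unlimited raw materials."""
    bowls = [0] * stations   # parts waiting in front of each station
    finished = 0
    for _ in range(turns):
        for i in range(stations):
            roll = random.randint(1, 6)
            moved = roll if i == 0 else min(roll, bowls[i])
            if i > 0:
                bowls[i] -= moved
            if i + 1 < stations:
                bowls[i + 1] += moved
            else:
                finished += moved
    return finished / turns

random.seed(0)
games = 5000
average = sum(play_game() for _ in range(games)) / games
print(round(average, 2))  # noticeably below 3.5, the average of a single die
```

In this sketch, longer lines and shorter games fall further below 3.5.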
Balanced plants underperform because workstations don’t have the same variance at the same time. The first might have a delay when the second would have been highly productive – but because the first workstation wasn’t sending parts along, the second had to wait and waste time. And at other times the first workstation produces a lot but the second workstation is being average or slow, so the extra parts don’t get processed. And whenever some productivity is lost due to negative variance, it’s unlikely to be made up for later. Positive variance doesn’t make up for past negative variance because it's usually wasted: one workstation has positive variance while another in the chain doesn't.
The technical issue is that variance plus dependencies (e.g. one workstation feeds into another, rather than them working independently) means you need excess capacity and buffers to deal with the variance.
Let’s suppose capacity on workstation B is the most expensive. So we’ll leave B alone at 10 parts/hour. But we’ll increase the capacity of A and C to 15 parts/hour each, an excess capacity of 5 parts per hour. That will help with variance. If the people working on A are slow for an hour, and produce 33% less than their max, that would still be 10 parts which is enough for B to keep working at max capacity. That’s how excess capacity helps deal with variance.
A buffer means having extra parts in front of a workstation. Suppose we keep up to 500 extra parts (of the type A makes) in front of B. Then if A has a slow day, or even shuts down for the whole day, B can keep working with no loss of productivity.
The best approach is to have buffers for constraints and excess capacity for non-constraints. The amounts to use are related. The more excess capacity A has, the smaller the buffer B needs. How much of each to use depends on how easy or inexpensive they are to get. Look at the costs of excess capacity and the costs of buffer capacity, and come up with a reasonable mix. Doing math helps too: you can model how much safety (protection from risk) you get from different mixes and look at their total costs and compare. Safety here mostly means how well workstation B (the bottleneck) is protected from losing any productivity due to external problems. If B is idle, that costs money in reduced production, so spending some money to protect against that is worthwhile. You also need to consider how valuable production is: you wouldn’t want to spend $100 on safety to prevent $10 of lost production.
Would having only buffers work? No. Suppose A has a bad day and part of the buffer in front of B is used up. What happens next? A has no excess capacity (that’s the premise: buffers only). So, on average, A and B process the same number of parts per hour. How long will it take to replenish the buffer in front of B? A long time, potentially indefinitely, depending on details like the variance function. On average, A only keeps up with B but doesn’t produce extra. The only way to replenish the buffer is with good luck (positive variance: A produces extra). Waiting for good luck to replenish the buffer doesn’t work well because bad luck will keep cancelling out your progress. And you need to replenish the buffer before large bad luck occurs again. The only reasonable way to replenish the buffer is by having A work faster than B. That means A needs the ability to go faster than B, which is excess capacity.
The buffer size is set on purpose to be the right amount to protect against bad luck, so when it’s not full that means you’re protected less, which is bad. So you want to refill the buffer fairly quickly. If the buffer's at 400 out of 500 and that's fine, then you should just set the max buffer size to 400 since there’s no need for more.
Would only excess capacity with no buffer work? Not well. Suppose we upgrade workstation A to do 100 parts/hour on average (so it generally spends 90% of its time idle, since making extra parts that B can’t use right away would amount to a buffer). It could still produce under 10 parts in some hour; that’d be uncommon but could still happen, and then some production from workstation B would be lost. Something could break so production for A could temporarily be zero, so then B would lose production since there’s no buffer.
So you want some excess capacity for A, e.g. it can produce 15 parts/hour, and some buffer for B, e.g. an extra 500 parts. A halts production when the buffer in front of B is full. On average, A spends around 1/3 of its time idle (or spends time working below full speed). And that’s efficient! A balanced plant is actually bad, and excess capacity is necessary because variance exists.
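To make the contrast concrete, here's a rough Python sketch comparing a balanced no-buffer setup with an excess-capacity-plus-buffer setup. The uniform 0-to-max "capacity rolls" are my own crude stand-in for variance, not numbers from The Goal.

```python
import random

def balanced_no_buffer(hours=10_000, seed=1):
    """A and B have the same average capacity and no buffer between them,
    so B can only finish what both its own capacity and A's output allow."""
    random.seed(seed)
    return sum(min(random.randint(0, 10), random.randint(0, 10))
               for _ in range(hours)) / hours

def excess_plus_buffer(hours=10_000, a_max=15, buffer_cap=500, seed=1):
    """A has excess capacity (0-15 per hour vs. B's 0-10) and feeds a
    buffer in front of B; A idles once the buffer is full."""
    random.seed(seed)
    buffer, b_output = 0, 0
    for _ in range(hours):
        buffer = min(buffer_cap, buffer + random.randint(0, a_max))
        done = min(random.randint(0, 10), buffer)
        buffer -= done
        b_output += done
    return b_output / hours

print(round(balanced_no_buffer(), 2))   # well below B's own average capacity of 5
print(round(excess_plus_buffer(), 2))   # close to 5: B is almost never starved
```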
Instead of a balanced plant, we need a plant that handles random statistical fluctuations well and is resilient against all kinds of errors and problems. Excess capacity and buffers are two of the major tools for achieving that, and excess capacity by definition means not having a balanced plant.
You can have less excess capacity and smaller buffers if the plant is all automated because robotic machines work with less variance than people do. But they do break sometimes, so you’re still better off having some excess capacity and buffer capacity. In general in life, variance can be reduced but not eliminated.
Decision Making
How does this apply to decision making involving multiple factors? Most factors have excess capacity, just like most factory workstations do (in a factory that actually works well). What happens if you take a workstation with excess capacity and "improve" it (increase its capacity)? It has to be idle more (or work slower). You haven’t helped the factory produce more finished products. It didn’t need more excess capacity. It already had an appropriate amount, on purpose, by design.
What happens if you "improve" random, arbitrary things in your home? Generally nothing good. E.g. your table was already strong enough – it had a lot of excess capacity to hold more weight than you ever put on it – and didn’t need to be reinforced to be stronger. And "improvements" generally have downsides, e.g. reinforcing the table makes it a little heavier and bulkier (and there’s the cost of the time, effort and materials used to reinforce it).
If this isn’t intuitive, it may be because you already habitually look for factors that matter or are worth improving. E.g., you don’t think of adding extra layers to all your cups to help block leaks when you drink water. They already don’t leak. Triple-layered anti-leak cups is not a thing you would consider because it’s silly. It wouldn’t be helpful. It’d actually be bad because it’d make the cups thicker and heavier, plus it’d waste time and money, plus the added complexity gives more chances for errors (in addition to the risk of the original cup breaking, there's now a risk of some of the extra, added parts breaking, which could not merely lower the anti-leak capacity to have less excess, but also be a genuine problem, e.g. by creating jagged edges that could cut you).
In the world overall, well over 99% of factors are more than good enough and we shouldn't direct resources towards improving them. But when you look around at factors, you ignore most of them. So more than 1% of the factors you actually notice are important factors that make sense to optimize, because you’re ignoring so many other factors and not seeing a random sample of factors. That’s mostly good, but it can be misleading about how reality works and how common excess capacity is. And it can be misleading about whether trying to take into account every factor for Bayesian updating of credences actually makes sense, because you’re intuitively not considering a bunch of factors that shouldn’t be taken into account. When they propose a logical-mathematical system (and want to program it on computers that way) and they say “every factor”, that means literally every factor, not every reasonable, notable factor according to intelligent human judgment and intuition.
What’s the result of trying to take into account all the evidence, and weighing 50+ factors and looking for complex solutions, and updating on every new bit of information? That encourages giving attention to more factors with excess capacity than people otherwise would. It encourages optimizing local optima in ways with no overall benefit. It encourages people to increase excess capacity more than they otherwise would. Bayesianism and similar systems give bad advice and bad incentives (in terms of higher credences or scores for optimizing local optima or increasing excess capacity).
When something with plenty of excess capacity gains even more excess capacity, you should not update your credence or evaluation. You shouldn’t go from a 0.8 to a 0.81 because the global picture didn’t actually improve. But the Bayesian updating system (and various other multi-factor decision making systems) encourages raising the final score for everything positive and lowering it for everything negative. Factors with excess capacity tend to be seen as having low importance, so they get low weighting multipliers, but they're still taken into account and increasing them a lot can still raise a score noticeably even though it shouldn't. Similarly, if a factor loses some excess capacity but still has plenty, then the overall score should not actually go down.
Bayesians incorrectly say the way to take into account unimportant factors is by giving them small weighting multipliers, not by disregarding changes to them unless and until they change enough to make an important (qualitative) difference (in other words, the quantity gets near or crosses a breakpoint). Bayesianism thinks it's ideal to update scores for unimportant differences and also generally views factors as either positive or negative when reality is actually more complex (typical positive factors actually become negative with too much, like how water to drink is good but too much is a flood that will drown you or even destroy cities).
The best way to deal with local optima isn't to give them low weighting factors. It's to stop adding them up at all: take note of them only when their changes cross relevant breakpoints or when they cause failures. Instead of adding up factors, use a thinking method focused on other things like global optima, throughput or succeeding at goals.
See also my previous article about how adding up multiple weighted factors is problematic.
Always Updating Credences
so those factors shouldn’t actually change the result. (The desire to always update the credence on any good or bad evidence is wrong! They don’t know you can have excess!)
Wanting to always update for any new evidence may sound optimal, but it actually leads to putting effort into improving local optima. In this approach, small changes to factors with excess capacity take up our attention and change the global evaluation.
“Always update (because we’re so rational and clever and want to maximally use evidence)” is a bad motto. A better motto is “Viable, complex systems have a lot of stability against random fluctuations; that makes most new evidence not matter since the system is stable against the small change you just found out about; therefore, usually don’t update.”
Wanting to take into account all evidence sounds rational to most people, but people don’t intuitively realize how much evidence exists and how irrelevant most of it is. Reality is full of overwhelmingly huge amounts of evidence, all over the place, every second. And most of it is within our margins of error. We should only update our views when something important changes.
Instead of wanting our ideas to be perfectly sensitive to all evidence, we should want them to be robust and resilient – which means we want our ideas to be insensitive to most small details or changes. In general, insensitivity to most small things is a positive trait that makes things stable and resilient. Ideas that are highly sensitive to evidence require constant attention and are a huge hassle. That's OK for new or speculative ideas. When an idea is finished and in widespread use, it should be insensitive to most new evidence so it no longer requires a lot of attention and effort, and we can stop worrying about it. Any idea can still be revisited for more analysis when we get a notable, important piece of new evidence or think of a significant new argument, but this should be infrequent for most of our ideas or else our attention would be overwhelmed. In order to build complex systems and complex ideas, most of the parts or ideas can't demand a lot of attention on an ongoing basis.
Also, if you give very low importance weightings to most factors, so changes in them make only a tiny difference which you can ignore, that's flawed too. That doesn't solve this problem because those factors could potentially be important if they had a large change. Giving them a tiny enough weighting factor risks ignoring large changes even when they do matter. E.g. if there's a large enough decrease to remove all excess capacity and more, that's often extremely important. There's no way to handle this well with linear importance weightings.
For example, the strength of a concrete wall has a lot of excess capacity, so you might want to give it a low weighting factor. But that factor actually has high importance, not low importance: if the wall were too weak and collapsed, that would be important! A linear weighting factor can't simultaneously take into account that a factor currently has excess capacity (so small changes have low importance) but that the factor is actually important (some changes would have high importance). It's common that factors with lots of excess capacity, which could be seen as low importance, are capable of causing total failure if they changed too much. So saying they have high, medium or low importance isn't really correct regardless of which you choose, and there is no correct (linear) weighting factor to give them.
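Here's a minimal sketch of the dilemma in Python, with made-up numbers for a wall-strength factor alongside one other fixed factor:

```python
def weighted_score(wall_strength, wall_weight=0.01, other_score=80):
    """Linear weighting: the wall contributes wall_weight * strength to the
    total alongside another factor (all numbers are made up)."""
    return wall_weight * wall_strength + (1 - wall_weight) * other_score

def breakpoint_score(wall_strength, required=100):
    """Breakpoint evaluation: the wall either holds (1) or fails (0)."""
    return 1 if wall_strength >= required else 0

# A wall with lots of excess capacity: strength 500 vs. 100 required.
# A low weight correctly ignores a small change...
print(round(weighted_score(500), 1), round(weighted_score(510), 1))  # 84.2 84.3
# ...but it also barely reacts when the wall would actually collapse:
print(round(weighted_score(50), 1))                                  # 79.7
# The breakpoint version ignores the small change but catches the collapse:
print(breakpoint_score(500), breakpoint_score(510), breakpoint_score(50))  # 1 1 0
```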
Binary Evaluations
Besides quickly checking that many factors don’t ruin an option (giving pass/fail grades to things that are not near the borderline between pass and fail), you should focus your more detailed attention on just a few important factors, which is simpler than trying to use 50 factors.
When evaluating things, many factors matter. But (if it’s a working, stable, viable system) most have excess capacity. How should factors with excess capacity be judged? Give them a pass/fail grade. Does it still have plenty of excess capacity? That’s a pass. Doesn’t have plenty? That might be a problem. That factor didn’t pass its initial evaluation and needs a more detailed analysis to check if it’s OK. It needs more attention. It might affect the overall evaluation, and it’s already bad enough to be distracting.
If a system works well, most parts won’t demand your attention. You can go through and give out dozens of “pass” evaluations with almost no effort (or you can ignore most of those factors, and not realize they even exist, and things might still work out OK). That lets you focus almost all of your attention on just a few factors that make a big difference to the overall system performance and which are difficult to get excess capacity for.
For an example of quickly giving out many passing grades, imagine I have an apple. It passes on size: not too big to hold, nor too small to get some mouthfuls of food from. And it passes on color: it’s red, and not even close to brown or black. And it passes on firmness: it feels fine and isn’t even close to being mushy or rock hard. And it passes on smell: it doesn’t smell strongly and isn’t even close to smelling rotten. And it passes on being whole: it’s the right shape for an apple and no one cut it in half. And it passes on weight: it feels like a regular apple, not even close to being so light or heavy that it’d be suspicious. And it passes on having intact skin, having no visible worms, being a reasonable temperature, and many more factors that I wouldn’t normally even give conscious attention to. But I do give all these factors some subconscious attention, so if they merited a failing grade I would notice. You can pick a factor and imagine it and consider whether someone would notice that or not, e.g. the apple has a visible worm, or it's blue, or it's really heavy or really light. When I see and pick up an apple, I'd notice any of those issues even if I hadn't consciously thought to check for them.
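Here's a minimal sketch in Python of this kind of quick pass/fail evaluation for an apple. The specific checks and thresholds are made up for illustration:

```python
# Made-up pass/fail checks. Each returns True (pass) when the factor is
# comfortably away from the failure borderline.
CHECKS = {
    "size":     lambda a: 5 < a["diameter_cm"] < 12,
    "color":    lambda a: a["color"] in {"red", "green", "yellow"},
    "firmness": lambda a: a["firmness"] == "firm",
    "smell":    lambda a: not a["smells_rotten"],
    "whole":    lambda a: a["is_whole"],
    "skin":     lambda a: a["skin_intact"],
    "worms":    lambda a: not a["visible_worms"],
}

def evaluate_apple(apple):
    failures = [name for name, check in CHECKS.items() if not check(apple)]
    return "pass" if not failures else "needs attention: " + ", ".join(failures)

apple = {"diameter_cm": 8, "color": "red", "firmness": "firm",
         "smells_rotten": False, "is_whole": True, "skin_intact": True,
         "visible_worms": False}
print(evaluate_apple(apple))  # pass: no factor demands detailed attention
```

Each check is cheap; only a failing check pulls a factor out of the background and into detailed attention.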
Most factors should be more than good enough. They should have ample margins for error so your subconscious doesn't alert you to a problem. If a system isn’t like that, then it isn’t a stable, working system. For example, it could be a new thing you’re building from scratch that doesn’t work yet. In that case, it’ll have lots of “fail” grades corresponding to it not actually working yet. You can plan ahead to have excess capacity and buffers when it’s done.
Excess capacity is often inexpensive. For a complex system with 50+ parts, excess capacity had better be inexpensive for most parts or else the system is going to be very expensive. I can buy an apple for under $1 that has excess capacity on dozens of factors, so if you try to imagine paying for excess capacity for each factor individually (which doesn't really make sense), the excess capacity would be very inexpensive.
In general, when we do projects or look at factors, some things are hard or problematic, but the majority are easy. If a project or system isn’t like that, it’s either hard or not viable. If there are 50 hard factors, it’s unrealistic to get them all right and make it work. You can deal with a few hard factors by paying extra attention to them, but if your attention is split 50 ways, and every way is difficult, then that won’t work. It’ll be chaos as you keep getting “fail” grades in different places in the system (whereas if a couple things keep breaking, and it’s usually the same few problems that you have to deal with, that’s more manageable and fixable). The only reasonable way to deal with complex reality and many factors is to do stuff with at most a few hard parts that you can focus attention on. The rest needs to be pretty easy so it won’t cause much trouble or cost too much.
We do many things that only have easy parts (e.g. microwaving something for dinner) and some things with a few hard parts. If there are many hard parts, you better have a whole team of people, a lot of resources, a good way of dividing tasks up so that no person or team has too many hard parts to deal with, and a good way of communicating and coordinating between teams. The project leader also needs a way to look at the project so, from his perspective, there are only a few hard parts that demand his attention.
Full Paragraph Analysis
With all that analysis in mind, let’s take a look at the whole paragraph again:
People want complex solutions like correctly weighting and adding 50 factors, which are qualitatively different (e.g. price and deliciousness are hard to add), to figure out just the right credence that is further updated by any piece of evidence. Then they end up factoring in a bunch of local optima, that, in the bigger picture, have excess capacity, so those factors shouldn’t actually change the result. (The desire to always update the credence on any good or bad evidence is wrong! They don’t know you can have excess!) Besides quickly checking that many factors don’t ruin an option (giving pass/fail grades to things that are not near the borderline between pass and fail), you should focus your more detailed attention on just a few important factors, which is simpler than trying to use 50 factors.
Hopefully it makes more sense now and you can see more nuances. If you analyzed the paragraph yourself before reading, you could now compare your analysis with mine. And now would be a good time to pause and think about how various ideas you read in this article relate to the paragraph.
Hopefully, the paragraph now reads as reasonably self-explanatory and understandable to you, rather than requiring further explanation.
With that context, let's talk about a few more things.
Excess capacity means excess above some amount. What amount? Whatever amount is exactly enough if everything goes to plan (with no variance). Or a margin of error can be included, and it can mean excess above the margin of error.
The way to look at quantitative factors is by finding breakpoints – differences in quantity that make a qualitative difference – and then evaluating factors by which breakpoints they pass. Often, we just find one breakpoint between good enough and not good enough (or in other words, between success and failure). Then we want the factor to be above that breakpoint plus a margin of error (to account for variance in both performance and measurement – our measurements of quantities aren’t perfect).
Actually there are commonly two breakpoints: one related to too much and one related to too little, with a good zone in the middle. But the value of a factor is usually near at most one of those two breakpoints, so we tend to focus only on that one. In other words, too much and too little usually aren't close together, so only one will be a concern. If they are close together, there's no way to have a lot of excess capacity, because if you had a lot of excess above "too little" you'd reach "too much", so it'll be a hard factor. In that case, it'd require your attention to get the quantity to be in the narrow range between too much and too little, and keep it there, so that factor won't be very resilient to random variance.
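Here's a small Python sketch of evaluating a quantitative factor against two breakpoints plus a margin of error. The thresholds are made up:

```python
def evaluate_factor(value, too_little, too_much, margin):
    """Pass only when the value sits inside the good zone with room to
    spare (at least a margin of error away from both breakpoints)."""
    if too_little + margin <= value <= too_much - margin:
        return "pass"
    if too_little <= value <= too_much:
        return "borderline: needs attention"
    return "fail"

# Made-up example: the good zone is 50 to 500 units, with a margin of 20
# to cover variance and imperfect measurement.
print(evaluate_factor(300, 50, 500, 20))  # pass (lots of excess capacity)
print(evaluate_factor(60, 50, 500, 20))   # borderline: needs attention
print(evaluate_factor(40, 50, 500, 20))   # fail
```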
So excess capacity means excess above (or below) a relevant breakpoint. Good enough in a factory can mean enough to provide parts to the next station fast enough to not cause delays (and enough capacity to catch up after bad luck). More generally, it means good enough not to screw up your goal, like lowering your output at whatever you’re making or getting. Good enough means good enough not to be a constraint.
Some factors are inherently qualitative. For quantitative factors, breakpoints convert them to be qualitative (success and failure are qualities not quantities – breakpoints split spectrums up into categories). Critical Fallibilism emphasizes qualitative analysis much more than Bayesianism and resolves some of the problems with qualitative analysis (like the issues with weighting factors and dimension conversion) by converting from quantities to qualities.
In conclusion, Critical Fallibilism's thinking method is less work yet more effective because, like Theory of Constraints, it figures out what matters and focuses attention on key factors. For non-key factors, we should use extra capacity to make them robust, resilient, and tolerant of some variance. That way we don’t have to worry about them and small changes in them do not result in any change to the overall result. We should recognize many issues as local but not global optima, inexpensively get them some extra capacity, and refrain from trying to optimize them. Instead of optimizing every factor, optimize the use of our attention and optimize success at our goals. As Eli Goldratt emphasized, optimization away from the constraint is wasted.
I also made a video about this paragraph.
To learn more about these topics, read my articles about Critical Fallibilism, Theory of Constraints and Multi-Factor Decision Making Math.