Academic Literature for Multi-Factor Decision Making

Abstract: After writing my article Multi-Factor Decision Making Math, I found and reviewed relevant academic literature about Multi-Criterion Decision Making (MCDM). I found no criticisms of my beliefs in the literature. I found that the MCDM and epistemology literatures mutually ignore each other, to their detriment. And I determined that Critical Fallibilism’s (CF’s) decision making method is a new approach that MCDM researchers are unaware of. I will summarize my review and also discuss Geographic Information Studies and normalization.


In Multi-Factor Decision Making Math, I explained mathematical approaches to decision making problems and introduced a solution which I believe is original to Critical Fallibilism (CF). My primary focus was on two things. First, I wanted to explain the topic. How do you approach a decision making problem using math? How can you look at decision making in a more formal, organized or mathematical way? Second, I wanted to criticize the approach of adding weighted factors. Most people view debates in terms of adding the weight of the evidence and arguments for each side (and subtracting for criticisms), but that’s an error.

I focused on the impossibility of adding factors from different dimensions. I explained that multiplying factors works better but is problematic too. I proposed a solution involving converting dimensions to related binary dimensions and then multiplying only binary factors. This article will discuss the academic literature for decision making math and compare existing approaches with CF’s new approach.
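To make the proposed method concrete before comparing it with the literature, here is a minimal sketch in Python. The factors, breakpoints and apartment example are hypothetical, invented for illustration; the point is only that each factor is converted to pass/fail at a breakpoint and the binary values are multiplied, so a single failure rejects the option.

```python
# Minimal sketch of binary factor multiplication (hypothetical factors).

def passes(value, breakpoint_value, more_is_better=True):
    """Convert a quantitative factor to a binary evaluation: 1 = pass, 0 = fail."""
    if more_is_better:
        return 1 if value >= breakpoint_value else 0
    return 1 if value <= breakpoint_value else 0

def evaluate(option, factors):
    """Multiply the binary factor scores; any single failure makes the product 0."""
    score = 1
    for name, (breakpoint_value, more_is_better) in factors.items():
        score *= passes(option[name], breakpoint_value, more_is_better)
    return score

# Hypothetical example: judging one apartment against three factors.
factors = {
    "monthly_rent": (2000, False),   # needs to be at most $2000
    "commute_minutes": (45, False),  # needs to be at most 45 minutes
    "square_feet": (600, True),      # needs to be at least 600 sq ft
}
apartment = {"monthly_rent": 1800, "commute_minutes": 30, "square_feet": 750}
print(evaluate(apartment, factors))  # 1: every factor passes, so the option passes
```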

My MCDM Background

I wrote Multi-Factor Decision Making Math before being aware of academic literature on the same subject. I’ve now found it and there’s a lot. It’s called Multi-Criterion Decision Making (MCDM) or several other similar names.

Why didn’t I find MCDM literature earlier? I had assumed that my ideas about decision making were in my own field, philosophy (specifically epistemology). I developed the ideas while trying to improve on, in particular, Karl Popper’s philosophy of Critical Rationalism.

When I searched the philosophy literature, I didn’t find MCDM articles. Some of the most relevant books and articles that I did find were about credence, and many advocated a Bayesian approach. I disagree with Bayesians and believe they’ve failed to address some of Popper’s arguments. To learn more about credence, a short, readable article I’d recommend is The relationship between belief and credence (2020) by Elizabeth Jackson; she also has a YouTube channel. The article has many citations at the end, so you can find additional literature if you want to.

MCDM literature is generally closer to math, economics, business management and operations research than to philosophy. However, MCDM issues are relevant to epistemology, and more philosophers ought to read about and comment on MCDM.

A major connection between philosophy and MCDM is that arguments and evidence (major topics in epistemology) are multiple factors (in different dimensions) which we wish to evaluate in order to choose between multiple options (ideas, conclusions).

I had also looked for literature similar to Eli Goldratt’s work. I read books on business management by a variety of authors but they weren’t great and didn’t introduce me to MCDM.

I’ve now found that MCDM literature discusses the same basic problem that my article discusses. I reinvented some ideas that were already in the literature. For example, multiple authors had already pointed out that you can’t add factors from different dimensions, which is like adding apples to oranges. Unfortunately, the majority don’t seem to take that problem seriously.

Early on, people did consider multiplying factors as a possibility. And Chris Tofallis wrote several articles advocating multiplying over adding for combining factors in MCDM, including Add or Multiply? A Tutorial on Ranking and Choosing with Multiple Criteria (2013). I recommend that article because the writing and math are reasonably accessible.[1]

As of today (2023), the majority of people using MCDM, and scholars writing about MCDM, have ignored the problems with additively combining factors from different dimensions. They continue to use approaches based on adding weighted factors. So my lengthy attempt to tell people that you can’t add factors from different dimensions remains relevant.

I have a different background (philosophy) than MCDM scholars, so I explained things differently, which made my article a worthwhile addition to the conversation. MCDM authors tend to emphasize different issues than I do, e.g. they focus more on providing mathematical methods for decision makers to use (either by hand or with software). I used math to illustrate and explain concepts rather than to create a formula to guide people’s decision making.

Note: MCDM literature often uses the word “criterion” where I use “factor” (or “sub-goal”) and “alternative” where I use “option” (or “candidate solution”). This terminology difference isn’t important but could be confusing initially.

Overall, after reviewing MCDM literature, I have no major changes to make to what I said. I don’t need to retract anything. I didn’t find anything in MCDM literature that corrects an error I made. I wish the MCDM field had more to teach me. I think I did a good job of creating high quality knowledge that doesn’t need to be revised based on the academic literature. The biggest change I’d make now, if I were rewriting my article, is to add a discussion of normalization.

MCDM Literature Review

I gathered around 70 pieces of MCDM literature on my computer. When reviewing this material, I particularly looked for criticisms that apply to my ideas and for ideas similar to mine. I also looked for MCDM methods that my criticisms wouldn’t apply to. Although there are many MCDM methods, they have a lot in common with each other. I found that, because my critical arguments used fundamental concepts instead of the specific details of particular methods, they apply widely to existing MCDM methods, including methods I wasn’t aware of.

I also sought out some older material to find out how the MCDM field started. The oldest I found was Dimensional Analysis (1922) by Percy Bridgman (a Nobel prize winning physicist). There’s little before the 1950s. I often find older material useful because it’s more about developing the major concepts of the field than about optimizing details. And as academia has gotten much larger, the signal to noise ratio has gotten worse. And academics now publish things they don’t actually consider important due to “publish or perish” career incentives. So I find a random paper from the 1970s is more likely to be good than a random paper from after 2000. (There may also be a selection bias because I’m much more likely to look at digitized papers. Maybe better old articles are more likely to be digitized than bad old articles. Whereas, today, basically all articles are digitized regardless of quality.)

I shared a little of my research in a forum thread. You can read that for additional information.

Here are the three best sources I found for finding out what MCDM methods exist:

  • Multi-Criteria Decision Making Methods: A Comparative Study (2000) by Evangelos Triantaphyllou is a textbook. Chapter 2 covers six MCDM methods. It has explanations of MCDM concepts too.
  • What is MCDM / MCDA? (mirror) is a webpage that includes summaries of scoring and weighting methods for many MCDM approaches. It’s from a company that makes MCDM software. The advice for how to do weighting requires using intuition but doesn’t acknowledge that or talk about how to use intuition effectively. An exception is the SMARTER method, but it’s worse than using intuition. It uses fixed weights based only on your ranking of the criteria and the number of criteria. If there are four criteria, then whichever one you decide is most important is always given a weighting of 0.52 regardless of how important it actually is (see the sketch after this list).
  • A Handbook on Multi-Attribute Decision-Making Methods (2021) by Babak Zolghadr-Asli, Omid Bozorg-Haddad, and Hugo A. Loáiciga is a textbook. It covers one family of MCDM methods per chapter. It focuses more on providing math details than conceptual explanations. It incorrectly says (in section 2.3) to normalize data when using a weighted product method.
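Regarding the SMARTER point above: SMARTER-style methods are standardly described as using rank order centroid (ROC) weights, which depend only on how you rank the criteria and how many criteria there are. Here is a small sketch (my own illustration, not code from the sources above) showing where the 0.52 figure comes from.

```python
# Rank order centroid (ROC) weights: fixed weights derived only from the
# ranking of the criteria and the number of criteria, n.

def roc_weights(n):
    """Weight for each rank (1 = most important) out of n criteria."""
    return [sum(1 / j for j in range(i, n + 1)) / n for i in range(1, n + 1)]

print([round(w, 2) for w in roc_weights(4)])  # [0.52, 0.27, 0.15, 0.06]
```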

Giving quotes from the MCDM literature wouldn’t prove that it doesn’t contain criticisms of CF’s method, so I’ve just described what I reviewed. If I missed something, you can email elliot@criticalfallibilism.com, use my discussion forum or debate me. If you want to double check the accuracy of my claims, I suggest reviewing at least one of the three MCDM overview sources I just provided. Or if you want to do a more independent fact check, you can search for MCDM literature on Google Scholar.

My favorite thing that I found from MCDM literature is the book The Art of Problem Solving (1978) by Russell Ackoff. I’d recommend it to anyone who likes my philosophy articles.

Geographic Information Studies

The closest thing I found to a binary multiplication approach comes from Geographic Information Studies (GIS) rather than MCDM. Once I knew some search terms from MCDM, I found GIS too.

One GIS topic is making a single map that conveys multiple pieces of useful information at the same time. Combining multiple pieces of information into one final result involves some of the same issues as combining multiple factors into one decision. For example, a map may show cities, elevations, country boundaries and terrain types (like forest or ocean) at the same time. Each thing you can show is a factor. It’s relatively easy to make a map for any single factor, but combining factors into one map is harder because it can get cluttered and elements from the single-factor maps can overlap each other when combined. It’s especially hard to make a map combining multiple factors that each cover the entire map, such as elevation, average annual rainfall, average summer temperature, or number of deer per square mile.

The GIS article Suitability Modeling discusses using breakpoints to convert to binary factors. E.g. you could categorize all elevations as low or high, then make a map with square tiles where every square is given one of two colors based on whether the average elevation in that square counts as low or high. You would use a breakpoint such as 300 feet to differentiate between low and high.
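Here is a small sketch of that breakpoint step; the tile elevations are made-up sample data.

```python
# Classify each tile's average elevation as low (0) or high (1) using a
# 300-foot breakpoint, then color each tile by its class.
BREAKPOINT_FEET = 300

tile_elevations = [120, 450, 305, 80, 600]  # average elevation per square tile
tile_classes = [1 if e >= BREAKPOINT_FEET else 0 for e in tile_elevations]
print(tile_classes)  # [0, 1, 1, 0, 1]
```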

The motivation for using binary factors is that they’re simpler and therefore easier to work with and combine. However, the only GIS perspective I’ve found on binary factors is that they’re inferior but convenient, not desirable. So GIS’s perspective is very different than CF’s view that multiplying binary factors is optimal.

Suitability Modeling has a section titled “Breaking Away from Breakpoints” that discusses how to stop using breakpoints and instead move on to something better and more complicated. The article correctly says that when your data can have more possible values (e.g. 10 or infinity rather than 2), it has more information content. They want to fit a large amount of information on maps, so they have different goals than CF.

It makes sense to have an elevation map with raw data – a spectrum of real numbers represented with a color gradient and labelled contour lines. That lets readers consider whatever breakpoints they care about. If map makers use breakpoints themselves, then readers with other goals would get less benefit from that map. That’s because breakpoints are always related to goals, not suitable for all goals.

For example, I might want to climb to the highest point on the map. If the map makers made a two-color map showing high (over 300 feet) and low, I wouldn’t be able to find the peak on the map. A map based on the wrong breakpoint often won’t work for me and my goal, so it usually makes sense for map makers to be more neutral and present more raw data, rather than use opinionated breakpoints. (Map makers are always somewhat opinionated, e.g. they decide that elevation data is worth sharing while information about some other factors, like number of acorns per square mile, is omitted. Map makers try to present information that will be useful to a lot of people in our society.)

Suitability Modeling explains that you can multiply binary factors to create a map. For example, you can consider four factors for a habitat for an animal. You can then divide your map into square tiles, and then make a binary evaluation for each tile and factor. The breakpoint would be whether the habitat is suitable for the animal. E.g. a degree of slope can be converted into the binary issue of whether or not the slopes are gentle enough for that animal. A measure of how much sun a tile gets can be converted into a binary factor of whether that is enough sun for the animal. By multiplying the binary factors, you can get a 1 or 0 for each map tile, and then color it in (e.g. green for 1 to indicate suitable habitat, and red for 0 for unsuitable habitat). In this way, four quantitative factors could be converted to four binary factors which could then be multiplied and then drawn on a binary map based on tiles.
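Here is a sketch of that per-tile calculation. Slope and sun come from the article’s example; the other two factors, the specific breakpoints and the tile data are my own hypothetical values.

```python
# Convert four quantitative factors to binary pass/fail values using
# animal-specific breakpoints, then multiply. A tile scores 1 (suitable,
# green) only if every factor passes; otherwise it scores 0 (red).

def suitable(tile):
    slope_ok = 1 if tile["slope_degrees"] <= 20 else 0        # gentle enough
    sun_ok = 1 if tile["sun_hours"] >= 4 else 0                # enough sun
    water_ok = 1 if tile["miles_to_water"] <= 2 else 0         # water close enough
    cover_ok = 1 if tile["tree_cover_percent"] >= 30 else 0    # enough cover
    return slope_ok * sun_ok * water_ok * cover_ok

tile = {"slope_degrees": 12, "sun_hours": 6,
        "miles_to_water": 0.5, "tree_cover_percent": 45}
print(suitable(tile))  # 1 -> color this tile green
```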

A more complicated approach involves using different colors for some or all combinations of binary factors. With 4 binary factors, you’d need 16 colors (2^4) to cover every combination. For a map to be useful to people instead of confusing, the number of different colors needs to be small (or else organized in different shades in an intuitive way).

Another way to combine binary factors is to add them up. That tells you the number of factors which got a “pass” grade. The idea is that an area which passed 4 out of 5 criteria is better than one which passed only 1. In this case, for N criteria, you need N+1 colors, which is a lot less than exponential color usage. And you can use an intuitive spectrum of colors (e.g. from dark red for the lowest numbers to orange for below average numbers, yellow for above average numbers, and green for the best numbers). This binary-adding method only works if failure at a factor can be acceptable, which I consider a confusing way to use breakpoints (they are differentiating between “good” and “bad” rather than “pass” and “fail”). This adding only works well when the factors are roughly equally important, so you have to either weight them or choose factors where an assumed weight of 1 for each factor works well. That runs into the problems I’ve explained with weighting factors from different dimensions.
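A sketch of the counting approach with N = 4 hypothetical factors and an illustrative color ramp:

```python
# Count how many of the N binary factors a tile passes (0..N), then map
# that count to one of N + 1 colors.
COLORS = ["dark red", "orange", "yellow", "yellow-green", "green"]  # N + 1 = 5

tile_factor_results = [1, 0, 1, 1]  # pass/fail for each of N = 4 factors
passed = sum(tile_factor_results)   # this tile passed 3 of 4 criteria
print(COLORS[passed])               # yellow-green
```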

Also, if you included many factors related to elevation in your analysis, that would bias the results towards tiles with good elevations. This includes different-but-correlated factors – e.g. temperature and tree density correlate with elevation. Factors may be clearly qualitatively and conceptually different yet still lack independence. Additive decision making approaches run into this problem in general. People may try to deal with it by weighting factors not just by their individual importance but also partly by how much they overlap with other factors that are being included in the analysis. If two factors overlap, then they should get lower weights than either would if it were the only factor dealing with that issue. That’s problematic because it means the factor weights aren’t independent. Adding or removing a factor from consideration would change the relative weights of other factors. CF’s method of multiplying binary factors doesn’t have this problem: having many factors related to elevation is harmless other than requiring extra effort to analyze. That’s because CF’s method doesn’t weight factors (not even implicitly).

Note that with CF’s binary factor multiplication, overlapping or redundant factors don’t lead to worse conclusions.

Normalization

The biggest change I’d make now, if I were rewriting my article, is discussing normalization. Normalization means getting data onto the same scale, while weighting means adjusting it based on importance.

I treated normalization and weighting as a single issue. I assumed weightings would be chosen to both weight and normalize data simultaneously, but MCDM literature typically separates this into two steps. Separating those steps makes no fundamental difference but provides a mental model of decision making math that can be useful. And it’s worth talking about because it’s in the literature.

Normalization procedures are in widespread use but are problematic. There is MCDM literature covering the problems well, but the majority of people don’t appear to be listening to criticism.

Normalization is unnecessary when multiplying factors. Being able to skip normalization is an advantage of both binary and non-binary multiplication. Some authors use normalization with multiplicative approaches, but they shouldn’t. That may be due to habit since they mostly care about additive approaches that require normalization, so they’re used to always normalizing.

What does it mean for numbers to be on the same scale? It means they’re comparable. For values in the same dimension, it means converting them into the same units. E.g., instead of adding meters with miles, add meters with meters. If you add values in different units, the result you get (and the contribution from each factor) will depend on your arbitrary choice of units, which is bad. And adding different units allows subconscious bias (or intentional manipulation), e.g. using inches (or even nanometers) for something you care a lot about so it gets a big number.

What if you’re adding things that are in different dimensions, so they can’t be converted into the same units? E.g. one factor may be meters while another is seconds. Are meters and seconds reasonably comparable, or are meters more comparable to minutes? There’s no good answer. Even if you choose units so they both have similar numbers (e.g. 5 meters and 7 seconds), they are not actually on the same scale (meters and seconds are different scales).
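A tiny numeric illustration of the unit-dependence problem (the quantities are arbitrary examples):

```python
# The same two quantities give very different "totals" (and very different
# relative contributions) depending on which units you happen to pick.
distance_m, time_s = 5, 120

print(distance_m + time_s)              # 125   (meters + seconds)
print(distance_m * 100 + time_s / 60)   # 502.0 (centimeters + minutes)
```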

Normalization typically converts each dimension (handled individually) to a 0-1 (or 0-100) scale. Using the same range of numbers for everything is seen as making the factors comparable. This is not converting to a generic goodness dimension because it’s not yet weighted by how good or important it is. The combination of normalizing and weighting is an attempt to convert to a generic goodness dimension.

A typical normalization method is to take the highest data point, set it to the highest value in the normalized range (e.g. 1), and scale everything else proportionally. There are more complicated methods, but they’re broadly worse because they distort the data more (e.g. they may violate proportionality by changing the ratio between two data points).
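A sketch of that divide-by-the-maximum method:

```python
# Divide every value in one dimension by that dimension's largest value, so
# the largest becomes 1 and the ratios between data points are preserved.

def normalize_by_max(values):
    top = max(values)
    return [v / top for v in values]

print(normalize_by_max([50, 25, 10]))  # [1.0, 0.5, 0.2]
```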

One of the worst problems with normalization is rank reversal: adding or removing an alternative from consideration can change which of two other alternatives has a higher score. I’ll give a rank reversal example below.

There are normalization methods for dealing with negative values and negative factors (factors where more is worse). These methods cause additional problems.

Changing normalization methods also causes problems. If an annual university ranking uses one normalization method, then changes in a later year, then the rankings do not provide a fair comparison between different years.

You can read about normalization methods and their flaws in A different approach to university rankings by Chris Tofallis.

Normalizing data by dividing by the highest value (so the highest data point becomes 1) is mathematically easy but loses information. “5 miles” provides information that a “0.3 distance score” doesn’t – e.g. it tells you how far away something is. Normalizing everything to the range 0-1 obscures conceptual meaning and breakpoints (e.g. it makes it harder to tell what’s too far away to walk to and what isn’t).

Normalizing works mathematically but it often fails conceptually. Suppose one factor is 30% of the longest distance, and another is 60% of the longest time. Assume that more of a factor is better. Does this mean that the first factor is half as good as the second? Not really. That’s basically just as arbitrary as choosing a maximum distance (500 miles) and maximum time (3 hours) and normalizing everything to fractions of those values. If you chose 6 hours instead of 3, then the time scores would be halved. These are still just values in different dimensions that aren’t actually comparable or convertible. Assigning intuitive weights after normalizing will make the final numbers better match your intuitions, but it doesn’t address the fundamental problem that you can’t convert between different dimensions.

No methods of normalizing and weighting will actually let you add apples to oranges. You can arbitrarily declare that an orange is worth two apples, which lets you add them, but you’re just making things up based on your intuition. All normalization and weighting works like that. You can see that the largest pile of apples has 50 apples, and the largest pile of oranges has 20 oranges, and then decide that 25 apples (half the maximum value) is worth 10 oranges (also half the maximum value). Or you could decide 25 apples are equivalent to 10 oranges before weighting, but then also give oranges a higher weighting factor because they taste better. But, with or without weighting, that isn’t actually a good approach.

Rank Reversal Example

Say alternative A is 8 widgets and 5 foobars, while B is 6 widgets and 10 foobars. Widgets and foobars have equal weighting. If we normalize, A scores 8/8 = 1 for widgets and 5/10 = 0.5 for foobars, yielding a total score of 1.5. B scores 6/8 = 0.75 for widgets and 10/10 = 1 for foobars, giving a total score of 1.75.

B ranks 1st and A ranks 2nd.

However, suppose we also consider a third alternative, C, with 0 widgets and 100 foobars. Now, when we normalize, A and B will both get different scores for foobars, because the highest data point is now 100 instead of 10.

Since the maximum number of widgets is unchanged, A and B will still have the same scores for widgets. But for foobars, we’ll now need to divide by 100 instead of by 10. A is 5/100 foobars, which scores 0.05 instead of the prior 0.5. B is 10/100 foobars, which scores 0.1 instead of 1.

Now A’s score is 1 + 0.05 = 1.05, while B scores 0.75 + 0.1 = 0.85, and C scores 0 + 1 = 1. A now ranks 1st, C 2nd and B 3rd.
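Here is a short script reproducing those numbers (equal weights, divide-by-maximum normalization):

```python
# Score each alternative as the sum over factors of (value / max value for
# that factor among the alternatives being considered), with equal weights.

def scores(options):
    factors = ["widgets", "foobars"]
    maxima = {f: max(o[f] for o in options.values()) for f in factors}
    return {name: sum(o[f] / maxima[f] for f in factors)
            for name, o in options.items()}

two = {"A": {"widgets": 8, "foobars": 5}, "B": {"widgets": 6, "foobars": 10}}
print(scores(two))    # {'A': 1.5, 'B': 1.75} -> B ranks above A

three = dict(two, C={"widgets": 0, "foobars": 100})
print(scores(three))  # {'A': 1.05, 'B': 0.85, 'C': 1.0} -> now A ranks above B
```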

This has two problems. First, rank reversal. Considering option C made A superior to B, rather than inferior, even though A and B are both unchanged.

Second, widgets and foobars are weighted equally, meaning they’re equally good. A has a total of 13 things (widgets and foobars summed together without normalization or weighting), while B has 16 and C has 100. The option with the least stuff won, which is counter-intuitive. It makes some sense that A could be better than C because you might want some of both (equal weighting cannot express that, which is a problem with weightings). But it doesn’t make conceptual sense for A to be better than B – why would 5 of one thing and 8 of the other be better than 6 of one and 10 of the other, when the things are equally weighted? That’s only happening due to C setting a high bar for how many foobars it’s possible to get.

Avoiding Normalization

I conclude that normalization is bad. Normalizing distorts your data in unintuitive ways. In addition to conceptual flaws, normalization causes significant mathematical issues. So try really hard to avoid using it. CF doesn’t normalize. Using MCDM with multiplication instead of addition also lets you avoid normalizing.

I’d also prefer weighting unnormalized data and then summing it over using normalization methods. You can design your weights based on the unnormalized data to get an answer that fits your intuitions. Instead of trying to weight factors independent of their units, as normalized approaches do, you can take into account how big the units are when deciding on weights. This will work better if all data is first converted to units that your intuition considers reasonable (e.g. you might use miles, not inches, for the distances to restaurants).
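A sketch of that fallback approach, with hypothetical restaurants and made-up penalty weights chosen while looking at the raw units:

```python
# Weight the raw data directly (no normalization): pick weights with the
# units in mind, then sum. Here lower totals are better.
COST_PER_MILE = 3.0     # how much each mile of travel bothers me
COST_PER_MINUTE = 0.5   # how much each minute of waiting bothers me

restaurants = {"Thai": {"miles": 2.0, "wait_minutes": 10},
               "Pizza": {"miles": 0.5, "wait_minutes": 30}}

for name, r in restaurants.items():
    penalty = r["miles"] * COST_PER_MILE + r["wait_minutes"] * COST_PER_MINUTE
    print(name, penalty)  # Thai 11.0, Pizza 16.5
```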

Large Group Decision Making

I contacted a few academics about my decision making math article. Here I’ll respond to a question I received.

Can the binary factor multiplication method be used by large organizations? Yes. There is no difference, in terms of philosophical principles, between making decisions alone and making them with a group of any size. Epistemology and the fundamentals of how to think are the same either way.

But there are many practical difficulties in groups, especially large groups. I lack experience working with or in large organizations, so it’s difficult for me to provide specifics about how to use decision making methods in large groups. While I’ve read about large organizations and I’ve heard of problems like “too many meetings”, “design by committee”, “too much paperwork” or “many different stakeholders”, I don’t know enough about the practical realities in our society today. In the future I may think and write more about this. I also hope that others, who are familiar with large group settings, will contribute some explanations of the problems they face, or even come up with some solutions which apply CF ideas. You can contact me by email at elliot@criticalfallibilism.com or use my discussion forum.

Also, the MCDM literature that I’ve read broadly doesn’t comment on large group settings either. MCDM methods generally assume there’s a decision maker who uses the method. A group that’s in agreement would also work instead of a single decision maker, but I haven’t seen MCDM literature about how to handle disagreements between people nor anything about the specific problems of large organizations. To be fair, I’ve mostly looked at articles about the general concepts, not field-specific articles. E.g. there are some articles about using MCDM in healthcare settings but I haven’t read many of them.

For small groups, my basic advice is to try to work through the decision making process cooperatively. One of the main differences in a group setting is that people disagree with each other. In a small group, hopefully you can cooperate, discuss and reach agreement. And actually, when making decisions alone, you will often have ideas in your head which disagree with each other, so there’s no fundamental difference. Whether in a group or not, resolving disagreements between ideas is important.

Broadly, MCDM literature doesn’t address how to resolve disagreements between ideas (except by helping you decide which alternative is best) – e.g. if you and another person disagree about what factors are important, or what weights to assign those factors, then MCDM won’t help. MCDM methods often assume a fixed set of criteria and a fixed set of alternatives at the outset, and that inflexibility makes it harder to work with other people. And MCDM has a decision maker assign weightings (either directly as numbers, or indirectly by answering questions about priorities). That requires either a single decision maker or a group that is able to reach agreement on everything. Compromise is possible (I want a weight of 0.4 and you want 0.2, so we use 0.3) but often satisfies no one (no one thinks 0.3 is actually the correct weight). One MCDM approach to weighting is to take every pair of factors and ask the decision maker which is more important (and possibly by how much, e.g. 3x more important). Nothing about that method is designed to facilitate group decision making.

Dealing with disagreements is a rationality and philosophy issue, and it’s been a common topic of my writing. It’s basically the issue of how to have a rational, productive debate or discussion (idea trees can help).

Conclusion

Based on my review of MCDM literature, CF’s binary factor multiplication approach is an important new idea which MCDM scholars haven’t considered or criticized. Also, MCDM literature broadly ignores epistemology ideas like Karl Popper’s Critical Rationalism. Similarly, philosophers have largely ignored MCDM. That includes Bayesians who aim at formal, mathematical systems similar to what MCDM aims at. Both MCDM scholars and philosophers could learn more from each other.


  1. When discussing weightings for multiplying factors, Tofallis writes: “If the score on an attribute is given a weight (exponent) of w, this means that a 1% change in the attribute gives a w% change to the overall score. See Appendix A for a derivation.” The appendix mentions the crucial detail that this claim is an approximation which only works well in some scenarios. I think the main article text should have said it’s an approximation, so I’m providing this note to help my readers avoid being confused. ↩︎
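For a quick numeric check of why it’s only an approximation, assuming the weight is an exponent in a weighted product and using w = 3 as an arbitrary example:

```python
# With score proportional to x ** w, a 1% change in x changes the score by
# roughly w%. The rule of thumb degrades as the change gets larger.
w = 3.0

print(1.01 ** w - 1, w * 0.01)  # ~0.0303 vs 0.03 for a 1% change
print(1.5 ** w - 1, w * 0.5)    # 2.375 vs 1.5 for a 50% change
```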