People Use Weighted Factors

The weighted factor epistemology, which Critical Fallibilism criticizes, is widespread. People talk about it in many terms, including score systems, strengths of arguments, weight of evidence, or the power of a case. People also use it intuitively or subconsciously. Here are some ways people talk about weighted factors or their equivalent:

  • Pro/con list (how good are the pros compared to the cons).
  • How strong or weak arguments are. (Also used for evidence, theories, or a case for some conclusion.)
  • Updating credences (or confidences, probabilities, or degrees of belief) based on evidence, arguments or information.
  • Scoring how good something is on a 0-1, 1-10 or 1-100 scale.
  • Multi-factor product reviews (consumer reports, RTINGS, college rankings, car rankings, video game rankings, etc.).
  • School grades use multiple weighted factors (e.g. your grade is 50% final exam, 30% midterm, 20% homework).
  • Weighted factor matrix. Each column is a positive or negative factor; each row takes one option and scores it on each factor, so each row is similar to a pro/con list. It’s basically a way to do multiple pro/con lists at once to compare multiple options. (See the sketch after this list.)
  • Cost/benefit analysis often involves adding up benefit factors and subtracting cost factors, similar to pro/con lists.
  • Anything where people seek an evaluation of how good something is. How good means the degree or amount of goodness, as opposed to pass/fail or using a small number of categories (e.g. evaluating someone as Republican, Democrat or Libertarian isn’t a matter of degrees or weights; it’s discrete categories).
  • Any thinking process people see in terms of weights or weighing to compare or evaluate things.
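
To make the mechanics concrete, here’s a minimal sketch of a weighted factor matrix in Python. The factor names, weights and scores are all made up for illustration:

```python
# A weighted factor matrix: each row is an option, each column a factor.
# Every factor gets a weight, and the weights sum to 100%.
# All names and numbers below are invented for illustration.

weights = {"price": 0.5, "quality": 0.3, "convenience": 0.2}

options = {
    "Option A": {"price": 8, "quality": 6, "convenience": 9},
    "Option B": {"price": 6, "quality": 9, "convenience": 7},
}

for name, scores in options.items():
    total = sum(weights[f] * scores[f] for f in weights)
    print(f"{name}: {total:.2f}")
# Option A: 7.60
# Option B: 7.10
```

School grades work the same way: with the weights above replaced by 50% final exam, 30% midterm and 20% homework, your grade is 0.5 * final + 0.3 * midterm + 0.2 * homework.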

Product Review Example

RTINGS’ LG C1 OLED TV Review provides a concrete example of weighted factors in action. You’ve probably seen something similar before.

8.8 Mixed Usage [overall score, derived from the below factors]
9.3 Movies
8.2 TV Shows
8.7 Sports
9.2 Video Games
8.6 HDR Movies
9.0 HDR Gaming
8.9 PC Monitor

Each score comes from combining factors using weights, and RTINGS is helpful enough to specify the formulas. For the overall score, movies are 21.9%, TV shows are 18.8%, video games are 10.4%, and PC monitor is 5.2% (and a few more are listed).
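
As code, the overall score is just a weighted sum of the usage scores. In the sketch below, the four weights RTINGS states are used as given; the weights for Sports, HDR Movies and HDR Gaming aren’t quoted above, so those three numbers are my guesses that fill out the remaining 43.7%:

```python
scores = {"Movies": 9.3, "TV Shows": 8.2, "Sports": 8.7, "Video Games": 9.2,
          "HDR Movies": 8.6, "HDR Gaming": 9.0, "PC Monitor": 8.9}

weights = {"Movies": 0.219, "TV Shows": 0.188, "Video Games": 0.104,
           "PC Monitor": 0.052,
           # The three weights below are guesses, not RTINGS' real numbers:
           "Sports": 0.167, "HDR Movies": 0.135, "HDR Gaming": 0.135}

overall = sum(weights[k] * scores[k] for k in scores)
print(f"{overall:.4f} -> rounds to {overall:.1f}")  # 8.8268 -> rounds to 8.8
```

With these guessed weights the total happens to round to the published 8.8, but the point is the structure, not the exact numbers.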

For Movies, the 9.3 score comes from 12 factors: 24.6% of the score is due to its contrast rating, 20% is local dimming, 10.8% is black uniformity and there are 9 more factors. The least important factor is 2% for 480p input (which sounds like something many people should weight at zero percent).

Video games have even more factors (26) with the lowest weighted factor (Xbox Series X, Variable Refresh Rate, which I’ll call Xbox VRR) being worth only 0.2% of the score. Since video games are 10.4% of the overall rating for the TV, Xbox VRR is worth 0.0208% of the final overall score. In other words, it’s worth around one five-thousandth of the final score. (Actually it’s worth close to double that much because Xbox VRR is also worth 0.2% of the HDR Gaming factor. However, we could also view Xbox VRR’s contribution to regular gaming and to HDR gaming as separate factors.)
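
The arithmetic behind the one five-thousandth figure, for anyone who wants to check it:

```python
# Xbox VRR is 0.2% of Video Games, and Video Games is 10.4% of the overall score.
xbox_vrr_weight = 0.002 * 0.104
print(f"{xbox_vrr_weight:.6f}")      # 0.000208, i.e. 0.0208% of the overall score
print(f"{1 / xbox_vrr_weight:.0f}")  # 4808, so roughly one five-thousandth
```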

The final score for the TV comes from probably over 50 factors, which helps illustrate the weighted factor mindset. People usually use fewer factors, but this illustrates a pretty strong version of the approach, which many people are impressed by (they think it’s closer to the ideal than using fewer factors). The final score directly comes from 7 factors, but each of those is determined by many sub-factors. It’s hard to count the exact number of factors involved due to overlap (some sub-factors are used in multiple factors). For example, Reflections is the biggest factor in both TV Shows and Sports viewing. However, Reflections is considered completely irrelevant to watching movies, which doesn’t make sense conceptually. Maybe they were trying not to have too many tiny factors, so they didn’t want to include Reflections as a lower-weighted factor for Movies (and they’d have to slightly lower the weightings of the current factors to make room for it, since all the factor weights add to 100%).

The use of a factor to provide a very tiny portion of the overall score, such as a five-thousandth, illustrates the mindset of trying to be sensitive to small factors. The mindset is that all data is good data. Unimportant data should be given a small weighting rather than ignored. This mindset says we should be responsive to any evidence, information or arguments we have. Any tiny difference in a TV should result in a (usually tiny) change to the TV’s overall score.

On the other hand, the final score is rounded to two significant digits (8.8 out of 10), so the actual mathematical consequence of the Xbox VRR factor, via its contribution to Video Games, is usually zero. The precision of the tiny factor is lost by rounding. A different score for Xbox VRR usually has no effect on the rounded final score, but has a small chance to cause it to round differently and thus make a difference of one hundredth of the scale (0.1 points, around 50 times bigger than the factor’s actual weighting). Roughly, if the Xbox VRR score were increased from 5/10 to 6/10 for a TV, that would increase the overall score by about a fifty-thousandth of the scale, with around a 1/500 chance of causing it to round up to an overall score one tenth of a point higher, and a 499/500 chance of leaving the rounded score unchanged. I think they realized that giving a score of 8.74331 out of 10 would be impractical and unhelpful. But if they think that wouldn’t work well, they should reconsider including such tiny factors in their scores (and then working against that precision by rounding). I also don’t know whether they are double-rounding, which would be bad (did the overall score use 21.9% of the rounded 9.3 score for Movies, or 21.9% of the un-rounded Movies score, which might have been 9.2894?).
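
Here’s a quick simulation of that rounding claim. It’s a sketch that assumes the un-rounded overall score is equally likely to fall anywhere within its 0.1-wide rounding bucket:

```python
import random

weight = 0.002 * 0.104  # Xbox VRR's share of the overall score: 0.000208
shift = 1.0 * weight    # effect of raising Xbox VRR from 5/10 to 6/10

trials = 1_000_000
flips = 0
for _ in range(trials):
    s = random.uniform(8.75, 8.85)  # un-rounded score that rounds to 8.8
    if round(s, 1) != round(s + shift, 1):
        flips += 1

print(f"shift: {shift:.6f} points")        # ~1/50,000 of the 10-point scale
print(f"flip rate: {flips / trials:.4f}")  # ~0.0021, about a 1-in-500 chance
```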

Mindset

A better mindset is to look for key factors and breakpoints. Instead of trying to be responsive to tiny differences in every factor, we should look for ways to exclude factors as unimportant. Our goal should be to figure out a handful of key factors and focus our analysis on them, and gloss over the other factors as “good enough” (for a good product that actually works, most factors have excess capacity – they are not near a borderline).

Instead of looking at factors in terms of raw numeric scores, we should put them in conceptual categories and look for significant differences that are worth our attention. What matters in a comparison is when two products fall into different categories, and those categories are important to you. Raw numbers don’t matter or mean anything by themselves. We have to interpret numbers by thinking. Attempts to leave out creative thinking, and just combine a bunch of numbers with math, are a mistaken approach.

People actually do use creative thinking when deciding on factor weightings. Weightings are made up using judgment, not measurement. Some individual factors also require judgment to score instead of measurement. So the attempt to make it seem super scientific (just facts and measurements) doesn’t work anyway. The idea that science is objective because it focuses only on facts and measurements is a misconception. It’s harmful because it discourages scientists from learning to deal with their biases. That viewpoint tries to avoid the need for skill at objective thinking by giving scientists less room to use their judgment. But that can’t work; judgment is needed in science, so it’s important that scientists develop good judgment and recognize that they’re using it.

Key Factors

Which factors are important varies by buyer. For some people, using the TV as a PC monitor is important (it matters much more than the 5.2% weighting it’s given). Many other buyers will never use it that way, so it doesn’t matter at all.

Part of the excuse for doing ratings this way is to make a one-size-fits-all review of a product. They are trying to give information that is useful for many buyers, including people who would and wouldn’t use the TV as a PC monitor. However, even if weightings made sense (which I deny), a 5.2% weighting for the PC monitor factor would not make sense for either group of buyers. There are some buyers for whom that factor should be around 0%, and others for whom it should be much bigger, like 50%, with not much in between. A better way to view it is that some people should take this factor into account and most should not.

Each buyer should try to figure out the key factors for them and get information on just those factors, as well as check in a general way that everything else is OK. So a good review would cover some important factors (if it covers 10 factors, that could probably include the 3 key factors for over 90% of readers) and also generally say whether the product is fine or there’s some problem to worry about outside your top factors.

Your key factors should not be determined by weighting the factors then taking the ones with the highest weightings and ignoring the rest. Instead, you should figure out which factors make some kind of qualitative difference to you.

There are lots of TVs that are pretty good, are in your price range, and have nothing awful about them. How should you pick? Pretty much everyone should care about screen size. After that, you might not care about anything and you can just buy any TV or get whatever reviewers say is good for most people (which is actually different than which one scores the highest – it’s asking for their judgment instead of what the formula outputs). But you might care about some stuff besides price, size and maybe brand. You might want an OLED not an LCD. You might want deep blacks, uniform blacks or both. You might want it to work well with your Xbox. You might want it to work well with a computer.

You can use factors which matter to you to rule out options. But you shouldn’t try to care about all factors. You should consider which ones will make some sort of specific difference in your life. For example, don’t get the TV with worse blacks if that will bother you. Or don’t get the TV that has an Xbox compatibility issue if you play Xbox. Ruling out TVs that won’t work well for you is a different approach than weighting factors, and makes more sense. Find dealbreaker issues or must-haves and use those to differentiate between products. If you don’t care enough to do that and have no opinions about any specifics, then use an expert’s own judgment about what product is good for most people or for a large group of people that you belong to.
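
Here’s what the rule-out approach looks like as code, with made-up TVs and one hypothetical buyer’s must-haves:

```python
# Filter by dealbreakers/must-haves instead of blending every factor
# into one score. All product data here is invented for illustration.

tvs = [
    {"name": "TV A", "size": 65, "panel": "OLED", "xbox_vrr_ok": True},
    {"name": "TV B", "size": 65, "panel": "LCD",  "xbox_vrr_ok": True},
    {"name": "TV C", "size": 55, "panel": "OLED", "xbox_vrr_ok": False},
]

def acceptable(tv):
    # One buyer's must-haves; a different buyer would write different ones.
    return tv["size"] >= 65 and tv["panel"] == "OLED" and tv["xbox_vrr_ok"]

survivors = [tv["name"] for tv in tvs if acceptable(tv)]
print(survivors)  # ['TV A'] -- pick among the survivors using judgment
```

Note there are no weights anywhere: a TV either passes each must-have or it’s out.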

Robust Judgments

You want to reach a conclusion in a robust, stable way, not a way that is highly sensitive to details (e.g. to small measurement or rating errors in individual factors). High sensitivity to details is bad, not good. Many people think it’s good and seek it on purpose. But you want to reach an error-tolerant conclusion. Don’t expect or strive for perfection; instead, approach things in a way where some errors are OK. That means your conclusion should be stable under changes, in case an error (or the fix for an error) changes something. That way you don’t keep getting jerked around to different conclusions every time a minor error comes up. The world is full of minor errors, variance, fluctuations, etc., so we always need to approach things in ways that can tolerate that. So focus on a few key factors and evaluate all the others in a pass/fail way (good enough or broken), so that your conclusion is robust (insensitive to small changes to most factors).
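
A small demonstration of the sensitivity problem, with made-up numbers. Two options are nearly tied on a weighted sum, so a tiny rating error on a minor factor flips the ranking, while a pass/fail evaluation of the same factors is unaffected:

```python
weights = [0.4, 0.3, 0.2, 0.1]
a = [8.0, 7.0, 9.0, 6.0]
b = [7.9, 7.1, 9.0, 6.2]

def total(scores):
    return sum(w * s for w, s in zip(weights, scores))

print(f"{total(a):.2f} vs {total(b):.2f}")  # 7.70 vs 7.71 -- B wins by 0.01

a[3] += 0.2  # a plausible rating error on the least important factor
print(f"{total(a):.2f} vs {total(b):.2f}")  # 7.72 vs 7.71 -- now A wins

# A pass/fail check (say, every factor must be at least 6) is stable
# under the same small error:
print(all(s >= 6 for s in a), all(s >= 6 for s in b))  # True True
```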

Focusing on a small number of key factors lets you think about them instead of trying to combine them with weightings. It lets you pay individual attention to them, analyze them, and form opinions about them. It lets you come up with explanations for them and comparisons between them. You could never do that if you’re worrying about fifty factors. But if you focus your attention on e.g. just three factors, then you can look at tradeoffs in a nuanced way instead of with a math formula.

Key Factor Examples

For example, currently most new cars have terrible infotainment software and require you to use the infotainment system. Some cars require using it to control key functions like the radio, heat and air conditioning. So the right way to choose a car is to start with the standard initial factors people look at (price and car type such as truck, sedan or SUV) and then look at infotainment systems. Rule out all the cars where the infotainment system is too awful to put up with, and you may have only a few options left, or just one. If that gets you down to zero options, you’ll have to think about which standard you can relax, or consider not getting a new car (but waiting is unlikely to help with this problem in the foreseeable future).

Suppose you want to live somewhere in the contiguous US but you don’t know where. Many people would choose by a key factor like proximity to family or to a job. If those aren’t concerns, you might rule out most of the country by price, weather and wanting to be reasonably near a decent sized city (so e.g. Costco isn’t too far away and you can get high speed cable internet). You might choose between cities by which one has an IKEA or Apple Store (or at least an Apple Authorized Service Provider) within driving distance. Given that a lot of factors are roughly OK for many options, you should focus on a small number of specific factors that you care about rather than trying to distinguish which options are slightly better on minor factors. Don’t try to figure out whether potholes should be given a weighting of 0.01% or 0.02%. Potholes are fine in most places, and if they’re a significant problem in a specific city then you should actually look into it instead of having them subtract a thousandth of a point from the overall score for that location.

Suppose you want to go to college. Instead of looking at college rankings, you could look at rankings for the department you want to major in. Or instead of looking at vague overall department rankings, you could read a few papers from professors to see if you actually think they are intelligent or not. Lots of papers are terrible and some aren’t. Another sort of key factor is location – do you want to be far away from your parents, or close, or you don’t care? Are you picky about weather? Do you want to be in a big city?

You should try to figure out what specific outcomes you care about (dealbreakers and big wins) rather than trying to look at a generic, blended average of how good something is for many goals for many people. A blended average could have a dealbreaker in it and still get a good score, or could have a big win with no dealbreakers but get a lower score. Blended averages of many factors focus your attention away from what’s important.
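
And here’s how a blended average can hide a dealbreaker, again with invented numbers. For a buyer who needs Xbox VRR, TV X is unusable, yet it outscores TV Y:

```python
weights = {"picture": 0.5, "sound": 0.3, "xbox_vrr": 0.2}

tv_x = {"picture": 9.5, "sound": 9.0, "xbox_vrr": 1.0}  # dealbreaker for this buyer
tv_y = {"picture": 8.0, "sound": 8.0, "xbox_vrr": 9.0}

def blended(tv):
    return sum(weights[f] * tv[f] for f in weights)

print(f"TV X: {blended(tv_x):.2f}")  # 7.65 -- higher score, but unusable
print(f"TV Y: {blended(tv_y):.2f}")  # 7.20 -- lower score, but actually works
```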

Imagine someone buying a pet and choosing a cat, dog or bird based on rankings that say cats are 93/100 and dogs are 94/100. That would be ridiculous. Choose the type of pet you like based on key factors like whether you want to walk the pet or not and whether that type of pet is allowed where you live.

People know whether they prefer a dog or a cat much more than they know which TV, car or college would be good for them. That’s why they turn to reviews. Lots of TVs, cars and colleges are “good enough” in most or all ways. This is part of why the reviewers can give bad guidance without getting a bunch of hate – all the choices are kinda similar and OK anyway, so it doesn’t matter that much if the reviews aren’t useful. They could pick a random product and say it’s the best and people often wouldn’t know the difference (or it’d be safer to just look at the brand reputation of the companies that make the products, then say the one from the best company is the best product, and unless it’s a huge flop with obvious flaws the reviewer is unlikely to have a problem).

Not Caring

If you don’t know which TV you want, and you just want some expert to look at a bunch and make a recommendation, that’s fine, but blended scores from weighted factors are still a poor system. It’d be better if they said “these TVs are all similar; here are our number 1, 2 and 3 picks based on our judgment” (and the article could explain some reasons for people who care to read more). And they could say which TVs are not fine, or which have something truly exceptional about them, and highlight those issues where something is great or awful.

The sites that publish all the weighted factors individually, not just the overall score, do give some useful information. You can look through all the factors for a few you care a lot about. You can also try to find problems/dealbreakers (are there any factors that seem important to you but get a low score for some TVs? Then you can research that issue more and you might find a major problem).

However, even scoring individual factors is often inaccurate, misleading or problematic. For example, two TVs might both get 7/10 for blacks, but for very different reasons: for one, the blacks are only a dark gray; for the other, the blacks are blacker but aren’t uniform (they look different on different parts of the screen). You should consider which thing you care about (darkness or uniformity) or which type of flaws you won’t mind. If you don’t want to think about that, looking for a higher number won’t do a good job of making you happy. You’re just hoping the reviewer can guess the average person’s taste to know which is worse. The numbers aren’t very important to that process of a reviewer guessing your taste, and attempts to use the numbers as something more precise or scientific or ideal (rather than as a really loose approximation for people who don’t care much) are bad.

Conclusions

Lots of people see these kinds of score systems as a rational ideal, which is what I dispute. As a mediocre approximation for when you don’t want to put thought into something, they’re flawed, but that’s not a big deal in that case. Review sites could do better, but that’s not my main concern. It’s a bigger deal when the weighted factor approach claims to be good, rational philosophy and that mindset is applied to everything.

Weighted factor thinking harms focus on the few most important factors, doesn’t provide margins of error, and doesn’t appropriately handle qualitative differences or bottlenecks. It assumes everything is just quantitative differences which can be added together. Actually, factors are mostly from (qualitatively) different dimensions and can’t be added. Weights for factors are really unit conversion factors between dimensions, but that doesn’t work because most dimensions can’t be converted into each other.