Error Correction and AI Alignment

There are calls to pause or shut down AI development. I want to stay out of the widespread, tribalist, parochial political bickering because I think those activities are unproductive. But I can use these issues as examples for applying Critical Fallibilism (CF) philosophy. How could people approach the issues more productively? What can CF tell us about the situation?

Stop Training AI?

I read two recent articles: “Pause Giant AI Experiments: An Open Letter” and “Pausing AI Developments Isn't Enough. We Need to Shut it All Down”. I’ll refer to them as the pause article and the shut down article.

The articles both favor AI alignment. Most of what I’m going to say also applies to their opponents, and also applies to people on both sides of most other issues. It’s generic criticism related to rationality, which I think is important because most groups are bad at it. AI alignment is just a typical example. I’m not saying that the AI alignment people are more irrational than other people.

The pause article advocates a 6 month pause on training AIs more powerful than GPT-4. The shut down article calls for indefinitely shutting down AI training. The shut down would be backed up by governments “willing to destroy a rogue datacenter by airstrike” and which “[t]rack all GPUs sold” globally. The pause article says something similar: “If such a pause cannot be enacted quickly, governments should step in and institute a moratorium.”

The shut down article is more extreme than most ideas people bicker over. It says “Make it explicit in international diplomacy that preventing AI extinction scenarios is considered a priority above preventing a full nuclear exchange, and that allied nuclear countries are willing to run some risk of nuclear exchange if that’s what it takes to reduce the risk of large AI training runs.” But I won’t be focusing on the extremeness.

The reason for a pause or shut down is the belief that AIs have the potential to become smarter than humans. Then the AIs might use their superior intelligence to kill all humans (not out of malice, but due to having different values than us and not caring about humans, like how our building projects sometimes kill lots of insects and we don’t care). So they want us to have reliable control over AI goals before AIs become too smart.

I’ll begin by saying I disagree with the pause or shut down. I’m not personally neutral, but I’ll try to analyze the matter objectively. One reason I disagree is I think they have an incorrect philosophical model of what intelligence is and how it works. I also think controlling the goals of an intelligent agent has similarities to slavery. But those issues are complicated and hard to have productive discussions about. Let’s start our analysis with simpler, generic issues about rationality which help explain why disagreements on these topics aren’t reaching mutually-agreeable conclusions.

Philosophical Considerations

Here are some things a philosopher should consider:

How do they know that AIs are potentially dangerous? What are their arguments?

Are their arguments written down in a comprehensive, canonical way which provides clear targets for counter-arguments?

Are there several factions with different, contradictory arguments for AI risk, which should be analyzed and debated separately?

If AI risk advocates are wrong, how will their errors be corrected?

If they’re right, how will the errors of people who disagree (like me) be corrected? If I’m wrong to disagree, how can I learn that they’re right and change my mind?

Have they debated critics?

If you want to have an organized debate aimed at continuing to a conclusion, will anyone on their side debate with you?

If you can’t get a debate meant to reach a conclusion, who can, and what are the criteria for who they will debate with?

If debate takes place, what will the debate rules and methodology be? Will the debate use a debate methodology, written down in advance, which is designed for reaching a conclusion?

If they have an article or book explaining their reasoning, and you find a mistake in it, who can you share the mistake with, and how will your claims be evaluated and dealt with?

Note that when I talk about “debate”, I mean basically anything where arguments and counter-arguments are being exchanged. Debate can happen slowly by the sides writing books, essays or papers arguing against each other, and writing responses to counter some newer literature. Debate doesn’t have to be a conversation between two specific people.

The major themes in my questions are explaining positions in writing, fallibility and error correction. Fallibility says you might be mistaken. If you’re mistaken, it’s very hard for anyone to tell you unless you share your reasoning. If others are mistaken, it’s very hard for them to learn from you unless you share your reasoning. And there should be ways that criticisms of your reasoning can change your mind or be addressed with counter-arguments.

Which critics and criticisms should be addressed? CF’s answer, in short, is all criticism should be addressed. You can’t know that a criticism is incorrect until after you refute it. My articles about Paths Forward explain how this can be done with a reasonable amount of time and effort.

Others may disagree. Fine. They can give their own answers to these questions. Which critics and criticisms do they think should be addressed? In what manner do they think criticisms should be addressed, and why is that a good policy? They can debate with me about Paths Forward and/or develop their own alternatives which they think handle these issues in better ways.

What people should not do is act according to an unstated methodology. They shouldn’t ignore some critics or criticisms for no clear, predictable, understandable, written-down-in-advance reasons. If they don’t have a methodology written down, that allows them to act on their biases; that allows them to ignore critics and criticism based on social status; and it makes it difficult for anyone to critique their methodology. If their methodology contains errors, choosing not to write it down makes error correction harder. And if their methodology is great, choosing not to write it down makes it harder for others to use it themselves, learn from it, and correct their own methodological errors.

These are generic issues which are some of the first things to consider when dealing with intellectuals making complicated claims on any topic. They are meta issues about how to approach the topic in an organized way that enables rational debate. How well do the AI training moratorium proposals do regarding these issues? Poorly, like most people with most viewpoints, including the opponents of an AI training moratorium. I think these issues related to rational debate have widespread importance in the world today, and the more concrete points I make are pretty typical and illustrative.

The Pause Article

Both articles are short and don’t provide much detail about how AIs work and what the dangers are, let alone what they’ve done to know this, debate people who disagree, and refute criticisms of their claims. The pause article has some citations:

AI systems with human-competitive intelligence can pose profound risks to society and humanity, as shown by extensive research[1]

Footnote 1 reads:

Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021, March). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?🦜. In Proceedings of the 2021 ACM conference on fairness, accountability, and transparency (pp. 610-623).

Bostrom, N. (2016). Superintelligence. Oxford University Press.

Bucknall, B. S., & Dori-Hacohen, S. (2022, July). Current and near-term AI as a potential existential risk factor. In Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society (pp. 119-129).

Carlsmith, J. (2022). Is Power-Seeking AI an Existential Risk?. arXiv preprint arXiv:2206.13353.

Christian, B. (2020). The Alignment Problem: Machine Learning and human values. Norton & Company.

Cohen, M. et al. (2022). Advanced Artificial Agents Intervene in the Provision of Reward. AI Magazine, 43(3) (pp. 282-293).

Eloundou, T., et al. (2023). GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models.

Hendrycks, D., & Mazeika, M. (2022). X-risk Analysis for AI Research. arXiv preprint arXiv:2206.05862.

Ngo, R. (2022). The alignment problem from a deep learning perspective. arXiv preprint arXiv:2209.00626.

Russell, S. (2019). Human Compatible: Artificial Intelligence and the Problem of Control. Viking.

Tegmark, M. (2017). Life 3.0: Being Human in the Age of Artificial Intelligence. Knopf.

Weidinger, L. et al (2021). Ethical and social risks of harm from language models. arXiv preprint arXiv:2112.04359.

They aren’t committing themselves to any particular set of arguments. If you point out errors in any five of these sources, they could still think they’re right. There is nothing where they say “We claim X, Y and Z … and if any of those are wrong, then we’re wrong, and we’ll have to dramatically rethink our viewpoint.” They are giving many sources which say different and sometimes incompatible things, rather than making specific claims for a critic to refute.

Instead of trying to make a single case they think is correct – perhaps with a few clearly specified alternatives and branching points around areas of uncertainty – they’re trying to say there are lots of different experts adding weight to their side of the debate. They’re looking at debate in terms of the strength of the arguments on each side instead of trying to make decisive arguments and figure out what is refuted. That’s one of the main reasons they don’t debate – weighted argument approaches are bad at reaching decisive conclusions, so it prevents them from having conclusive debates, so they find that debate isn’t very valuable and do it less. CF explains a decisive argument approach and refutes the weighted approach, e.g. in my articles Yes or No Philosophy and Multi-Factor Decision Making Math.
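
The contrast between the two approaches can be illustrated with a toy sketch. This is only an illustration under my own assumptions, not CF’s actual method: the `Argument` class, the numeric weights and the two verdict functions are all hypothetical names and values invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Argument:
    name: str
    weight: float          # hypothetical strength score; only the weighted approach uses it
    refuted: bool = False  # True once an error has been pointed out and not answered


def weighted_verdict(pro, con):
    """Weighted approach: add up the weights of the surviving arguments on each side."""
    pro_total = sum(a.weight for a in pro if not a.refuted)
    con_total = sum(a.weight for a in con if not a.refuted)
    if pro_total == con_total:
        return "tie"
    return "pro" if pro_total > con_total else "con"


def decisive_verdict(required_claims):
    """Decisive approach: the position commits to specific claims; one refuted claim refutes it."""
    return "refuted" if any(c.refuted for c in required_claims) else "not refuted"


# Hypothetical sources and weights, made up for illustration only.
pro = [Argument("source A", 3.0), Argument("source B", 2.0), Argument("source C", 2.0)]
con = [Argument("critic X", 3.0)]

before_weighted = weighted_verdict(pro, con)
before_decisive = decisive_verdict(pro)

pro[0].refuted = True  # a critic points out an error in source A

after_weighted = weighted_verdict(pro, con)
after_decisive = decisive_verdict(pro)

print(before_weighted, before_decisive)
print(after_weighted, after_decisive)
```

In this sketch, refuting one source leaves the weighted verdict unchanged because the remaining sources still outweigh the critic, while the decisive verdict changes as soon as one committed claim is refuted. That is the sense in which a weighted list of sources gives a critic nothing specific to refute.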

Also, judging by the titles, none of these cites have anything to do with their openness to debate, their debate methodology, their truth seeking methods, their approach to rationality and error correction, what they’re doing to check if they’re wrong, how they handle critics, etc. They aren’t trying to talk about rationality and how to reach conclusions about tricky topics like AI alignment. They aren’t even trying to say they already won the debate: none of the cites appear to be debates they believe they won or summaries or analysis of debates. The cites also don’t explain their preferred debate methodology. They aren’t e.g. claiming that they won the last ten debates in a row and that the outcome of these debates has become predictable and repetitive, so it’s time to start acting based on the ideas that win all the debates. Why don’t they do that or try to do it? Wouldn’t it be more persuasive than citing a bunch of experts who agree with them (when everyone knows there are also a bunch of experts who disagree with them)?

And, judging by the titles, none of the cites are about epistemology issues like what intelligence is or how knowledge is created. None address issues like whether induction or evolutionary epistemology is correct. None look like they’d argue that Karl Popper was wrong, nor explain how their claims are actually compatible with his ideas. They should be building their beliefs about AI on seriously-considered philosophical premises about intelligence, learning, knowledge, etc., but they instead focus on issues within the AI field (plus a few prior fields and premises that they like). They’re doing the very common error of focusing too much on local optima instead of checking their premises more. By not talking enough about their premises that come before the field of AI itself, which build up to their beliefs about AI, they make it hard for philosophers and many other intellectuals to engage with them. People who disagree about a premise in an earlier field that AI builds on are not being given written reasoning to analyze and potentially criticize. They aren’t writing down and citing relevant claims and reasoning about the premises behind their literature about AI.

It’s common that people specialize in a field and then only want to talk about their speciality. But they need to either learn the premises of their field or have someone else who knows those premises deal with critics on their behalf. If no one is addressing critics of their premises in a way they can endorse and be satisfied with, then some AI researchers need to start becoming more interdisciplinary and learning about those premises themselves. (If they don’t know if anyone is addressing critics adequately, because they haven’t looked, then they should look.) And they should approach research about their premises with an open, unbiased mind. If it hasn’t been satisfactorily investigated and debated already, they should not have confidence in a pre-determined conclusion. They shouldn’t assume that they know in advance what they’ll conclude after they learn more. They should actually seek the truth before they start debating for any particular side. If you don’t already know of any refutation of Popper to cite and endorse, then you shouldn’t assume Popper is wrong.

The Shut Down Article

Similarly, the shut down article assumes unstated premises that I disagree with related to philosophy, intelligence, epistemology, etc. But it has no citations. It’s based on ideas which it doesn’t try to talk about, and doesn’t provide references for, so it’s not suitable for analyzing whether its conclusions are correct. It’s meant for the general public and isn’t trying to engage in rational debate with informed people. It’s fine for writing like that to exist, but it doesn’t direct interested readers to any satisfactory writing elsewhere (which I don’t think exists). And there are no signs of openness to debate. There’s no discussion of how the author knows he’s right and what he’s done to address critics. And the article doesn’t challenge people who disagree to debate or claim any past debate victories.

The shut down article also talks about human ignorance. It says basically that we don’t understand this stuff well, so we should be careful. But where’s the comprehensive analysis of what we do know and what it means? Where are the organized debates between contradictory ideas to reach conclusions instead of just saying truth-seeking is hard, let’s give up and say we don’t know? I’d understand trying really hard and not knowing, but I don’t respect not knowing after not making a heroic or even reasonable effort to know. I think the author would claim he’s made a huge effort and devoted his career to this stuff, but he should prove that by linking to organized documents presenting relevant information that he figured out. That would enable people to respond to, learn from or critique his claims.

The author of the shut down article (Eliezer Yudkowsky) has known about Karl Popper for over a decade. He dismissed Popper with brief, inaccurate comments that even got the name of Popper’s philosophy wrong. He chose not to ever study Popper, nor to find any good refutation of Popper to cite and endorse. When informed of some of his errors, he did not reply. Nor has he gotten any of his colleagues to study and engage with Popper for him. He’s content to ignore relevant, different philosophical ideas without knowing of any correct refutation of them written by anyone (or if he thinks he knows one, he doesn’t cite it to enable critics to learn from it or criticize it). If he doesn’t want to personally engage with Popper, that would be fine if he could cite a refutation of Popper that he thinks is correct or get his colleagues to do that, but he has never provided that citation. He’s content to dismiss ideas for no clear or compelling reason; he’s content to ignore rival viewpoints without saying or citing anything suitable for opponents to debate with or to give counter-arguments to. He’s content to be the guy who makes a few very brief and incorrect criticisms, based on no quotes of anything Popper actually said, and then ignores all further discussion, instead of giving a rational argument. That kind of behavior ends or prevents productive debate, and his colleagues behave similarly (but so do most of his opponents, and most people in most fields).

Closed To Debate

My conclusion is that the AI alignment people aren’t open to debate in an effective, rational way. (But again that flaw isn’t unique to them. I’m not trying to say they are comparatively worse. I’m just pointing out a big problem which I think is worth talking about because it’s so typical.)

I believe I have important philosophical knowledge that they lack which would change some of their conclusions, but I believe even if I’m correct there’s no reasonable way for me to correct them. And if I’m incorrect, there’s no reasonable way for me to learn why they’re right and change my own mind.

I believe that thousands of other people could make at least one relevant criticism of the AI alignment claims which is good enough to make the discussion better not worse. But most of them are prevented from sharing their knowledge by the lack of organized, open debate. Anyone can explain why the AI alignment claims are mistaken on their own blog and be ignored, but they can’t get the thought leaders on the other side to debate and listen.

I think this lack of organized debate aimed at reaching conclusions is one of the major problems in the world which underlies many other problems.

Not being open to debate and lacking organized writing about their beliefs doesn’t mean that AI alignment is wrong. The lack of organized writing means they aren’t providing good targets for criticism that could be engaged with without having to first ask them dozens of questions about what they believe and why. Having to ask lots of questions is especially problematic when they have no organized system set up for answering questions and they commonly ignore people who disagree with them. But none of this means their conclusions are wrong.

Why? First, their opponents generally aren’t better at these issues. Rationality is uncommon in the world today. CF’s take on rationality is unconventional. It’s not normal to write out adequately organized reasoning and be open to debate according to written criteria. That’s why I’m trying to explain and advocate these ideas which I think are new and better.

Second, even if one person or group is better at rationality than another, it doesn’t imply they’re correct about every point of disagreement. You simply can’t infer from how smart, well-educated or rational someone is to whether a particular idea they believe is true or a particular idea they criticize is false.

Poor rationality doesn’t mean they’re wrong. Instead, it means that it’s much harder to find out whether they’re wrong or not. I have some reasons that I think they’re wrong about their fears about AIs killing all humans, but those are complicated. This initial meta analysis about their (lack of) rationality is simpler, and I think it could be useful to people. I hope to inspire people to use this kind of analysis to question and challenge intellectuals. I hope that will encourage intellectuals to approach truth-seeking, debate and explaining their reasoning in better ways.

Because of the lack of rationality and debate, I consider talking about my disagreements with AI alignment somewhat pointless. The vast majority don’t listen, don’t debate, don’t have proxies debate for them, and don’t have any other methods for resolving our disagreements. There isn’t reasonable, relevant literature on the other side to engage with that addresses the kinds of issues I’d bring up in discussion. There’s a lack of thought leaders taking responsibility for answering clarifying questions. There isn’t a single organized, self-consistent viewpoint, or even a small number of them, for me to try to refute. And the same goes for other sides too, not just the AI alignment people, because it’s a generic problem of rationality applying to basically all sides of all issues and all sizable groups of people.

If AI was one of my favorite topics, I’d probably write about it anyway. There’s nothing wrong with writing about it despite the lack of rational debate. But it isn’t a top priority for me, so I’ve only written a few things about it. It’s one of many topics I think I could contribute to, which I’d be happy to talk about if there was rational debate to participate in. But if there isn’t rational debate available, then I’d rather write about other topics like I’m doing now by discussing what rational debate is and how it should work.

I think the most important thing is to talk about the meta and methodology issues. I hope you can see why they matter. Plus, they’re getting little research and discussion from anyone else. Until people do better at rationality and debate, there’s basically no way to have productive, rational debates which reach influential conclusions on major issues, so I’d rather try to solve that problem than have low quality debates.

You can sometimes have productive discussions with individuals which don’t have much effect on the world – e.g. you might learn from one person and change your mind, but most people on your side will just ignore the arguments that changed your mind without feeling the need to write or cite counter-arguments. Or you might successfully persuade one person or even a dozen people, but thousands more just won’t care. In a better world, if you beat someone in a debate, at least one more person from their side would then be willing to debate you. If you got a dozen debate wins in a row then you’d easily get a debate with a prominent person with a larger audience paying attention, but no system resembling that exists today. Some sports like boxing work more like that: win matches and fight better people, win more and you can reach the top. Similarly, chess has a rating system, so a good enough track record can get you invited to top tournaments, plus there are open tournaments where the winners qualify for prestigious events. But debate doesn’t work that way.


Maybe you’re doubtful the situation in reality is as I describe. Maybe you think people will debate or have written criteria specifying who and what they’ll debate. Maybe you think that I haven’t provided a bunch of evidence showing that this stuff doesn’t exist.

If so, I suggest you simply go ask them. Go look for debates or written debate policies and methods. You can ask people to debate you or, if you prefer, to debate me. And go look for organized documents explaining their position and trying to be comprehensive. Go look for lists or trees showing all the arguments they’ve addressed and their counter-arguments. If you find something good, tell me.

I claim you’ll find a lot of people who won’t debate you (or me), who don’t have any written policies which provide transparency and predictability about who they’ll debate. You might also find some people, who aren’t influential, who will talk a bit. E.g. they might write a few critical replies to you on Reddit (which is a limited form of debate) then stop responding. People who will actually discuss to reach conclusions are much rarer.

If you think high quality rationality and debate must exist somewhere but you don’t know where, I suggest you ask around until you find it or become satisfied that it doesn’t exist. That’s what I’ve done. Asking me to prove a negative – that it doesn’t exist anywhere – isn’t fair. (While individuals sometimes are partial exceptions, I don’t think any significant group is an exception.)


Extraordinary claims – like that improving our AI software will lead to the extinction of humanity – require extraordinary evidence, extraordinary reasoning, and extraordinary effort to do everything you can to organize the claims, explain them, debate them, etc. But the level of organization and debate I’m seeing is poor. Other groups do poorly too (often even worse), but if you’re talking about the extinction of humanity and making big demands then you should care to do an extra good job and be extra rational and persuasive.

The people who want to stop AI training ask others to listen and be motivated by emotional appeals like their fears that their children will be killed by AIs, but they are not themselves motivated by their children’s lives to do anything special regarding rationality, debate or presenting an organized, comprehensive case for why they’re right. None of them have e.g. debated ten people while saying they’d love to debate as many as possible, because they’re so motivated. None of them is even claiming to have won ten out of ten debates, in writing, that anyone can read on the internet. None is doing his best to visibly show people that he’s right and knows what he’s talking about. None has a webpage with a list of every credible critique or rival viewpoint that he’s aware of, along with a refutation of it.

While CF talks about issues like this and has new knowledge about them, I don’t think you need to know about CF to come up with some ideas along these lines and make an effort to do some of this stuff. I think reasonable people could come up with some of this stuff on their own, but that isn’t happening. They don’t seem to be trying to visibly stand out (as better than others) in terms of rationality or debate as a way to convince anyone to listen to their extraordinary claims about the danger of AIs. I’d be pretty forgiving if they were doing their best to debate and rationally resolve the disagreements between various ideas, even if they didn’t have innovative methodology to make it work well, but I don’t see actions along those lines. There’s nothing like that in the pause or shut down articles and their citations (plus I’m familiar with their online community, and have read some of their books, and I haven’t seen it there either).