Why Die for the Donbass? Thoughts on the Invasion of Ukraine


I am writing this mostly to get some things off my chest, and also for the sake of posterity. Perhaps some others will also find my thoughts of interest.

Eight years ago, when Russia first invaded Ukraine with its occupation of Crimea, I was dismayed at the lack of Western response. I was sure that such timid reactions would encourage further aggression. Now it seems we are seeing such policies come home to roost. In my view, the West has neglected its moral duty in refusing to come to the assistance of the people of Ukraine. More should have been done in the lead-up to deter Russian aggression, and more should have been done at the time of the invasion and afterwards to help defeat it. I think it is totally unsatisfactory for a large country to simply invade a smaller neighbour in an aggressive war and not have any other country lift a finger to provide meaningful military assistance. Sanctions will hurt Russia in the long run, but they won’t help Ukraine, and it is unclear whether they will even help to undermine Putin’s popularity, since the resulting economic problems are likely to be blamed on the West.

Foreign Military Interventions

First I want to discuss the general issue of foreign military interventions. These have gotten a bad name in the West in the past decade or so, largely, it seems, as a result of the experiences in Iraq and Afghanistan. There are those across the political spectrum who oppose all foreign military interventions as a matter of principle, whether on humanitarian, ‘America-first’, or anti-imperialist grounds. We have seen this reluctance manifested in the refusal of America and other Western nations to commit significant forces to defend Syria against its homicidal dictator, to keep order in Libya following the overthrow of their homicidal dictator, and in the willingness to abandon Afghanistan to the Taliban. As such, I am somewhat in the minority in my view that foreign military interventions can be and have often been highly effective, and should continue to be a major foreign policy instrument of Western states, especially the US. (Note that I am talking about public military interventions, not secret CIA plots or backdoor deals to elicit regime change). Ideally such interventions should be undertaken with a UN mandate, but for many reasons this is often not possible, and so I do not regard it as an essential prerequisite.

Why do I say foreign military interventions can be highly effective? Primarily on the basis of historical experience, augmented by some theoretical considerations. The theory is fairly straightforward. The US and its allies hold the preponderance of military power on the planet, and as such they are often in a position to use that power to achieve particular ends. Military power is often necessary to restore order in a power vacuum, to prevent the escalation of violence in certain volatile circumstances, to defeat terrorist and other paramilitary groups, and to defend weaker nations against foreign aggressors. In many such cases military force on the ground is essential to bring about the desired goal of peace, freedom, and stability in the long term. We have seen many effective examples of foreign military interventions in the past 80 years or so. In addition to a large number of UN-backed peacekeeping missions, there are also the US invasions of France, Germany, Italy, and Japan during the Second World War. For some reason we don’t seem to think of these as invasions or aggression, but that’s exactly what they were. Of course they were fought for a good cause to defend the rights and freedoms of other peoples, but that doesn’t mean they weren’t foreign military interventions. To this I would add the Korean War, the invasion of Grenada, NATO intervention during the collapse of Yugoslavia, the First Gulf War, and the 2001 US invasion of Afghanistan. Some of these may be controversial, as are others that I do not list here. Nonetheless, I believe there are enough examples from history to show that foreign military interventions can be used to keep the peace, stop genocide, overthrow horrific regimes, and defend countries from aggression.

Arguments Against Intervention

Of course, just because foreign military interventions can be effective doesn’t mean that they always are. There are examples of failures too, not just successes. This leads me to consider the specific arguments given against Western military intervention in Ukraine.

First, it is argued that US intervention would trigger a nuclear war, which is too high a price to pay. I think this is a weak argument, as it’s unclear how or why this huge escalation is supposed to take place. Nuclear powers have fought before without triggering a nuclear war (India and Pakistan in varying capacities since 1999, and Russia and China in the 1960s). The USA and USSR also indirectly engaged with each other many times during the Cold War without triggering an exchange. There is no logical reason for either side to escalate the conflict to such a degree; in particular, Putin would know that even limited use of nuclear weapons would certainly be the end of his regime one way or the other.

Second, it is argued that ‘it’s not our war’, or something to this effect. This is the same argument given to defend the US withdrawal from Afghanistan last year. I don’t really understand this argument. How is it decided whose war is whose? If a country or people are being attacked (by a foreign power, their own government, or an insurgent force) and they ask for assistance which we are in a position to give, why exactly is that ‘not our war’? If our neighbour’s house is on fire, do we stand by because it’s ‘not our fire’? Of course, we must exercise caution in determining whether intervening will help or just make things worse, but that is a different line of objection to just noting that foreign military interventions indeed involve foreign peoples and nations.

Third, it is argued that the US cannot fight against Russian forces because this would be the start of World War III. Though Biden himself has made this argument many times, I don’t understand the logic at all. It would seem that this argument would apply to any instance of military engagement between US and Russian forces. So if Russia invaded a NATO state like Lithuania or Poland, would the US argue that it can’t get involved because fighting Russia would be the start of World War III? If the answer is yes, then the US has totally given up on any policy of deterrence and is willing to just let Putin do whatever he wants. If the answer is no, then it shows the argument has nothing to do with the US fighting Russia per se, and is all about which alliance the nation in question is part of. This leads to the final argument.

Fourth, it is argued that Ukraine is not part of NATO, and so they have no treaty obligation to aid them. This is true legally speaking, but it’s unclear to me what relevance this has in determining whether they should aid Ukraine. Just because they aren’t treaty-bound to do so doesn’t mean they can’t or shouldn’t assist. The US didn’t have any treaty obligations to defend South Korea or Kuwait during the wars there. Why exactly is one needed to defend Ukraine against clear and blatant foreign aggression?

Arguments for Intervention

Having responded to some arguments against intervention, now let me offer some additional reasons for intervention aside from general appeals to protecting the rights and freedoms of the Ukrainian people. In particular, my argument here appeals to the vital importance of effective military deterrence of aggressive states. Deterrence requires that aggressive states rationally believe that if they push too far, they will encounter forceful opposition. The failure of deterrence in the name of avoiding war or foreign involvement typically only fosters war in the long-run, since aggressive regimes see the lack of resolve as an invitation to push further and further.

This is not a purely hypothetical argument, but also a description of what happened in the leadup to World War Two. Nations like Germany, Italy, and Japan repeatedly engaged in aggressive military actions with little or no effective response from the Western powers. This included invasions of Manchuria, Ethiopia, China, Albania, Austria, and Czechoslovakia during the period of 1933-1939. Britain and France imposed sanctions, denounced the aggression, and invoked the League of Nations, but did not actually do anything to prevent the aggression or defend any of the victims. Even when Poland was invaded, though Britain and France at last declared war, they took very little military action to actually assist Poland in fighting Germany. If Britain and France had acted more strongly and sooner, the Second World War could probably have been prevented. It certainly would not have been as long or as bloody as it turned out to be, since the aggressors had a chance to grow in power and boldness and consolidate early gains.

Now fast forward to the 2010s. In the past 15 years we have witnessed Russian military intervention in Georgia, Syria, Crimea, the Donbass region, Kazakhstan, and now a major invasion of Ukraine. At each step, the West did very little to stop Putin. I’m not saying military intervention would have been appropriate in each case, but this history of increasing aggression should by now have increased Western resolve to draw a clear line and commit to some form of military intervention on behalf of Ukraine. This need not have taken the form of a full alliance. Other possibilities included enforcing a no-fly zone over Ukraine or specific conflict zones, a naval blockade of the Black Sea coast, deploying NATO troops as peacekeepers in certain crucial urban areas (including Kiev), or announcing in advance particular ‘red lines’ across which any Russian advance would result in a NATO response. All of these types of strategies have been employed before in different situations. I don’t know if any such possibilities were considered in the current crisis, but clearly the decision was taken to simply not intervene at all. Announcing this in advance as Biden did is especially bizarre in my view, as it totally killed any possible lingering deterrence the US might have had. I cannot fathom what advantage the US could gain from categorically announcing that it would not intervene in Ukraine under any conditions. Of course, as the crisis unfolded there was a desire to avoid escalation, but as I have argued, lack of effective deterrence is itself very likely to escalate the crisis, as indeed seems to have been the case here.

Reflecting on the Future

So what will be the likely course of events now that the invasion has begun and the West isn’t going to do anything about it? I expect that the Russians will gradually occupy all of eastern Ukraine, then pursue the remaining Ukrainian military forces across the Dnieper into the west of the country. Once they have taken the capital they will probably set up a puppet government, maybe even trotting out former president Viktor Yanukovych for legitimacy (they will argue he was illegally deposed). The southern and eastern parts of the country will be split into more ‘breakaway regions’, which will then have ‘referenda’ in which they ‘vote’ to be annexed by Russia. Of more immediate concern, as the Ukrainian military retreats westwards, they will come up against the borders of Romania, Hungary, and Poland. If the Russians fail to cut them off beforehand, will these NATO states allow the Ukrainian forces in? Will the Russians pursue them? Will there be skirmishes at the borders that spill over into the neighbouring states? I don’t know what will happen here, but I expect this to be a very difficult aspect of the conflict. Wars, after all, tend to spread.

Longer-term, now that Putin knows he can invade a neutral country in a blatantly aggressive war and mostly get away with it, will it stop here? I doubt it. Putin has other objectives beyond Ukraine. At the minimum I expect him to take the opportunity to gobble up the pro-Russian breakaway region of Transnistria in eastern Moldova. He might even take that whole country while he’s at it. After that, perhaps he will set his sights on Finland, another region that was once a part of Russia. If Russia invaded Finland, also not a NATO member, would the West do anything? If so, why not for Ukraine? Then there is the issue of the Baltic states. These have been a thorn in Russia’s side for a while, providing NATO with bases so close to the city of St Petersburg. There is also the precedent this sets for other aggressors, in particular China with their long-held ambitions of the reintegration of Taiwan. I believe the CCP will be watching the West’s response to Putin (or lack thereof) with great interest. Of course, perhaps this will mark the turning-point, and the West can look beyond its own internal squabbles to take a more active global role in establishing credible deterrence against aggression. On the other hand, perhaps we will listen to those who ask, as they once did for Danzig: why die for Ukraine?

Five Major Problems with William Lane Craig’s Kalam Cosmological Argument


The Kalam Cosmological Argument is an argument for the existence of God that Craig developed during his doctoral work in the 1970s. While it can be formulated in various ways, here I will consider the following version:

P1. The universe began to exist.

P2. If the universe began to exist, it had a cause.

P3. If the universe had a cause, that cause must be a personal agent.

C. A personal agent caused the universe to begin to exist.

There is a simpler form of the Kalam which ends at premise two, and hence concludes merely that the universe had a cause. However since Craig uses the Kalam to demonstrate the existence of God, I believe it is important to make this aspect of the argument explicit by including premise three.

The Kalam has attracted a great deal of attention in recent years, both in popular debates and also in the philosophical literature. However, in my view the argument is fatally flawed. In this article I shall briefly outline five major problems with Craig’s defence of the Kalam, showing how he fails to establish the conclusion that the universe had a personal cause. Readers interested in a more detailed discussion of these issues should consult my book, Unreasonable Faith.

The problem of the extent of the present

Craig has stated that when he talks of the universe ‘beginning to exist’, he means this in a very specific sense: that the universe came into being from a state of nonbeing. As Craig has said many times, this depends on a particular understanding of the nature of time called ‘presentism’. According to this view, only the present exists; the past and the future do not. The past consists of a series of moments which came into being and then went out of being, while the future consists of a series of moments which will come into being, but haven’t yet done so. Presentism contrasts with a competing view known as the tenseless theory of time, which holds that the past, present, and future all exist alongside each other, as different temporal parts of the history of the universe. On this view, there is no objective ‘present’. Instead, the past is just the part of the universe that exists in the ‘earlier than’ direction, while the future is the part of the universe that exists in the ‘later than’ direction. Under the tenseless theory of time, a universe with a finite age would not have come into being from nonbeing, but would simply be extended a finite time in the ‘earlier than’ direction. This means that under a tenseless theory of time, the universe could not have ‘begun to exist’ in the way that Craig believes is necessary for the Kalam to succeed. Because of this, Craig has stated definitively that “The kalam cosmological argument presupposes from start to finish a theory, not of tenseless time, but of tensed time”.

According to Craig, presentism, the view that only the present exists, is vital for the Kalam to succeed. But this leads naturally to the question: what exactly counts as present? The problem is that Craig has no sensible answer to this question. One possibility is that the present could consist of an infinitesimally short instant, with an infinite collection of such instants forming a finite interval (just like the real numbers). But this won’t work, because it would imply that actual infinities could exist, which conflicts with Craig’s various philosophical arguments against the possibility of an infinitely old universe. A second possibility is that the present could consist of a discrete unit of time, with successive discrete presents following one after the other with no time in-between (just like the natural numbers). But this won’t work either, as it would mean that God was somehow restricted to act only at these discrete moments, but couldn’t act in between them. If neither of these possibilities is suitable, what is left? Craig’s answer is that the present consists of the present ‘non-metrical interval’, the length of which varies depending on the context. He describes this as follows: “the extent of the present depends upon the extent of the entity described as present… the duration stipulated to be present will be an arbitrary, finite duration centred on a conceptually specified instant”. Craig even says that this interval can be further divided into present, past, and future parts. Although creative, this solution is completely absurd; indeed it is essentially unintelligible. How can it be the case that we bring into being a period of time just by referring to it? It seems that according to Craig’s view, dinosaurs were brought back to life when humans discovered them and began referring to them! 
Also, if as Craig claims only the present exists, then how can there be parts of the present interval that are past and parts that are future, neither of which exist? How can something both exist and not exist? Craig must resolve this problem of the extent of the present if the Kalam has any chance of succeeding.

The problem of the relevance of thought experiments about infinity

Craig provides many philosophical arguments for why the universe cannot be infinitely old. These arguments generally consist of thought experiments which aim to provide illustrations of absurd or impossible scenarios that allegedly follow from the assumption that actual infinities can exist. Some of these thought experiments, such as Hilbert’s Hotel, aim to show that no truly infinite thing can exist in the real world. Hilbert’s Hotel is an imaginary hotel with infinitely many rooms. If such a hotel existed, then it would be possible for the hotel to start out with all its rooms occupied, then for the owner to shuffle the infinitely many guests around between rooms in such a manner as to free up an infinite number of rooms, thus allowing another infinite number of guests to check in. Craig argues that thought experiments such as this illustrate the absurdity of actual infinities, hence demonstrating that they cannot exist in reality.

Another type of thought experiment focuses specifically on infinities produced by processes of successively adding one item to the next, since this is the type of infinite process which (so the argument goes) would be involved in giving rise to an infinitely old universe. While there are many such thought-experiments, one fairly simple example involves a man who has been counting down from infinity, and at the present moment finally finishes and reaches zero. According to Craig, such a scenario is impossible because the man should have finished yesterday, since yesterday he already had an infinite amount of time to count. However, he already had an infinite amount of time even before yesterday, and so he should have actually finished the day before yesterday – and so on backwards in time. Thus, according to Craig, we cannot make any sense of an infinite count arriving at the present at any particular time. From this Craig concludes that we should reject the possibility of arriving at the present after an infinitely long count.

While one can argue the details of each particular thought experiment, there is a more fundamental problem with such reasoning. Any specific example of an absurd or impossible situation can only ever show that the particular case in question is absurd or impossible. For example, Hilbert’s Hotel shows at most that infinitely large hotels with moveable guests are absurd. Likewise, the counting down from infinity example shows that it is not possible to begin with an infinitely large set and subtract one element at a time to eventually reach zero. The crucial question, however, is why we should infer from such cases that the universe cannot be infinitely old? The universe is not an infinite hotel, nor is there some cosmic counter marking down the time from infinity past to the present. Other thought experiments which readers may be familiar with, such as the Grim Reaper Paradox and the Tristram Shandy Paradox, similarly involve bizarre situations (such as infinitely many grim reapers appearing, or a man writing an infinitely long diary), neither of which are entailed by an infinitely old universe. As such, the most these thought experiments can ever show is that certain types of infinities are absurd or impossible, not that the universe must have begun to exist. Craig has argued that the relevant aspect of all such cases is the notion of infinity rather than any other details, but this can always be disputed. For instance, Hilbert’s Hotel assumes that infinitely many guests can be moved around, while past events cannot be moved around. The man counting down from infinity had some starting point that he began counting from (since this is how counting works), and yet an infinitely old universe would have no such beginning point. There is thus always a further question about which part of the thought experiment renders it absurd or impossible, and Craig fails to show that this must be the notion of infinity itself.
This is a fundamental weakness of this type of argument: why should we infer that because certain situations involving actual infinities are impossible, that therefore the universe cannot be infinitely old?

The problem of the difference between physical and metaphysical time

To supplement his philosophical arguments, Craig also appeals to various results from contemporary cosmology to argue that the best empirical evidence supports the beginning of the universe. The result he most often appeals to is called the Borde-Vilenkin-Guth Theorem, which shows that under certain plausible assumptions, all past-directed paths through spacetime must terminate after a finite distance, and hence cannot be infinitely long. From this and other related results, Craig infers that there is strong empirical confirmation from cosmology that the universe began a finite time in the past. The problem with all such arguments is that the Borde-Vilenkin-Guth Theorem, along with all of modern cosmology, is founded upon the general theory of relativity. In general relativity, time is treated as a fourth dimension alongside the three dimensions of space, which can be bent by the existence of massive objects and is affected by relativistic effects such as time dilation. Time is treated in a tenseless manner, with all times existing alongside one another in a four-dimensional ‘spacetime’.

It is very difficult to see how this notion of time can be reconciled with Craig’s presentist philosophy of time, according to which the present is an objective feature of reality independent of the matter within the universe, and the past and future do not exist. Indeed, in his publications on the philosophy of time, Craig himself acknowledges this, arguing for example that “if we draw a distinction between metaphysical time and physical time as Newton did, it is quite evident that a beginning of the latter does not imply a beginning of the former”. He is also clear that the four-dimensional spacetime used in general relativity “serves as a convenient calculational and diagrammatical aid… but says absolutely nothing about ontology… the four-dimensional continuum should therefore be regarded as a useful tool, and not as a physical ‘reality’”. This means that Craig faces a stark choice. If he holds firm to his presentist philosophy of time, then he must give up all the arguments (such as the Borde-Vilenkin-Guth Theorem) for the finitude of the past that are based on physical cosmology. On the other hand, if he wishes to keep using these arguments, then he must give up presentism, the very philosophy of time he believes is essential for the Kalam to succeed.

The problem of establishing a cause of the universe

Craig believes that the universe must have had a cause of its coming into being. While this may seem reasonable on the face of it, on further examination it is actually very hard to see how Craig could possibly know such a thing. The most common argument Craig presents in favour of this contention is simply that it is a basic metaphysical principle that “something cannot come out of nothing”. Yet this is begging the question. Someone who doubts that the universe had a cause will not simply accept that this is a basic metaphysical principle! Furthermore, Craig’s talk of ‘something coming from nothing’ is inaccurate, as this wording implies that somehow there was nothing, and then the universe ‘popped into being’, as Craig often says. Yet this clearly doesn’t make any sense. There wasn’t anything, no time or space, before the universe, so it wasn’t as if there was a black emptiness and then suddenly there was a loud ‘pop’, and the universe appeared. The idea is rather that if you could somehow go back to the first moment of time, the universe would already be there, and didn’t require anything to make it exist. It just exists all by itself, without an external cause. It appears that Craig has nothing to say about this more reasonable conception of a universe that began without a cause.

A second argument that Craig gives in support of the idea that the universe had a cause, is that we have overwhelming evidence both from science and everyday experience that when something begins to exist it has a cause. By analogy, he argues, it is therefore reasonable to believe that the universe as a whole must have had a cause. Such an inference, however, is simply not warranted. Everything we observe in science and everyday life is some object or process within the universe. The universe as a whole is not analogous to some particular thing within the universe, and so we have no reason to think that just because things within the universe have causes, that therefore the universe as a whole must also have a cause. This is similar to inferring that because every man has a mother, that therefore mankind as a whole also has a mother.

Craig’s third and final argument that the universe had a cause asks the question: if universes could come into existence without a cause, then why not anything? Craig asks “why do bicycles and Beethoven and root beer not pop into being from nothing?” Yet the answer to such questions is obvious. Bicycles and Beethoven and root beer are things that exist within the universe, and have known necessary causes for their coming into being. By contrast, the universe is not something that exists inside the universe with known necessary causes, but instead is the totality of space and time itself, for which it is unknown whether there is any necessary cause. It is therefore entirely unsurprising that things do not pop into being within the universe. This tells us nothing about whether the universe itself required a cause, since obviously the universe would not have come into being inside the universe. Craig’s arguments notwithstanding, the question therefore remains: why couldn’t the universe have simply begun without a cause?

The problem of non-personal, timeless causes

Building upon the first two premises of the Kalam, Craig also argues that if the universe had a cause, that cause must be a personal agent. Craig argues that only an immaterial mind with libertarian free will could have the ability to bring about the beginning of a temporal series of events (i.e. create the universe) from an existing timeless state. To justify this, Craig argues first that the cause of the universe obviously can’t be an event in time, because then this would just be part of the universe, and would require its own cause. He also argues that the cause of the universe could not have been an impersonal timeless state, because any effect caused by such a state would have to be co-eternal with the cause. According to Craig, only an agent with libertarian free will could exist timelessly, and then by exercising their libertarian agency, bring about an effect that exists in time. Such an explanation leads naturally to the question – what is so special about libertarian agency? Why can’t some timeless, non-personal entity be the cause of the universe? Why does it have to be a personal agent? Craig’s answer is simply that libertarian agents have a special type of causal power called ‘agent causation’, and only agent causation can bring about a temporal effect from a timeless cause. Why is that the case? Craig doesn’t appear to have any clear answer for this, other than his assertion that this is just the way it is. For instance, he claims “because the agent is free, he can initiate new effects by freely bringing about conditions which were not previously present. For example, a man sitting changelessly from eternity could freely will to stand up; thus, a temporal effect arises from an eternally existing agent.” Effectively this amounts to asserting that agent causation, and only agent causation, can bring about a temporal effect from a non-temporal initial state, because agent causation is just special that way.
Needless to say, I do not find this to be a persuasive argument.

Indeed, I can see no reason why any sort of timeless cause could not function as the cause of the universe. Why could the cause not be some kind of timeless quantum field, such as is postulated by various models in quantum cosmology? Or possibly it could be an immaterial entity driven by instinct, such as the gunas from Hindu philosophy. Or maybe it could be a metaphysical force such as the Dao from Chinese philosophy. Or perhaps it could be abstract information with causal powers, as scholars like Paul Davies have recently argued. Or just as likely, it could be something else entirely that we haven’t even conceived of yet! Craig relies entirely on his personal intuition that a personal agent is the only sort of thing that could bring the universe into being, without even considering these other possibilities. Indeed, it seems on the face of it highly unlikely that we humans could have any reliable method of knowing what exists outside of the universe. Yet Craig not only claims to know this, but claims to have come to this conclusion on the basis of reflection and intuition, without even needing to do any scientific research or empirical investigation. As such, I believe Craig’s argument as to why the cause of the universe must be personal is entirely unpersuasive. He seems to have no persuasive answer to the simple question: why can’t a non-personal, timeless entity be the cause of the universe?


Craig’s Kalam Cosmological Argument faces major problems. These can be summarised as a series of questions. First, what is the temporal extent of the present? Craig needs to provide a cogent answer to this question before he can develop the philosophy of time necessary for the Kalam to work. Second, how can we determine that because certain situations involving actual infinities are impossible, the universe therefore cannot be infinitely old? This difficulty in making the leap between hypothetical thought experiments and the finitude of the past is a major limitation in philosophical arguments for a beginning of the universe. Third, is the notion of ‘time’ in general relativity equivalent to ‘real’ metaphysical time? It seems that Craig must answer ‘no’ to this question if he is to maintain his presentist philosophy of time, and yet such an answer means that he cannot appeal to scientific results (which depend on general relativity) to establish the beginning of the universe. Fourth, why can’t the universe just begin without a cause? Craig’s quip that ‘from nothing, nothing comes’ is not a sufficient answer, since the real question is not whether the universe could have ‘popped into being’, but whether the universe could simply have begun a finite time in the past without any external cause. Fifth, why can’t a non-personal, timeless entity be the cause of the universe? Craig asserts that only an immaterial mind with libertarian freedom can cause a temporal state to begin from an initial timeless state, but has no non-circular justification for this, and simply ignores the many other possible causes that could bring about this effect. Unless Craig is able to provide cogent and consistent answers to these five questions, his Kalam Cosmological Argument will continue to be plagued by major unresolved problems, and fail to persuade skeptics such as me that God is the best explanation for the origin of the universe.

The Fermi Paradox has not been dissolved


This article is a response to the 2018 paper ‘Dissolving the Fermi Paradox’ by Anders Sandberg, Eric Drexler, and Toby Ord. The Fermi Paradox is the apparent contradiction between the great size and age of the universe, which seem to imply a high probability of extra-terrestrial life, and the fact that no extra-terrestrial life has been definitively observed. In their paper, Sandberg et al. use the Drake equation to estimate the expected number of alien civilizations in the Milky Way galaxy and in the observable universe. What differentiates their approach from previous efforts to answer this question is that they do not simply make a point estimate for the number of alien civilizations. Instead, they estimate the uncertainty for each parameter in the Drake equation, and then take random draws from the distribution of each parameter to estimate the distribution for the end result. Using this technique, they find that, although the mean expected number of alien civilizations may be high, there is a large tail of very low expected values, which means that overall the probability of Earth being the only planet to harbour civilization in the Milky Way, and indeed even in the observable universe, is fairly high. On this basis, the authors argue that since it is quite likely given our existing knowledge that no other intelligent alien civilizations exist, the Fermi Paradox is not paradoxical at all, and is therefore dissolved.
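To make the method concrete, the following is a minimal sketch of this Monte Carlo approach (my own reconstruction, not the authors’ code). For simplicity I draw fl log-uniformly here like the other parameters; Sandberg et al. actually model fl quite differently, as I discuss later in the section on the origin of life.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 1_000_000

def log_uniform(low, high):
    """Draw n_draws samples from a log-uniform distribution on [low, high]."""
    return 10 ** rng.uniform(np.log10(low), np.log10(high), n_draws)

# Log-uniform intervals roughly following Sandberg et al.
R_star = log_uniform(1, 100)      # rate of star formation per year
f_p    = log_uniform(0.1, 1)      # fraction of stars with planets
n_e    = log_uniform(0.1, 1)      # habitable planets per system with planets
f_l    = log_uniform(1e-7, 1)     # fraction developing life (simplified stand-in)
f_i    = log_uniform(1e-3, 1)     # fraction of life-bearing planets becoming intelligent
f_c    = log_uniform(1e-2, 1)     # fraction of civilizations becoming detectable
L      = log_uniform(1e2, 1e10)   # longevity of detectable civilizations in years

# Drake equation: number of detectable civilizations in the Milky Way per draw.
N = R_star * f_p * n_e * f_l * f_i * f_c * L

print("mean:", N.mean())
print("median:", np.median(N))
print("P(N < 1):", (N < 1).mean())
```

The key point is the last line: rather than a single point estimate, the draws give a whole distribution for N, from which the probability of Earth being alone, P(N < 1), can be read off directly.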

In this essay, I will argue that the analysis of Sandberg et al. is flawed in a number of key respects, and as a result the Fermi Paradox remains an open question. Here I briefly list the key problems with the Sandberg et al. paper, before proceeding to discuss each in more detail.

  1. The method of multiplying together the uncertainties of many small numbers, most of which have an upper bound of one, is biased towards yielding a high probability of Earth being unique, and also leads to various dubious results.
  2. The key result of the paper is driven largely by uncertainty in the parameter fl, which is modelled in an unusual way without clear justification.
  3. Adoption of slightly different (and I believe more plausible) modelling choices and parameter values yields totally different results, which do not result in the Fermi paradox being dissolved. I illustrate this by re-estimating the Sandberg et al. models using different parameters and modelling assumptions.

Multiplying small numbers

The very nature of the Drake equation, in which seven apparently-independent parameters (four of which are fractions) are multiplied together, means that it has the potential to give very low numbers. Indeed, since the rate of star formation is fairly well established to be around 1-10 per year, the average longevity of detectable civilizations is ultimately bounded by the age of the universe to around 10^10 years, and the number of Earth-like planets per star is generally set at one, it follows that the largest possible number of detectable civilizations in the Milky Way galaxy is about 10^11, or about the same as the number of stars in the galaxy. By contrast, there is no lower bound to any of these numbers, especially the four fractions, which can potentially be set to arbitrarily low non-zero values. The potential for abusing multiplication of fractions to give extremely low numbers is well known. A particularly egregious example of this can be found in the work of Christian apologist Tim McGrew, who estimates the prior probability of having the evidence we do pertaining to the resurrection of Jesus at less than 10^-40, on the basis of multiplying together supposedly independent probabilities of each of Jesus’ disciples separately experiencing a hallucination. The Drake equation does not reach this level of absurdity, but nevertheless I think it is important to bear in mind that the structure of the equation makes it potentially much easier to get very low values than very high values. This is especially relevant because the methodology adopted by Sandberg et al. involves identifying sources of uncertainty (especially for the parameters fl and fi), and then using any such uncertainty to argue for wider parameter intervals. Wider parameter intervals, however, almost invariably result in higher probabilities for Earth being alone in the Milky Way and the universe.
Interestingly, in arguing for the rare Earth hypothesis, Ward and Brownlee explicitly acknowledge this, noting that they “have accumulated a laundry list of potentially low-probability events or conditions necessary for animal life”. Contrary to their intention, such statements reduce my confidence in the method being applied, since it is all too easy to come up with more and more factors that can be multiplied to produce smaller and smaller numbers. This approach can also lead to bizarre consequences. For instance, under this method uncertainties which increase the number of possible avenues by which life may emerge (such as alternative encoding methods) increase our uncertainty about the value of the fl parameter. However, since this parameter is capped at 1, while there is no lower limit to the long tail of very low estimates for fl, in practice this primarily has the effect of reducing the estimated probability of life emerging spontaneously, even though it represents an additional pathway by which this could occur.
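This asymmetry can be illustrated with a toy example of my own: because a fraction is capped at one, widening its log-uniform interval leaves the upper bound fixed, so every increase in ‘uncertainty’ only extends the lower tail and drags the typical value down.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_uniform(low, high, size=100_000):
    """Draw from a log-uniform distribution on [low, high]."""
    return 10 ** rng.uniform(np.log10(low), np.log10(high), size)

# The same fraction under increasing 'uncertainty': the cap at 1 means each
# widening of the interval can only push the typical (median) value lower.
for low in (1e-1, 1e-3, 1e-5):
    f = log_uniform(low, 1.0)
    print(f"interval [{low:g}, 1]: median = {np.median(f):.2g}")
```

The median of a log-uniform fraction on [low, 1] is the square root of low, so each two-orders-of-magnitude widening of the interval cuts the typical value by a factor of ten, even though nothing about the upper end has changed.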

There are some further problems with the way the Drake equation is defined, and also the way it is applied in this study. First, it is not clear why it is assumed that life can only exist on planets. It seems just as probable that life could emerge on moons, either of rocky planets or of gas giants, as in the proposal that life may exist in the oceans of Europa. Given this consideration, it seems reasonable to consider the average number of habitable planets and moons per solar system. This would potentially increase the maximal bounds on the parameter ne by one or two orders of magnitude (see Table 1) based on the number of moons in our own solar system (there are ten moons with atmospheres). Second, defining the parameter fl as the fraction of planets that develop life amounts to an implicit assumption that life can emerge at most once in the history of each planet. It seems to me, however, that such an assumption is unwarranted. Presumably life could evolve multiple times on the same planet, perhaps in succession following an intervening extinction event, or perhaps in different regions of the same planet that are inaccessible to each other (for example a planet with disconnected oceans). A similar possibility should also be considered with respect to the parameter fi, the fraction of planets with life that becomes intelligent, as there is no obvious reason why multiple intelligent species cannot evolve on the same planet. This might take the form of such species coexisting, or could involve one species evolving after the previous species became extinct, or even a second species evolving after the first had left the home planet to colonise other planets.
Third, the Drake equation does not seem well-suited to incorporate either the seeding of life from one planet to another (panspermia), or speciation following stellar settlement, both of which would result in large numbers of contactable civilizations even if the emergence of intelligent life on a given planet is very unlikely. These latter scenarios are speculative to be sure, but ignoring these factors would lead to a downward bias in the estimates of the Drake equation.

Table 1. Parameters used in the Sandberg paper compared to my preferred values.

| Symbol | Meaning of parameter | Sandberg et al. | My preferred | Explanation of my preferred |
|---|---|---|---|---|
| R* | The rate of star formation per year | 1 to 100 | 1 to 10 | Based on Robitaille and Whitney, 2010 |
| fp | The fraction of stars with planets | 0.1 to 1 | 1 | Evidence of abundance of exoplanets, e.g. Cassan et al., 2012 |
| ne | The number of Earth-like (or otherwise habitable) planets per system with planets | 0.1 to 1 | 0.1 to 100 | Incorporating habitable moons |
| fl | The fraction of habitable planets with life | 1 − exp(−λVt), λVt ~ LogN(1, 50) | 0.01 to 1 | See below in text |
| fi | The fraction of planets with life that becomes intelligent | 0.001 to 1 | 0.0001 to 1 | Based on Sandberg et al., but slightly less optimistic |
| fc | The fraction of intelligent civilizations that become detectable/contactable | 0.01 to 1 | 0.1 to 1 | Based on Sandberg et al., but slightly more optimistic |
| L | The average longevity of such detectable civilizations in years | 10^2 to 10^10 | 10^3 to 10^10 | Based on Sandberg et al., but slightly more optimistic |

The origin of life

Sandberg et al. model all of the parameters of the Drake equation using log-uniform distributions except for the fl parameter, for which they use a method by which they estimate the rate parameter λ per unit volume of a given planet undergoing an abiogenesis event per unit time. It is not explained why this method was used for this parameter while all the other parameters were based on values from the cited literature. Indeed, it seems that other parameters could potentially be modelled using similar methods. For example, the development of intelligent life could be modelled as having some probability per extant species per unit of surface area of a planet, or the longevity of a civilization could be modelled as a function of its population and resource availability. Furthermore, the justification for using a log-normal distribution with standard deviation of 50 is unclear. In supplement i, following a discussion of protein folding, the authors assert that “taking LU[λ] > 300 (log uncertainty of λ) as a reference value seems conservative. To claim confidence that LU is less than 300 would be bold, even allowing for a combinatorially large set of alternative AGT (abiotic/genetic transition) paths and outcomes”. However it is unclear how this number was arrived at, or on what basis 300 was chosen instead of any other number. In my view, abstract considerations of protein folding rates are not an appropriate way to estimate the probability of abiogenesis, as early forms of life would have likely relied upon currently unknown mechanisms of catalysis to significantly increase reaction rates.
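To see how extreme the resulting distribution of fl is, it can be sampled directly. The sketch below reflects my reading of their specification, taking λVt ~ LogN(1, 50) to mean a log-normal whose underlying normal has mean 1 and standard deviation 50; the qualitative picture (a distribution with most of its mass piled up either very close to one or many orders of magnitude below it) does not depend on the exact units of that standard deviation.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# lambda*V*t ~ LogN(1, 50): underlying normal has mean 1 and sd 50
# (my reading of the paper's notation; the shape is robust to this choice).
lam_vt = rng.lognormal(mean=1.0, sigma=50.0, size=n)

# Probability that at least one abiogenesis event occurs on a given planet.
f_l = 1 - np.exp(-lam_vt)

print("P(f_l > 0.99):", (f_l > 0.99).mean())
print("P(f_l < 1e-10):", (f_l < 1e-10).mean())
print("median f_l:", np.median(f_l))
```

Roughly half the probability mass lands at fl near one, while nearly a third lands below 10^-10: the distribution is effectively a coin flip between ‘life is almost certain’ and ‘life is essentially impossible’, which is exactly what drives the paper’s headline result.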

As the authors note, the main result of the paper can be produced by relying entirely on the uncertainty of fl, even while setting all other parameters equal to their most optimistic values (see Table 3). However, the enormous uncertainty in fl is not only sufficient, but also necessary to drive the result. As I show in Table 3, if the fl parameter is modelled in the same way as fi and the other parameters, namely as a log-uniform with an interval determined by the cited literature (which I set to a range of 10^-7 to 1, derived from the cited literature but ignoring a few outliers as discussed in Table 2), the probability of Earth being alone in the universe falls to almost zero. This result illustrates that the main effect reported by Sandberg et al., namely that the Fermi observation is not antecedently very improbable given our current knowledge of parameters in the Drake equation, is entirely dependent on their largely unexplained approach to describing the uncertainty in fl. If it is treated like the other parameters in their equation, the effect vanishes.

Another problem with the method adopted by Sandberg et al. concerns some of the very low parameter estimates for fl and fi cited in supplement iii. As I show in Table 2 below, none of the very low estimates for these parameters are based on any scientific considerations. I did not investigate the higher estimates of these parameters, and so cannot claim that these are based on any better evidence than the very low values, but this lack of substance inclines me to be highly distrustful of estimates of fl and fi based on aggregations of expert opinion, since much opinion seems to be informed by little to no actual evidence or modelling. Although Sandberg et al. comment that “there is also a clear bias present towards optimistic, high values since SETI or ETI skeptics typically do not give estimates”, it is not obvious that this is true given the significant number of very low values reported here. Furthermore, four of the six parameters that I was able to check were not reported correctly, either because the stated number did not appear in the original source, or because the number had been misinterpreted to refer to fl and fi when in fact it was an estimate of something else. This further reduces my confidence in the method of estimating the parameters based on existing literature.

Table 2. Summary of the lowest reported literature values of fl and fi.

| Year | Source title | Estimate | Explanation | Accurate |
|---|---|---|---|---|
| 1965 | Dimensions and probability of life | fl = 10^-18 | This is a combined estimate of the probability of life and intelligent life. It is based on the probability of ‘man existing in its present form’, based on an ad hoc multiplication of the number of mutations needed to produce modern humans. This is not an estimate of the probability of life or intelligence in general. | Yes |
| 1980 | Extraterrestrial intelligent beings do not exist | fl = 10^-10 | This estimate is actually for a combination of fl, fi, and fc, and is based entirely on the fact that alien probes have not been observed. Hence this cannot be used to compute an estimate of the number of civilizations prior to the Fermi observation. | No |
| 2012 | How Likely is Extraterrestrial Life? | fl = 10^-17 | I can’t find this number in the cited text. On page 41, the value 10^-17 is given as the probability of a 120 nucleotide long polymer forming randomly in the lifetime of a single planet, but this is not asserted to be an actual estimate of fl. | No |
| 2015 | On the history and future of cosmic planet formation | fi = 10^-30 | This is not an actual estimate, but just an ‘arbitrarily small value’, based on a set of previous estimates, none of which mention actual numbers. | Yes |
| 1982 | Atmospheric evolution, the Drake equation, and DNA: sparse life in an infinite universe | fl = 10^-30 | I have not been able to find this chapter online, and although I ordered the book it has not yet arrived, so I am not able to assess this estimate. | NA |
| 2009 | The search for extraterrestrials: intercepting alien signals | fl = 10^-7 | This figure is not in the source given; the provided table gives a probability of 10%. Furthermore, the table is provided simply as an illustrative example, not an actual estimate. | No |
| 2015 | On The History and Future of Cosmic Planet Formation | fi = 10^-5 | The number 10^-5 is actually the fraction of habitable planets with communicable civilizations. This value comes from Carl Sagan’s 1963 paper, but Sagan is actually referring to the proportion of all stars that have a communicative civilization. Sagan’s actual estimate for fi is 0.1. | No |

An alternative modelling approach

While Sandberg et al. argue that observer selection effects prevent the use of the short time taken for life to emerge on Earth as evidence in favour of a high value for fl, I believe this is mistaken. As Lineweaver and Davis argue, the observer effect that is imposed is that Earth must have existed for long enough to have allowed life to emerge. However, as far as we know it is equally possible that we could have found ourselves on a 9 billion year old Earth instead of a 4.5 billion year old one, had it been the case that life took a lot longer to develop than it actually did. Even with just this modest constraint, Lineweaver and Davis estimate that the 95% confidence interval for the probability of life emerging on planets similar to Earth during the same time period is at least 0.13. On the basis of this argument, as well as other considerations I mentioned previously concerning the possibility of radically different forms of life, I believe it is reasonable to model fl as log-uniform between 0.01 and 1. Incorporating this estimate along with my other preferred parameter values (see Table 1), I estimate a 6% probability of Earth being alone in the Milky Way, and an effectively 0% chance of Earth being alone in the universe (see Table 3). These estimates were constructed by first attempting to replicate the original results, and then modifying the parameters and modelling assumptions in accordance with the various cases considered in Table 3. While I cannot defend my preferred parameter choices with great confidence, I do believe they are at the very least no less defensible than those used by Sandberg et al. As such my point is that, as has been the case since its invention, the Drake equation tells one more about the person making the estimates than about the probability of alien life. If one believes that alien life is likely to be rare, then low values can be chosen for the parameters, and a low estimate will be produced. Conversely, if one tends to think alien life is likely to be fairly common, large values can be chosen, and therefore a higher estimate produced.
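My preferred model can be replicated in a few lines using the same Monte Carlo approach with the intervals from Table 1. This is a sketch rather than the exact notebook code, and the figures vary slightly with the random seed.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000

def log_uniform(low, high):
    """Draw n samples from a log-uniform distribution on [low, high]."""
    return 10 ** rng.uniform(np.log10(low), np.log10(high), n)

# My preferred intervals from Table 1.
N = (log_uniform(1, 10)         # R*: star formation rate per year
     * 1.0                      # fp: essentially all stars have planets
     * log_uniform(0.1, 100)    # ne: habitable planets and moons per system
     * log_uniform(0.01, 1)     # fl: informed by Lineweaver and Davis
     * log_uniform(1e-4, 1)     # fi: slightly less optimistic than Sandberg et al.
     * log_uniform(0.1, 1)      # fc: slightly more optimistic
     * log_uniform(1e3, 1e10))  # L: longevity in years

print("median N:", np.median(N))                  # on the order of 10^4
print("P(alone in Milky Way):", (N < 1).mean())   # around 6%
```

This reproduces the ‘My preferred model’ row of Table 3 to within sampling error, and readers can substitute their own intervals to see how sensitive the result is.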

Table 3. Summary of results for key models, with mean and median number of alien civilisations.

| Model | Mean | Median | P(N<1) in Milky Way | P(N<1) in Universe |
|---|---|---|---|---|
| Main Sandberg model | 2.70E+07 | 0.32 | 52% | 38% |
| My re-estimate of main Sandberg model | 2.73E+07 | 2.32 | 47% | 26% |
| My re-estimate of main Sandberg with fl like fi | 3.15E+06 | 1 | 50% | 0.004% |
| Sandberg model with uncertainty in fl only | Unknown | Unknown | 41% | 32% |
| My re-estimate of Sandberg model with fl only | 5.12E+11 | 9.17E+11 | 28% | 15% |
| Naive model with literature log-uniform values | 5.07E+03 | 0 | 96% | 51% |
| My preferred model | 3.18E+08 | 9.95E+03 | 6.16% | 0.000% |
Figure 1. Probability density of main model estimated by Sandberg et al.
Figure 2. Probability density of my re-estimate of the main model of Sandberg et al.
Figure 3. Probability density of my preferred model.


In this essay I have argued that the analysis presented by Sandberg et al. concerning the dissolution of the Fermi paradox is flawed in multiple respects. First, the approach of multiplying many parameter intervals with an upper bound at one, but no corresponding lower bound, predisposes the resulting distribution of the number of alien civilisations to exhibit a very long negative tail, which drives the reported result. Second, the method used to estimate the distribution of the parameter fl is insufficiently justified, and the main result of the paper disappears if this parameter is modelled in the same way as the other parameters. Furthermore, the very low estimates of fl and fi reported in the literature review as supporting evidence are mostly misstated from their sources, and are based on vague speculation rather than specific scientific evidence. Third, an alternative modelling approach which incorporates the evidence of the early emergence of life on Earth, and also makes slightly different assumptions regarding the other parameters of the Drake equation, finds a much lower probability that Earth is unique in the Milky Way, and a negligible chance of being alone in the universe. Overall, while Sandberg et al. have made a significant contribution to the study of the Fermi Paradox and the Drake equation by illustrating how computational methods can be used to construct distributions for the probability of various numbers of alien civilisations, I do not believe they have succeeded in showing that the probability of Earth being alone is sufficiently high so as to dissolve the Fermi paradox. Indeed, I present two different sets of parameters which yield a negligible probability of Earth being alone in the universe, and hence, inasmuch as the estimated number of alien civilisations depends crucially on the chosen modelling assumptions, the Fermi Paradox remains very much an open question.

Note: readers interested in experimenting with their own choice of parameters or alternative modelling assumptions are invited to download the Jupyter Notebook at this link.


Behroozi, P., & Peeples, M. S. (2015). On the history and future of cosmic planet formation. Monthly Notices of the Royal Astronomical Society, 454(2), 1811-1817.

Blum, H. F. (1965). Dimensions and probability of life. Nature, 206(4980), 131-132.

Carter, B. (1983). The anthropic principle and its implications for biological evolution. Philosophical Transactions of the Royal Society of London. Series A, Mathematical and Physical Sciences, 310(1512), 347-363.

Halley, J. W. (2012). How Likely is Extraterrestrial Life? : Springer Science & Business Media.

Lineweaver, C. H., & Davis, T. M. (2002). Does the rapid appearance of life on Earth suggest that life is common in the universe? Astrobiology, 2(3), 293-304.

McGrew, T., & McGrew, L. (2009). The argument from miracles: A cumulative case for the resurrection of Jesus of Nazareth. The Blackwell companion to natural theology, 593-662.

Robitaille, T. P., & Whitney, B. A. (2010). The present-day star formation rate of the Milky Way determined from Spitzer-detected young stellar objects. The Astrophysical Journal Letters, 710(1), L11.

Ross, M. (2009). The search for extraterrestrials: intercepting alien signals: Springer Science & Business Media.

Sagan, C. (1963). Direct contact among galactic civilizations by relativistic interstellar spaceflight. Planetary and Space Science, 11(5), 485-498.

Sandberg, A., Drexler, E., & Ord, T. (2018). Dissolving the Fermi paradox. arXiv preprint arXiv:1806.02404.

Shermer, M. (2002). Why ET Hasn’t Called. Scientific American, 287(2), 33.

Tipler, F. J. (1980). Extraterrestrial intelligent beings do not exist. Quarterly Journal of the Royal Astronomical Society, 21, 267-281.

Ward, P. D., & Brownlee, D. (2003). Rare Earth: Why complex life is uncommon in the universe: Springer Science & Business Media.

A Critical Review of ‘The Precipice’: A Reassessment of the Risks of AI and Pandemics


In this essay I will present a critical response to Toby Ord’s recent book The Precipice (page numbers refer to the soft cover version of this book). Rather than attempting to address all of the many issues discussed by Ord, I will focus on what I consider to be one of the most critical claims of the book. Namely, Ord claims that the present century is a time of unprecedented existential risk, that “we stand at a crucial moment in the history of our species” (p. 3), a situation which is “unsustainable” (p. 4). Such views are encapsulated in Ord’s estimate of the probability of an existential catastrophe over the next century, which he places at one in six. Of this roughly seventeen percent chance, he attributes roughly ten percentage points to the risks posed by unaligned artificial intelligence, and another three percentage points to the risks posed by engineered pandemics, with most of the rest of the risk due to unforeseen and ‘other’ anthropogenic risks (p. 167). In this essay I will focus on the two major sources of risk identified by Ord, artificial intelligence and engineered pandemics. I will consider the analysis presented by Ord, and argue that by neglecting several critical considerations, Ord dramatically overestimates the magnitude of the risks from these two sources. This short essay is insufficient to provide a full justification for all of my views about these risks. Instead, my aim is to highlight some of what I believe to be the major flaws and omissions of Ord’s account, and also to outline some of the key considerations that I believe support a significantly lower assessment of the risks.

Why probability estimates matter

Before analysing the details of Ord’s claims about the risks of engineered pandemics and unaligned artificial intelligence, I will first explain why I think it is important to establish estimates of the magnitude of these existential risks that are as accurate as possible. After all, it could be argued that even if the risks are significantly less than those presented by Ord, nevertheless the risks are still far higher than we would like them to be, and causes such as unaligned AI and engineered pandemics are clearly neglected and require much more attention than they currently receive. As such, does it really matter what precise probabilities we assign to these risks? I believe it does matter, for a number of reasons.

First, Ord’s core thesis in his book is that humanity faces a ‘precipice’, a relatively short period of time with uniquely high and unsustainable levels of existential risk. To substantiate this claim, Ord needs to show not just that existential risks are high enough to warrant our attention, but that existential risk is much higher now than in the past, and that the risks are high enough to represent a ‘precipice’ at which humanity stands at the edge. Ord articulates this in the following passage:

If I’m even roughly right about their (the risks’) scale, then we cannot survive many centuries with risk like this. It is an unsustainable level of risk. Thus, one way or another, this period is unlikely to last more than a small number of centuries. Either humanity takes control of its destiny and reduces the risk to a sustainable level, or we destroy ourselves. (p. 31)

Critical here is Ord’s linkage of the scale of the risk with our inability to survive many centuries of this scale of risk. He goes on to argue that this is what leads to the notion of a precipice:

This comparatively brief period is a unique challenge in the history of our species… Historians of the future will name this time, and schoolchildren will study it. But I think we need a name now. I call it the Precipice. The Precipice gives our time immense meaning. (p. 31)

Given these passages, it is clear that there is a direct connection between the magnitude of the existential risks over the next century or so, and the existence of a ‘precipice’ that uniquely defines our time as historically special. This is a distinct argument from the weaker claim that existential risks are far higher than we should be comfortable with, and that more should be done to reduce them. My argument in this essay is that the main sources of the abnormally high risk identified by Ord, namely engineered pandemics and unaligned artificial intelligence, do not pose nearly as high a risk as Ord contends, and therefore his argument that the present period constitutes a ‘precipice’ is unpersuasive.

Second, I think precise estimates of the probabilities matter because there is a very long history of predicting the end of the world (or the end of civilisation, or other existential catastrophes), so the baseline for accuracy of such claims is poor. As such it seems reasonable to exercise some scepticism and caution when evaluating such claims, and ensure that they are based on sufficiently plausible evidence and reasoning to be taken seriously. This is also important for convincing others of such risks, as exaggeration of risks to humanity is very common, and is likely to reduce the credibility of those attempting to raise awareness of such risks. Ord makes a similar argument when he advises:

Don’t exaggerate the risks. There is a natural tendency to dismiss claims of existential risk as hyperbole. Exaggerating the risks plays into that, making it much harder for people to see that there is sober, careful analysis amidst the noise. (p. 213)

Third, I think that accurate estimates of probabilities of different forms of existential risk are important because it helps us to align our efforts and resources in proportion to the amount of risk posed by different causes. For example, if one type of risk is estimated to pose one hundred times as much risk as another, this implies a different distribution of efforts compared to if both causes posed roughly comparable amounts of risk. Ord makes this argument as follows:

This variation (in risk) makes it extremely important to prioritise our efforts on the right risks. And it also makes our estimate of the total risk very sensitive to the estimates of the top few risks (which are among the least well understood). So getting better understanding and estimates for those becomes a key priority. (p. 168)

As such, I believe it is important to carefully consider the probability of various proposed existential risk scenarios. In the subsequent two sections I will consider risks of engineered pandemics and unaligned artificial intelligence.

Engineered Pandemics

Extinction level agent exists

One initial consideration that must be addressed is how likely it is that any biological pathogen can even kill enough people to drive humanity to extinction. This places an upper limit on what any biotechnology could achieve, regardless of how advanced. Note that here I am referring to an agent such as a virus or bacterium that is clearly biological in nature, even if it is engineered to be more deadly than any naturally-occurring pathogen. I am not including entities that are non-biological in nature, such as artificial nanotechnology or other chemical agents. Whilst it is impossible to determine the ultimate limits of biology, one relevant point of comparison is the most deadly naturally-occurring infectious disease. To my knowledge, the highest fatality rate for any infectious biological agent that is readily transmissible between living humans is the Zaire ebolavirus, with a fatality rate of around 90%. It is unclear whether such a high fatality rate would be sustained outside of the social and climatic environment of the African regions where outbreaks have occurred, but nevertheless we can consider this to be a plausible baseline for the most deadly known human infectious pathogen. Critically, it appears unlikely that the death of even 90% of the world population would result in the extinction of humanity. Death rates of up to 50% during the Black Death in Europe do not appear to have even come close to causing civilisational collapse in that region, while population losses of up to 90% in Mesoamerica over the course of the invasion and plagues of the 16th century did not lead to the end of civilization in those regions (though the social and political disruption during these events was massive).

If we think the minimal viable human population is roughly 7,000 (which is near the upper end of the figures cited by Ord (p. 41), though rounded for simplicity), then a pathogen would need to directly or indirectly lead to the deaths of more than 99.9999% of the current world population in order to lead to human extinction. One could argue that the pathogen would only need to directly cause a much smaller number of deaths, with the remaining deaths caused by secondary disruptions such as war or famine. However, to me this seems very unlikely, considering that such a devastating pathogen would significantly impair the ability of nations to wage war, and it is hard to see how warfare would affect all areas of the globe sufficiently to bring about such significant population loss. Global famine also seems unlikely, given that the greater the number of pandemic deaths, the more food stores would be available to survivors. Perhaps the most devastating scenario would be a massive global pandemic followed by a full-scale nuclear war, though it is unclear why a nuclear exchange would follow a pandemic. One can of course devise various hypothetical scenarios, but overall it appears to me that a pathogen would have to have an extremely high fatality rate in order to have the potential to cause human extinction.
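The arithmetic behind this figure is simple (a minimal sketch, assuming a world population of roughly 8 billion):

```python
world_population = 8_000_000_000  # assumed current world population
minimum_viable = 7_000            # near the upper end of the figures cited by Ord

# Fraction of humanity that would need to die for the population to fall
# below the minimum viable threshold.
required_death_fraction = 1 - minimum_viable / world_population
print(f"{required_death_fraction:.7%}")  # just under 100%: about 99.99991%
```

Even doubling or halving the assumed minimum viable population barely moves this figure, since it is dominated by the sheer size of the current world population.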

In addition to a high fatality rate, an extinction-level pathogen would also have to be sufficiently infectious that it could spread rapidly through human populations. It would need to have a long enough incubation time such that infected persons can travel and infect more people before they can be identified and quarantined. It would also need to be able to survive and propagate in a wide range of temperatures and climatic conditions. Finally, it would also need to be sufficiently dangerous to a wide range of ages and genetic populations, since any pockets of immunity would render extinction considerably less likely. Overall, it is highly unclear whether any biological agent with all these properties is even possible. In particular, pathogens which are sufficiently virulent to cause 99% or more fatality rates are likely to place such a burden on human physiology that they would have a short incubation time, potentially rendering it easier to quarantine infected persons. Of course we do not know what is possible at the limits of biology, but given the extreme properties required of such an extinction-level pathogen, in my view it is very unlikely that such a pathogen is even possible.

Extinction level agent technologically feasible

Even if biological agents with the potential of wiping out humanity are theoretically possible, the question remains as to how long it will be until it becomes technologically feasible to engineer such an agent. While our current scientific understanding places significant limitations on what can be engineered, Ord argues that “it is not twentieth-century bioweaponry that should alarm us, but the next hundred years of improvements” (p. 133), which indicates that he believes biotechnological advances over the next century are likely to enable the creation of a much wider range of dangerous biological agents. It is of course impossible to know how rapidly such technology will develop in the coming decades. However, I believe that Ord overstates the current capabilities of such technology, and underestimates the challenges in developing pathogens of dramatically greater lethality than existing natural agents.

For example, Ord states that it is possible to “create entire functional viruses from their written code” (p. 128). I believe this claim is misleading, especially when read alongside Ord’s concern about the ease of obtaining synthesised DNA, as it can be read as asserting that viruses can be created by entirely synthetic means using only their DNA. This is false: the methods cited by Ord describe techniques in which synthesised viral DNA is cultured in cellular extracts, which as Ord also notes is not trivial and requires careful technique (p. 359). This approach still relies critically on ribosomes and other cellular machinery to translate viral DNA and produce the needed viral proteins. It does not involve the degree of control or understanding of the precise molecular processes that would be implied if an intact virus could be produced from its DNA by entirely synthetic means.

Ord also cites the 2012 experiments of Ron Fouchier, who conducted a gain-of-function experiment with H5N1 influenza in ferrets. Ord states that “by the time it passed to the final ferret, his strain of H5N1 had become directly transmissible between mammals” (p. 129). While technically correct, I believe this claim is misleading, since only a few sentences prior Ord states that this strain of influenza had an estimated 60% mortality rate in humans, implying that this would also apply to an airborne variant of the same virus. However in Fouchier’s study, it is reported that “although the six ferrets that became infected via respiratory droplets or aerosol also displayed lethargy, loss of appetite, and ruffled fur, none of these animals died within the course of the experiment.” Furthermore, the mere possibility of airborne transmission says nothing about the efficiency of this transmission mechanism. As reported in the paper:

Although our experiments showed that A/H5N1 virus can acquire a capacity for airborne transmission, the efficiency of this mode remains unclear. Previous data have indicated that the 2009 pandemic A/H1N1 virus transmits efficiently among ferrets and that naïve animals shed high amounts of virus as early as 1 or 2 days after exposure. When we compare the A/H5N1 transmission data with that of [another paper]…, the data shown in Figs. 5 and 6 suggest that A/H5N1 airborne transmission was less robust, with less and delayed virus shedding compared with pandemic A/H1N1 virus.

These qualifications illustrate the fundamental point that most biological systems exist as a set of tradeoffs and balances between competing effects and conflicting needs. Thus changing one aspect of a pathogen, such as its mode of transmission, is likely to have effects on other aspects of the pathogen, such as its lethality, incubation period, susceptibility to immune system attack, or survival outside a host. In theory it may be possible to design a pathogen with properties optimised to be as lethal to humans as possible, but doing so would require far greater understanding of protein folding pathways, protein-protein interactions, gene expression, mechanisms of pathogen invasion, immune system evasion strategies, and other such factors than is currently possessed. Thus it is by no means clear that Ord is correct when he states that “this progress in biotechnology seems unlikely to fizzle out soon: there are no insurmountable challenges looming; no fundamental laws blocking further developments” (p. 128). Indeed, I believe there are many fundamental challenges and gaps in our understanding which prevent the development of pathogens with arbitrarily specified properties.

Extinction level agent produced and delivered

Even if it were technologically possible to produce a pathogen capable of causing human extinction, the research, production, and distribution of such an infectious agent would still need to be carried out by an organisation with the capabilities and desire to do so. While Ord’s example of the Aum Shinrikyo cult does demonstrate that such groups exist, the very small number of such attacks historically indicates that such groups do not exist in large numbers. Very few ideologies have an interest in bringing humanity to an end through violent means. Indeed, as Ord notes:

For all our flirtation with biowarfare, there appear to have been relatively few deaths from either accidents or use… Exactly why this is so is unclear. One reason may be that bioweapons are unreliable and prone to backfiring, leading states to use other weapons in preference. (p. 132)

Ord partially counters this observation by arguing that the severity of events such as terrorist attacks and incidents of biowarfare follows a power law distribution, with very rare, very high impact events meaning that the average size of past events will underestimate the expected size of future events. However this response does not seem to address the core observation that bioweapons have proven very hard to control, and that very few agents or organisations have any interest in unleashing a pathogen that kills humans indiscriminately. This appears to be reflected in the fact that, as far as is publicly known, very few attempts have even been made to deploy such weapons in modern times. I thus believe that we have good reason to think that the number of people and amount of effort devoted to developing such dangerous bioweapons is likely to be low, especially for non-state actors.
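To see why a power-law tail makes historical averages misleading, here is a small simulation; the Pareto shape parameter and sample size are arbitrary choices of mine for illustration, not values taken from Ord:

```python
import random

# With a Pareto tail of shape alpha = 1.1, the true mean event size is
# alpha / (alpha - 1) = 11, but that mean is dominated by rare, enormous
# events. A modest 'historical record' of 50 draws usually lacks any such
# event, so its sample mean tends to fall well short of the true expectation.
random.seed(0)
alpha = 1.1
true_mean = alpha / (alpha - 1)

sample = [random.paretovariate(alpha) for _ in range(50)]
sample_mean = sum(sample) / len(sample)

print(f"true mean: {true_mean:.1f}")
print(f"mean of 50 simulated past events: {sample_mean:.1f}")
```

This only illustrates Ord's statistical point; my objection in the text is that it does not address why so few actors would attempt such attacks in the first place.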

Furthermore, Ord fails to consider the practical difficulties of developing and releasing a pathogen sufficiently deadly to cause human extinction. In particular, developing a novel organism would require lengthy research and extensive testing. Even if all the requisite supplies, technology, and expertise could be obtained over a period of time without arousing enough suspicion for the project to be investigated and shut down, there still remains the challenge of how such a pathogen could be tested. No animal model is perfect, and so any novel pathogen would (just like vaccines and other medical treatments) need to be tested on large numbers of human subjects, and likely adjusted in response to results. It would need to be trialled in different environments and climates to determine whether it would spread sufficiently rapidly and survive outside a host long enough. Without such tests, it is virtually impossible that an untested novel pathogen would be sufficiently optimised to kill enough people across a wide enough range of environments to cause human extinction. However, it is hard to see how such widespread testing could be carried out on a diverse enough range of subjects without drawing the attention of authorities.

A rogue state such as North Korea might be able to circumvent this particular problem, but that raises a range of new difficulties, such as why it would ever be in the interest of a state actor (as opposed to a death-cult terrorist group) to develop such a deadly, indiscriminate pathogen. Ord raises the possibility of its use as a deterrent (akin to the deterrence function of nuclear weapons), but the analogy does not appear to hold up. Nuclear weapons work as a deterrent because their possession can be publicly demonstrated (by testing), their devastating impact is widely known, and there is no practical defence against them. None of these properties are true of an extremely lethal novel pathogen. A rogue state would have great difficulty proving that it possessed such a weapon without revealing enough information about the pathogen for the world to develop countermeasures against it. As such, it does not appear feasible to use bioweapons as effective deterrents, which may partly explain why, despite extensive research into the possibility, no states have yet used them in this manner. As a result of these considerations, I conclude that even if it were technologically possible to develop a pathogen sufficiently lethal to cause human extinction, it is unlikely that anyone would have both the desire and the ability to successfully produce and deliver it.

Failure of timely public policy response

The release of a pathogen that has the potential to cause human extinction does not in itself imply that human extinction would inevitably occur. Whether this would follow depends on the extent of the governmental and societal responses to the outbreak of the novel pandemic, such as quarantines, widespread testing, and contact tracing. In considering the balance of positive and negative effects that organisational and civilisational advances have had on the ability to respond to the risk of pathogens, Ord states that “it is hard to know whether these combined effects have increased or decreased the existential risk from pandemics” (p. 127). This claim, however, seems implausible, since deaths from infectious diseases, and pandemics in particular, have decreased in recent centuries, with no major plague epidemics in Western Europe since the early eighteenth century. The disappearance of plague from Western Europe, while still not well understood, plausibly may have been caused at least in part by the improvement of quarantine and public policy responses to plague. In the US, the crude death rate from infectious diseases fell by about 90% over the course of the twentieth century. Furthermore, a successful public policy response to a pathogen outbreak in even a single country would likely be enough to prevent extinction, even if most countries failed to enact a sufficient public policy response. As such, I believe it is unlikely that even an extinction-level novel pathogen would be able to sufficiently evade all public health responses so as to cause human extinction.

Failure of timely biomedical response

In addition to the failure of public policy responses, extinction of humanity by a novel pathogen would also require the failure of any biomedical response to the pandemic. Ord believes that as biological techniques become easier and cheaper, they become accessible to more and more people, and hence represent a greater and greater risk. He argues:

As the pool of people with access to a technique grows, so does the chance it contains someone with malign intent. (p. 134)

This argument, however, appears to consider only one side of the issue. As the pool of people with access to a technique grows, so too does the number of people who wish to use that technique to do good. This includes developing techniques and technologies for more easily detecting, controlling, and curing infectious diseases. It surprises me that Ord never mentions this, since the development of biomedical technologies does not only mean that there is greater scope for using the technology to cause disease, but also greater scope for using new techniques to prevent and cure disease. Indeed, since the prevention of disease receives far more research attention than causing disease, it seems reasonable to assume that our abilities to develop treatments, tests, and vaccines for diseases will develop more rapidly than our abilities to cause disease. There are a range of emerging biomedical technologies that promise to greatly improve our ability to fight existing and novel diseases, including transmissible vaccines, rational design of drugs, and reverse vaccinology. As such, I regard it as unlikely that if biomedical technology had advanced sufficiently to be able to produce an extinction-level pathogen, it would nevertheless fail to develop sufficient countermeasures to the pathogen to at least prevent full human extinction.

Unaligned Artificial Intelligence

AI experts and AI timelines

Although Ord appeals to surveys of AI researchers as evidence of the plausibility of the development of superhuman artificial intelligence in the next century, experts in artificial intelligence do not have a good track record of predicting future progress in AI. Massively inflated expectations of the capabilities of symbolic AI systems in the 1950s and 1960s, and of expert systems in the 1980s, are well-known examples of this. More generally, it is unclear why we should even expect AI researchers to have any particular knowledge about the future trajectories of AI capabilities. Such researchers study and develop particular statistical and computational techniques to solve specific types of problems. I am not aware of any focus in their training on extrapolating technological trends, or on investigating historical case studies of technological change. Indeed, it would seem that cognitive psychologists or cognitive neuroscientists might be better placed (although probably still not very well placed) to make judgements about the boundaries of human capability and what would be required for these to be exceeded in a wide range of tasks, since AI researchers have no particular expertise in the limits of human ability. AI researchers generally only consider human-level performance in the context of baseline levels of performance on well-defined tasks such as image recognition, categorisation, or game-playing. This is far removed from being able to make judgements about when AIs would be able to outperform humans on ‘every task’. For example, do AI researchers really have any expertise on when AIs are likely to overtake human ability to do philosophy, serve as political leaders, compose a novel, or teach high school mathematics? These are simply not questions that are studied by AI researchers, and therefore I don’t see any reason why they should be regarded as having special knowledge about them.
These concerns are further emphasised by the inconsistency of researcher responses to AI timeline surveys:

Asked when an AI system would be ‘able to accomplish every task better and more cheaply than human workers’, on average they estimated a 50 percent chance of this happening by 2061. (p. 141)

However in a footnote Ord notes:

Note also that this estimate may be quite unstable. A subset of the participants were asked a slightly different question instead (emphasising the employment consequences by talking of all occupations instead of all tasks). Their time by which there would be a 50% chance of this standard being met was 2138, with a 10% chance of it happening as early as 2036. (p. 362)

Another factor highly pertinent to establishing the relevant set of experts concerns how the topics currently researched by AI researchers relate to the set of methods and techniques eventually used in building an AGI. Ord seems to think that developments of current methods may be sufficient to develop AGI:

One of the leading paradigms for how we might eventually create AGI combines deep learning with an earlier idea called reinforcement learning. (p. 143)

However such current methods, in particular deep learning, are known to be subject to a wide range of limitations. Major concerns include the ease with which adversarial examples can be used to ‘fool’ networks into misclassifying basic stimuli, the lack of established methods for integrating syntactically-structured information with neural networks, the fact that deep learning is task-specific and does not generalise well, the inability of deep learning systems to develop human-like ‘understanding’ that permits robust inferences about the world, and the requirement for very large datasets for deep learning algorithms to be trained on. While it remains possible that all these limitations may be overcome in the future, at present they represent deep theoretical limitations of current methods, and as such I see little reason to expect they can be overcome without the development of substantially new and innovative concepts and techniques. If this is correct, then there seems little reason to expect AI researchers to have any expertise in predicting when such developments are likely to take place. AI researchers study current techniques, but if (as I have argued) such techniques are fundamentally inadequate for the development of true AGI, then such expertise is of limited relevance in assessing plausible AI timelines.

One argument that Ord gives in apparent support of the notion that current methods may in principle be sufficient for the development of AGI relates to the success of using deep neural networks and reinforcement learning to train artificial agents to play Atari games:

The Atari-playing systems learn and master these games directly from the score and the raw pixels on the screen. They are a proof of concept for artificial general agents: learning to control the world from raw visual input; achieving their goals across a diverse range of environments. (p. 141)

I believe this is a gross overstatement. While these developments are impressive, they in no way provide a proof of concept for ‘artificial general agents’, any more than programs developed in the 1950s and 1960s to solve grammatical or geometric problems in simple environments provided such a proof of concept. Atari games are highly simplified environments with comparatively few degrees of freedom, a highly limited set of possible actions, and a clear measure of success (the score). Real-world environments are extremely complicated, with a vast number of possible actions, and often no clear measure of success. Uncertainty also plays little direct role in Atari games, since a complete picture of the current gamespace is available to the agent. In the real world, all information gained from the environment is subject to error, and must be carefully integrated to provide an approximate model of the environment. Given these considerations, I believe that Ord overstates how close we currently are to achieving superhuman artificial intelligence, and understates the difficulties that scaling up current techniques would face in attempting to achieve this goal.

AI has the power to usurp humanity

Ord argues that artificial intelligence that was more intelligent than humans would be able to usurp humanity’s position as the most powerful species on Earth:

What would happen if sometime this century researchers created an artificial general intelligence surpassing human abilities in almost every domain? In this act of creation, we would cede our status as the most intelligent entities on Earth. So without a very good plan to keep control, we should also expect to cede our status as the most powerful species, and the one that controls its own destiny. (p. 143)

The assumption behind this claim appears to be that intelligence alone is the critical determining factor behind which species or entity maintains control over Earth’s resources and future. This premise, however, conflicts with what Ord says earlier in the book:

What set us (humanity) apart was not physical, but mental – our intelligence, creativity, and language…each human’s ability to cooperate with the dozens of other people in their band was unique among large animals. (p. 12)

Here Ord identifies not only intelligence, but also creativity and the ability to cooperate with others as critical to the success of humanity. This seems consistent with the fact that human intelligence, as far as can be determined, has not fundamentally changed over the past 10,000 years, even while our power and capabilities have dramatically increased. What has changed is our ability to cooperate at much larger scales, and also our ability to build upon the achievements of previous generations to gradually increase our knowledge, and to build up more effective institutions and practices. Given these considerations, it seems far from obvious to me that the mere existence of an agent more intelligent than an individual human would suffice for it to usurp humanity’s position. Indeed, Ord’s own examples seem to further emphasise this point:

History already involves examples of individuals with human-level intelligence (Hitler, Stalin, Genghis Khan) scaling up from the power of an individual to a substantial fraction of all global power. (p. 147)

Whilst we have no clear data on the intelligence of these three individuals, what does seem clear is that none of them achieved the positions they did by acts of profound intellect. They were capable men (Stalin in particular was very widely read, and Hitler was known to have a sharp memory for technical details), but they were far from being the greatest minds of their generation. Nor did they achieve their positions by ‘scaling up’ from an individual to a world superpower. I think it is more accurate to say that they used their individual talents (military leadership for Genghis Khan, administrative ability and political scheming for Stalin, and oratory and political scheming for Hitler) to gain control over existing power structures (respectively the Mongol tribes, the Soviet government, and the German government). They did not build these things from scratch themselves (though Genghis Khan did establish a unified Mongol state, so comes closer than the others), but were able to hijack existing systems and convince enough people to follow their leadership. These skills may be regarded as a subset of a very broad notion of intelligence, but they do not correspond very closely to the way we normally use the word ‘intelligence’, nor do they seem likely to be the sorts of things AIs would be very good at doing.

Lacking a physical body to interact with people, it is hard to see how an AI could inspire the same levels of loyalty and fear that these three leaders (and many others like them) relied upon in their subordinates and followers. Of course, AIs could manipulate humans to do this job for them, but this would raise the immense difficulty of ensuring that their human pawns do not usurp their authority, which would be very difficult if the humans that the AI is attempting to control have no personal loyalty to the AI itself. Perhaps the AI could pit multiple humans against one another and retain control over them in this manner (indeed that is effectively what Hitler did with his subordinates), but doing so generally requires some degree of trust and loyalty on the part of one’s subordinates to be sustainable. Such methods are also very difficult to manage (for instance, plots by subordinates against the leader must be prevented), and place clear limits on how effectively the central ruler can personally control everything. Of course one could always say ‘if an AI is intelligent enough it can solve these problems’, but my argument is precisely that it is not at all clear to me that ‘intelligence’ is even the key factor determining success. A certain level of intelligence is needed, but various forms of subtle interpersonal skills distinct from intelligence seem far more important in acquiring and maintaining such positions, skills which a non-embodied AI would face particular difficulty in acquiring.

Overall, I am not convinced that the mere existence of a highly-intelligent AI would imply anything about the ability of that AI to acquire significant power over humanity. Gaining power requires not just individual intelligence but also the ability to coordinate large numbers of people, to exercise creativity, to inspire loyalty, and to build upon past achievements, among other things. I am not saying that an AI could not do these things, only that it would not automatically be able to do them by being very intelligent, nor would it necessarily be able to do them very quickly.

AI has reason to usurp humanity

Although Ord’s general case for concern about AI does not appeal to any specific vision for what AI might look like, an analysis of the claims that he makes indicates that his arguments are mostly relevant to a specific type of agent based on reinforcement learning. He says:

One of the leading paradigms for how we might eventually create AGI combines deep learning with an earlier idea called reinforcement learning…  unfortunately, neither of these methods can be easily scaled up to encode human values in the agent’s reward function. (p. 144)

While Ord presents this as merely a ‘leading paradigm’, subsequent discussion appears to assume that an AI would likely embody this paradigm. For example he remarks:

An intelligent agent would also resist attempts to change its reward function to something more aligned with human values. (p. 145)

Similarly he argues:

The real issue is that AI researchers don’t yet know how to make a system which, upon noticing this misalignment, updates its ultimate values to align with ours rather than updating its instrumental goals to overcome us. (p. 146)

While this seems plausible in the case of a reinforcement learning agent, it seems far less clear that it would apply to other forms of AI. In particular, it is not even clear whether humans actually possess anything that corresponds to a ‘reward function’, nor is it clear that such a thing would be immutable with experience or over the lifespan. To assume that an AI would have such a thing is therefore to make specific assumptions about the form such an AI would take. This is also apparent when Ord argues:

It (the AI) would seek to acquire additional resource, computational, physical or human, as these would let it better shape the world to receive higher reward. (p. 145)

Again, this remark seems explicitly to assume that the AI is maximising some kind of reward function. Humans often act not as maximisers but as satisficers, choosing an outcome that is good enough rather than searching for the best possible outcome. Humans also often act on the basis of habit or by following simple rules of thumb, and are often risk averse. As such, I believe that to assume that an AI agent would necessarily be maximising its reward is to make fairly strong assumptions about the nature of the AI in question. Absent these assumptions, it is not obvious why an AI would have any particular reason to usurp humanity.

Related to this question about the nature of AI motivations, I was surprised that (as far as I could find) Ord says nothing about the possible development of artificial intelligence through the avenue of whole brain emulation. Although currently infeasible, simulation of the neural activity of an entire human brain is a potential route to AI which requires only very minimal theoretical assumptions, and no major conceptual breakthroughs. A low-level computer simulation of the brain would only require sufficient scanning resolution to measure neural connectivity and the parameters of neuron physiology, and sufficient computing power to run the simulation in reasonable time. Plausible estimates have been made which indicate that, extrapolating from current trends, such technologies are likely to be developed by the second half of this century. Although it is by no means certain, I believe it is likely that whole brain emulation will be achievable before it is possible to build a general artificial intelligence using techniques that do not attempt to directly emulate the biology of the brain. This potentially results in a significantly different analysis of the risks than that presented by Ord. In particular, while misaligned values still represent a problem for emulated intelligences, we do at least possess an in-principle method for aligning their values, namely the same sort of socialisation that is used with general success in aligning the values of the next generation of humans. As a result of such considerations, I am not convinced that it is especially likely that an artificial intelligence would have any particular reason or motivation to usurp humanity over the next century.

AI retains permanent control over humanity

Ord seems to assume that once an AI attained a position of power over the destiny of humanity, it would inevitably maintain this position indefinitely. For instance he states:

Such an outcome needn’t involve the extinction of humanity. But it could easily be an existential catastrophe nonetheless. Humanity would have permanently ceded its control over the future. Our future would be at the mercy of how a small number of people set up the computer system that took over. If we are lucky, this could leave us with a good or decent outcome, or we could just as easily have a deeply flawed or dystopian future locked in forever. (p. 148)

In this passage Ord speaks of the AI as if it is simply a passive tool, something that is created and forever after follows its original programming. Whilst I do not say this is impossible, I believe that it is an unsatisfactory way to describe an entity that is supposedly a superintelligent agent, something capable of making decisions and taking actions on the basis of its own volition. Here I do not mean to imply anything about the nature of free will, only that we do not regard the behaviour of humans as simply the product of what evolution has ‘programmed into us’. While it must be granted that evolutionary forces are powerful in shaping human motivations and actions, nevertheless the range of possible sets of values, social arrangements, personality types, life goals, beliefs, and habits that is consistent with such evolutionary forces is extremely broad. Indeed, this is presupposed by Ord’s claim that “humanity is currently in control of its own fate. We can choose our future.” (p. 142).

If humanity’s fate is in our own hands and not predetermined by evolution, why should we not also say that the fate of a humanity dominated by an AI would be in the hands of that AI (or collective of AIs that share control), rather than in the hands of the designers who built that AI? The reason I think this is important is that it highlights the fact that an AI-dominated future is by no means one in which the AI’s goals, beliefs, motivations, values, or focus is static and unchanging. To assume otherwise is to assume that the AI in question takes a very specific form which, as I have argued above, I regard as being unlikely. This significantly reduces the likelihood that a current negative outcome with AI represents a permanent negative outcome. Of course, this is irrelevant if the AI has driven humans to extinction, but it becomes highly relevant in other situations in which an AI has placed humans in an undesirable, subservient position. I am not convinced that such a situation would be perpetuated indefinitely.

Probability Estimates

Taking into consideration the analysis I have presented above, I would like to close by presenting some estimates of my best guess of the probability of an existential catastrophe occurring within the next century by an engineered pandemic and unaligned artificial intelligence. These estimates should not be taken very seriously. I do not believe we have enough information to make sensible quantitative estimates about these eventualities. Nevertheless, I present my estimates largely in order to illustrate the extent of my disagreement with Ord’s estimates, and to illustrate the key considerations I examine in order to arrive at an estimate.

Probability of engineered pandemics

Considering the issue of how an engineered pandemic could lead to the extinction of humanity, I identify five separate things that must occur, which to a first approximation I will regard as being conditionally independent of one another:

  1. There must exist a biological pathogen with the right balance of properties to have the potential of leading to human extinction.
  2. It must become technologically feasible within the next century to evolve or engineer this pathogen.
  3. The extinction-level agent must be actually produced and delivered by some person or organisation.
  4. The public policy response to the emerging pandemic must fail in all major world nations.
  5. Any biomedical response to the pandemic, such as developing tests, treatments, or vaccines, must fail to be developed within sufficient time to prevent extinction.

On the basis of the reasoning presented in the previous sections, I regard 1) as very unlikely, 2), 4), and 5) as unlikely, and 3) as slightly less unlikely. I will operationalise ‘very unlikely’ as corresponding to a probability of 1%, ‘unlikely’ as corresponding to 10%, and ‘slightly less unlikely’ as 20%. Note that each of these probabilities is taken as conditional on all the previous elements; so for example my claim is that, conditional on an extinction-level pathogen being possible, there is a 10% chance that it will be technologically feasible to produce this pathogen within the next century. Combining all these elements results in the following probability:

P(bio extinction) = P(extinction level agent exists) x P(extinction level agent technologically feasible) x P(extinction level agent produced and delivered) x P(failure of timely public policy response) x P(failure of timely biomedical response)

P(bio extinction) = 0.01×0.1×0.2×0.1×0.1 = 2×10^(-6)

In comparison, Ord’s estimated risk from engineered pandemics is 1/30, or 3×10^-2. Ord’s estimated risk is thus more than 10,000 times larger than mine.

Probability of unaligned artificial intelligence

Considering the issue of unaligned artificial intelligence, I identify four key stages that would need to happen for this to occur, which again I will regard to first approximation as being conditionally independent of one another:

  1. Artificial general intelligence, or an AI which is able to out-perform humans in essentially all human activities, is developed within the next century.
  2. This artificial intelligence acquires the power to usurp humanity and achieve a position of dominance on Earth.
  3. This artificial intelligence has a reason/motivation/purpose to usurp humanity and achieve a position of dominance on Earth.
  4. This artificial intelligence either brings about the extinction of humanity, or otherwise retains permanent dominance over humanity in a manner so as to significantly diminish our long-term potential.

On the basis of the reasoning presented in the previous sections, I regard 1) as roughly as likely as not, and 2), 3), and 4) as being unlikely. Combining all these elements results in the following probability:

P(AI x-risk) = P(AI of sufficient capability is developed) x P(AI gains power to usurp humanity) x P(AI has sufficient reason to usurp humanity) x P(AI retains permanent usurpation of humanity)

P(AI x-risk) = 0.5×0.1×0.1×0.1 = 5×10^(-4)

In comparison, Ord’s estimated risk from unaligned AI is 1/10, or 10^-1. Ord’s estimated risk is roughly 200 times larger than mine.
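Both headline figures are simple products of the stepwise conditional probabilities. As a sanity check on the arithmetic, the two calculations can be reproduced in a few lines of Python (the numbers are the illustrative estimates from the text, not data):

```python
from math import prod

# Stepwise conditional probabilities from the text (illustrative estimates only).
# Each value is conditional on all earlier steps having occurred.
bio_steps = [0.01, 0.1, 0.2, 0.1, 0.1]  # pathogen exists, feasible, produced/delivered, policy fails, biomedical fails
ai_steps = [0.5, 0.1, 0.1, 0.1]         # AGI developed, gains power, has reason, retains dominance

p_bio = prod(bio_steps)  # ~2e-6
p_ai = prod(ai_steps)    # ~5e-4

# Ratios of Ord's estimates (1/30 for pandemics, 1/10 for AI) to these figures
ratio_bio = (1 / 30) / p_bio  # ~1.7e4
ratio_ai = 0.1 / p_ai         # ~200
```

Multiplying conditional probabilities in this way is only valid to the extent that the stages really are conditionally independent, which, as noted above, is a first approximation.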

Arriving at credible estimates

Although I do not think the specific numbers I present should be taken very seriously, I would like to defend the process I have gone through in estimating these risks. Specifically, I have identified the key processes I believe would need to occur in order for extinction or other existential catastrophe to occur, and then assessed how likely each of these processes would be on the basis of the historical, scientific, social, and other considerations I believe to be relevant. I then combine these probabilities to produce an overall estimate.

Though far from perfect, I believe this process is far more transparent than the estimates provided by Ord, for which no explanation is offered as to how they were derived. This means that it is effectively impossible to subject them to critical scrutiny. Indeed, Ord even states that his probabilities “aren’t simply an encapsulation of the information and argumentation in the chapters on the risks” (p. 167), which seems to imply that it is not even possible to subject them to critical analysis on the basis of the information presented in this book. While he defends this on the basis that what he knows about the risks “goes beyond what can be distilled into a few pages” (p. 167), I do not find this a very satisfactory response given the total lack of explanation of these numbers in a book of over 400 pages.


In this essay I have argued that in his book The Precipice, Toby Ord has failed to provide a compelling argument that humanity faces a ‘precipice’ with unprecedentedly high and clearly unsustainable levels of existential risk. My main objective was to present an alternative analysis of the risks associated with engineered pandemics and unaligned artificial intelligence, highlighting issues and considerations that I believe Ord does not grant sufficient attention. Furthermore, on the basis of this analysis I presented an alternative set of probability estimates for these two risks, both of which are considerably lower than those presented by Ord. While far from comprehensive or free from debatable premises, I hope that the approach I have outlined here provides a different perspective on the debate, and helps in the development of a nuanced understanding of these important issues.

Lessons from the Fall of Rome


In this article I discuss the key historical factors contributing to the fall of the Western Roman Empire, and what lessons we can learn from this pivotal historical event.


The fall of the Western Roman Empire in the fifth century is one of the pivotal events of world history. For over seven centuries the Romans had dominated the Mediterranean, spreading Graeco-Roman civilization throughout much of Western and Southern Europe, North Africa, and the Near East. Much has been written in the succeeding centuries attempting to account for the collapse of such a powerful and long-lived empire. In this essay I want to consider this question from the perspective of what lessons might be pertinent from this period in informing our responses to political and social problems in the present. While I believe the past is a valuable source of wisdom for the present, care must be taken in what lessons we learn, and in particular it is critical we avoid the temptation to simply read current problems directly onto the past. Unfortunately many commentators have failed to exercise such caution, resulting in many alleged ‘lessons’ from the fall of Rome that relate more to the concerns and anxieties of the author than the social, political, and economic forces operating in the fifth century. My goal is therefore not to find simple one-to-one correspondences between social, economic, and political processes in the fifth century and similar processes in the twenty-first century. Rather, my aim is to identify some of the key problems that led to the fall of the western empire, and see whether any of these might prove useful for understanding problems we face in the present. I shall begin with a brief historical overview of the key events in the fall of the Western Roman Empire, followed by an analysis of some of the critical factors I believe account for these processes, and then conclude with a discussion of what we can learn from this analysis that is of potential relevance to the present day.

Historical Background

Before the beginning of the Empire, the Roman state was administered by the Senate, which was an elite group of primarily landowning aristocrats, many of whom also served as generals and public officials. During the first century BC, a series of civil wars took place between powerful Roman generals seeking to consolidate their power. These civil wars culminated in the victory of Augustus over Mark Antony in 31 BC, following which Augustus progressively consolidated his extensive powers over the Roman state, becoming the first Roman Emperor in a constitutional structure called the Principate. The next two and a half centuries were the golden age of the Roman Empire, a period in which the empire reached its maximum geographic extent, trade and the arts flourished, and despite regular frontier wars and two brief civil wars, overall the empire enjoyed an unparalleled period of peace and prosperity.

This golden age came to an end in the AD 230s, giving way to a period of near continual civil war and repeated foreign invasion that is now called the Crisis of the Third Century. During this period, most emperors were no longer drawn from the old Roman senatorial aristocracy, but instead were military generals, often provincials born to commoner parents who rose through the ranks of the army. This disastrous period saw the empire almost collapse, with unity and peace eventually being restored by a series of energetic emperors in the 270s and early 280s. Owing to the many constitutional and administrative reforms he enacted, the beginning of the reign of Diocletian in 284 is generally seen as the beginning of a new phase of the Roman Empire called the Dominate. These reforms included increasing the size of the state bureaucracy, instituting price controls and other strict economic regulations, increasing the size of the army, separation of military and civilian functions in the provinces, and most famously, his division of the empire into Western and Eastern halves, each with its own emperor. Another critical transformation of the late Roman period was the rise of Christianity, which came to prominence following the conversion of Constantine in 312, and culminated in the Edict of Thessalonica in 380, in which Christianity was made the official state religion. The reforms made by Diocletian and consolidated under Constantine helped to reinvigorate the empire, and led to relative peace and prosperity for most of the fourth century, at least in comparison to the disastrous third century.

The beginning of the end for the Western Empire came with a series of civil wars in the 380s and 390s between rival claimants to the throne. In addition to weakening the army, these wars also saw the rise of a series of Romanised but barbarian-born generals, who from the 390s to the end of the empire in the west were usually the power behind the throne, usurping real authority from a series of weak emperors, many of whom ascended to the throne as children. In the context of these military and political developments, the Western Empire faced a major catastrophe beginning in the winter of 405/406, when Vandals and other Germanic tribes crossed over the frozen Rhine and began to plunder Gaul (modern day France and Belgium), while at the same time the Goths invaded Italy from the east. With the Western Roman army unable to defend its frontiers, these barbarian groups were eventually placated by granting them land to settle in Spain and Gaul, nominally under Roman suzerainty but in practice largely independent. In the midst of this crisis, the Romans withdrew their legions from Britain, effectively abandoning the region. A further major blow came in 429, when the Vandals crossed into North Africa, attacking Roman settlements and pillaging the countryside. Gradually the Vandals expanded their territory, culminating in the 439 capture of the wealthy port city of Carthage. This was to prove disastrous for the faltering Western Roman Empire, which relied heavily on grain exports and tax revenue from its prosperous North African provinces.

The situation in the west worsened further when in the early 450s, a central Asian nomadic people called the Huns, under the leadership of their king Attila, invaded the empire and devastated large regions throughout Gaul and northern Italy. Although the Huns were eventually defeated, the Western Empire was by then reliant almost entirely on allied barbarian tribes, who after the defeat of the Huns resumed fighting against the Romans. By the 460s, little was left of the Western Empire besides Italy and some isolated residual lands in Gaul and Spain. The Western Emperors of this period were mostly puppets of barbarian generals with no real power of their own, and little influence or legitimacy outside of Italy. In 476, the incumbent Western child emperor Romulus Augustus was deposed by the barbarian General Odoacer, who instead of appointing a new puppet allowed the institution to lapse, and styled himself as the king of Italy. While this was not considered to be particularly important at the time, and Roman political institutions continued for about another century in Italy and for another millennium in the form of the Byzantine Empire in the east, this event is conventionally regarded as marking the end of the Western Roman Empire.

Analysis of Causes

Having presented the core historical narrative, I will now examine some of the fundamental reasons that explain the rapid decline and eventual collapse of the Western Roman Empire over the course of the fifth century. In the most straightforward sense, Roman control over territories outside of Italy itself was predominantly lost as a result of granting lands to invading barbarian groups, which subsequently established their own independent nations and began conquering further territories for themselves. Previously barbarian peoples had been allowed into the empire, but until the late fourth century this had always meant dispersal and integration into Roman society, not gaining control over territories while retaining their own distinct political structure and independent military forces. Why did the Roman authorities agree to such an obviously problematic arrangement? Again, the direct reason is fairly straightforward: they did so because the Roman armies were unable to prevent the barbarian tribes from entering the Empire, or expel them after entry, thus the best the Romans could do was to attempt to exercise some minimal control over the invading tribes by granting specific lands, and then utilising them as military allies when fighting against other barbarians. This leads to the question as to why the Western Roman army of the fifth century was unable to defend its frontiers against barbarian invasions, as it had successfully managed to do for centuries beforehand.

The answer to this question is not so straightforward. Entire books have been written on these issues, and so here I can only briefly summarise some of the major factors that I believe were most important. Unlike the barbarian tribes they faced, the Romans possessed a professional standing army, with standardised, centrally-issued equipment, and soldiers who were formally trained, paid a salary, and who served for a specified period of time. The two main requirements to sustain such an army were a sufficient source of tax revenues to pay for everything, and enough able-bodied recruits to fill the ranks. Ensuring that enough tax revenues and recruits were extracted from the provinces to sustain an army large enough to defend its frontiers was effectively the single most important role of the Roman imperial government. What is clear is that in the fifth century in the Western Empire, the institutions underpinning this system progressively broke down. Tax revenues and recruitment proved insufficient, the army diminished in size and proficiency, and as such the Romans had to rely more and more on external barbarian soldiers to fight for them.

What contributed to the declining effectiveness of Roman institutions? One crucial factor appears to be the decline in prosperity of the Western Roman Empire in the third and fourth centuries, when compared to the previous two centuries. Although evidence is limited, it appears that the upheavals of the third century, combined with rampant inflation and the resulting disruption of the coinage system that occurred around the same time, significantly reduced commerce and trade, especially in the western part of the Empire. There was some revival during the fourth century, but not to the peak levels of the second century. Furthermore, major plagues in the second and third centuries significantly reduced the population of the empire, while some scholars also theorise that climatic changes reduced agricultural yields, both of which had the effect of reducing the tax base of the Roman economy. In addition, Diocletian’s reforms increased the size of both the bureaucracy and the military, which therefore required larger tax revenues to support. Overall the Roman economy became much more regulated, with the Edict on Maximum Prices of 301 attempting (unsuccessfully) to regulate prices and wages throughout the entire empire, while other legislation attempted to force sons to follow the occupations of their fathers, effectively making a wide range of professions hereditary. The purpose of such regulations was to stabilise and regularise the Roman economy so that collection of tax revenues was made easier and more predictable. In the long term, however, such reforms likely contributed to the decline of commerce and reduced the vibrancy and efficiency of the economy.

All these changes notwithstanding, at the beginning of the fifth century the Western Roman Empire was still much larger and more prosperous than any of the barbarian tribes that would ultimately be the cause of its downfall. A crucial factor hampering the Empire from fully mobilising its available resources was the opposition of the wealthy Italian senatorial families, who by this time owned massive estates concentrated in Italy, Sicily, and the Mediterranean regions of Gaul and Spain. The senatorial aristocracy had always been powerful, with the process of consolidation of land ownership in the hands of fewer and fewer wealthy families dating back to the late Republic period. However this process seems to have accelerated in the third and fourth centuries, due to a combination of the decline in commerce which had previously supported regional cities, small landowners being forced to seek the protection of larger landowners due to warfare, and taxation and economic restrictions forcing small independent farmers to sell out to larger estate owners. Large landholders were often able to gain exemptions from various kinds of taxes, and also had sufficient political power to restrict military recruitment from their estates, since the loss of able-bodied young men reduced the profitability of these estates. The problem of military recruitment in particular seems to have worsened over the course of the fourth century, with the minimum height of recruits being reduced in 367, and increasingly draconian punishments introduced for deserters and draft dodgers. The inability to raise and equip the required manpower therefore seems to be in large part due to the unwillingness of the Roman aristocracy to contribute their fair share of taxes and recruits.

Although it is easy enough to understand why landowners would attempt to avoid taxation and other restrictions on the profitability of their estates, it is harder to understand why the great estate holders did not do more to help oppose the barbarian invasions. After all, it is hardly profitable to have one’s estate confiscated, pillaged, or burned by invaders or raiding barbarian tribes, as occurred frequently during the fifth century. Unfortunately the evidence from this period is insufficient to provide any definitive resolution to this puzzle, but one important factor appears to have been the loss of control of the Senatorial class over the army and imperial government during the Crisis of the Third Century. As noted previously, wealthy aristocrats from Italy were displaced from their traditional roles as generals and provincial governors, replaced by professional soldiers and educated administrators of common birth. While this contributed to an increase in the professionalism of the Roman military and bureaucracy, it also meant that the wealthy aristocrats who still exercised significant authority at a local level (particularly in the Western Empire) were increasingly excluded from the imperial administration. Unlike during the Principate, when the imperial government broadly served the interests of the Senatorial class, during the Dominate their interests increasingly diverged. By the late fifth century large landowners in Italy seem to have found it preferable to negotiate with the current barbarian invaders, rather than to cooperate with the rapidly disintegrating imperial administration. In short, it appears that the divergence of interests between the Roman aristocracy and the imperial government during the fourth and fifth centuries resulted in the aristocrats ceasing to support the government when it needed them most.

Lessons for the Present

Having considered some of the major factors which may account for the fall of the Western Roman Empire in the fifth century, I now want to address what we might learn from this that is of relevance to the contemporary world. Drawing lessons from history is always difficult, since one is tempted to project current problems onto very different past social and economic situations. Nevertheless, I believe there are some general lessons to be learned. First, I believe that we can learn from the results of Diocletian’s economic and administrative reforms. While these reforms were intended to strengthen the Empire and probably did contribute to the revival of its fortunes during the fourth century, in the long term these policies likely weakened the Empire’s tax base and contributed to the accumulation of land by large estate holders. The imperial government came progressively to be seen by many of its citizens, especially in the west, as more of a burden than a benefit. The Roman state’s single most important function was to provide physical security and protection from barbarian raids. When it failed to fulfil this function, many in the west saw little value in continuing to support the government. Today the functions of governments are far more extensive than anything imaginable in the ancient world, but nevertheless it is still essential for a government to consider the long-term implications of its laws and institutions, and ensure that on balance it is providing value to its citizens. Naturally this is more easily said than done, and predicting the long-term effects of policy changes is difficult. Nevertheless the experience of Diocletian’s reforms illustrates how policies designed to resolve problems in the short term can have potentially disastrous long-term consequences.

Second, the experience of the fall of the Western Roman Empire illustrates how important it is for educated elites to remain engaged in a nation’s political process. As discussed earlier, over the course of the fourth century the old senatorial aristocracy became increasingly distanced from the army and the imperial bureaucracy, and in large part ceased to be engaged in or supportive of measures to defend and reinvigorate the state. Many senators still made impassioned speeches about the importance of the empire and of defending traditional Roman values, but concrete actions to preserve the Empire, such as paying taxes, rallying troops from their estates to defend the frontiers, or supporting reform efforts, were in general noticeably lacking. Although today the circumstances are different, I believe we can learn from these events the importance of those with the education and the means to make a difference actually taking concrete actions and continuing to engage in the political process, instead of moaning about how bad things are whilst spending most of one’s time engaged in private pursuits. In the case of the Senatorial aristocracy this mostly meant attending to their vast estates; in our day and age we have different preoccupations and distractions, but nevertheless the lesson of the importance of concrete political engagement is still highly pertinent.

Third, I believe we can also learn from the Roman experience a lesson about the importance of taking a broader view beyond the latest crisis of the moment, and spending time and effort thinking about the long-running trends that are occurring and what might happen in the future if present trends continue. This is so important because nearly everything that happened in the Western Empire in the fifth century, including civil war, weak emperors, corruption, barbarian invasion, and economic upheaval, had all happened before. Although many people were concerned about the way things were going, there was seldom a widespread recognition that such things foreshadowed the end of the Empire. The process of decline was for the most part very gradual; each year the army became slightly less effective, administration became somewhat less competent, and cities fell into marginally worse repair, such that for most people nothing felt particularly out of the ordinary. Events that would have been shocking to Romans two or three generations previously came to be considered as normal. Observers certainly noticed that things weren’t going well for the Western Empire, but the decay was so slow that few could see the extent of the underlying problems, nor was it clear when the Empire reached a point of no return from which it could never recover. Thus, at some point in the late fifth century there came a time when there was no Western Roman Empire anymore. Few mourned its passing at the time, since the process had been so gradual that few even realised what had happened. Applying this lesson to the present, we should be wary of falling into a similar trap: becoming so focused on the minutiae of day-to-day crises that we fail to perceive longer-term trends, and overlooking cumulative changes that accrue over long periods of time and are easy to become accustomed to.
It is hard to effectively deal with problems that we aren’t even aware of, and failure to take a longer-term perspective can lead to an inability to recognise vital trends which, in retrospect, seem extremely obvious.

A fourth and final point, related to the previous two, is the importance of avoiding short-term thinking when tackling social and economic problems. When costs must be borne in the present and benefits are uncertain and in the future, there is a natural temptation to avoid paying those costs, hoping that things will work out or that someone else will solve the problem. This appears to have been part of the attitude of the Roman aristocratic class, as multiple times, even when Italy was threatened by barbarian invasion, they failed to exert sufficient effort and make the personal sacrifices that would have been necessary to save the Empire. At the time Senators may have reasoned that such extreme measures were not necessary, or that barbarian rule was really little different from Imperial rule, so long as the Senatorial class was allowed to retain control over their estates and maintain their dominance over local affairs. For a time this was the case, since the first barbarian conquerors of Italy did allow the Senatorial class continued influence over local politics and continued enjoyment of the revenues of their estates. In the longer term, however, the Senatorial class only survived the Empire by about a century, with many of their estates devastated and local political arrangements massively disrupted by the wars which ravaged the Italian peninsula during the sixth century. The Roman Senate, once the institutional basis of power for the entire Mediterranean, ceased to exist in the early seventh century. Whilst many things become obvious in hindsight, nevertheless it seems plausible that had the Roman aristocracy of the fifth century been more forward thinking, they could have realised that it would be difficult to sustain their privileged position without a Roman army to defend them and a Roman state to justify their existence. Short-term thinking, it seems, is an age-old problem that we would do well to learn from.


The fall of the Western Roman Empire was a seminal event in world history, and understanding what caused a once large and powerful state to fall into ruins has interested scholars ever since. In this essay I discussed some of the critical factors that I believe contributed to the decline and fall of the Empire in the west, including the declining effectiveness of the Roman army, economic downturn due to increased warfare and short-sighted economic policies, the accumulation of wealth and power by large landowners, and a growing schism between the Senatorial class and the imperial government. I then considered what lessons from these events might be relevant to the present. I identified four key lessons: the importance of considering the long-term effects of economic and political reforms, the essential role played by the continued engagement of educated elites in shaping political outcomes for the public good, the necessity of paying attention to longer-term trends rather than getting lost in the minutiae, and the critical importance of avoiding short-termism in situations where we must pay present costs to accrue future benefits.



Effective Altruism is an Ideology, not (just) a Question


In a widely-cited article on the EA forum, Helen Toner argues that effective altruism is a question, not an ideology. Here is her core argument:

What is the definition of Effective Altruism? What claims does it make? What do you have to believe or do, to be an Effective Altruist?

I don’t think that any of these questions make sense.

It’s not surprising that we ask them: if you asked those questions about feminism or secularism, Islamism or libertarianism, the answers you would get would be relevant and illuminating. Different proponents of the same movement might give you slightly different answers, but synthesising the answers of several people would give you a pretty good feeling for the core of the movement.

But each of these movements is answering a question. Should men and women be equal? (Yes.) What role should the church play in governance? (None.) What kind of government should we have? (One based on Islamic law.) How big a role should government play in people’s private lives? (A small one.)

Effective Altruism isn’t like this. Effective Altruism is asking a question, something like:

“How can I do the most good, with the resources available to me?”

In this essay I will argue that her view of effective altruism as a question and not an ideology is incorrect. In particular, I will argue that effective altruism is an ideology, meaning that it has a particular (if somewhat vaguely defined) set of core principles and beliefs, and associated ways of viewing the world and interpreting evidence. After first explaining what I mean by ideology, I proceed to discuss the ways in which effective altruists typically express their ideology, including by privileging certain questions over others, applying particular theoretical frameworks to answer these questions, and privileging particular answers and viewpoints over others. I should emphasise at the outset that my purpose in this article is not to disparage effective altruism, but to try to strengthen the movement by helping EAs to better understand its actual intellectual underpinnings.

What is an ideology?

The first point I want to explain is what I mean when I talk about an ‘ideology’. Basically, an ideology is a constellation of beliefs and perspectives that shape the way adherents of that ideology view the world. To flesh this out a bit, I will present two examples of ideologies: feminism and libertarianism. Obviously these will be simplified since there is considerable heterogeneity within any ideology, and there are always disputes about who counts as a ‘true’ adherent of any ideology. Nevertheless, I think these quick sketches are broadly accurate and helpful for illustrating what I am talking about when I use the word ‘ideology’.

First consider feminism. Feminists typically begin with the premise that the social world is structured in such a manner that men as a group systematically oppress women as a group. There is a richly structured theory about how this works and how this interacts with different social institutions, including the family, the economy, the justice system, education, health care, and so on. In investigating any area, feminists typically focus on gendered power structures and how they shape social outcomes. When something happens, feminists ask ‘what effect does this have on the status and place of women in society?’ Given these perspectives, feminists typically are uninterested in and highly sceptical of any accounts of social differences between men and women based on biological differences, or attempts to rationalise such differences on the basis of social stability or cohesion. This way of looking at things, this focus on particular issues at the expense of others, and this set of underlying assumptions together constitute the ideology of feminism.

Second consider libertarianism. Libertarians typically begin with the idea that individuals are fundamentally free and equal, but that governments throughout the world systematically step beyond their legitimate role of protecting individual freedoms by restricting those freedoms and violating individual rights. In analysing any situation, libertarians focus on how the actions of governments limit the free choices of individuals. Libertarians have extensive accounts as to how this occurs through taxation, government welfare programs, monetary and fiscal policy, the criminal justice system, state-sponsored education, the military industrial complex, and so on. When something happens, libertarians ask ‘what effect does this have on individual rights and freedoms?’ Given these perspectives, libertarians typically are uninterested in and highly sceptical of any attempts to justify state intervention on the basis of increasing efficiency, increasing equality, or improving social cohesion. This way of looking at things, this focus on particular issues at the expense of others, and this set of underlying assumptions together constitute the ideology of libertarianism.

Given the foregoing, here I summarise some of the key aspects of an ideology:

  1. Some questions are privileged over others.
  2. There are particular theoretical frameworks for answering questions and analysing situations.
  3. As a result of 1 and 2, certain viewpoints and answers to questions are privileged, while others are neglected as being uninteresting or implausible.

With this framework in mind of what an ideology is, I now want to apply this to the case of effective altruism. In doing so, I will consider each of these three aspects of an ideology in turn, and see how they relate to effective altruism.

Some questions are privileged over others

Effective altruism, according to Toner (and many others), asks a question something like ‘How can I do the most good, with the resources available to me?’. I agree that EA does indeed ask this question. However it doesn’t follow that EA isn’t an ideology, since as we have just seen, ideologies privilege some questions over others. In this case we can ask – what other similar questions could effective altruism ask? Here are a few that come to mind:

  • What moral duties do we have towards people in absolute poverty, animals in factory farms, or future generations?
  • What would a virtuous person do to help those in absolute poverty, animals in factory farms, or future generations?
  • What oppressive social systems are responsible for the most suffering in the world, and what can be done to dismantle them?
  • How should our social and political institutions be structured so as to properly represent the interests of all persons, or all sentient creatures?

I’ve written each with a different ethical theory in mind. In order these are: deontology, virtue ethics, Marxist/postcolonial/other critical theories, and contractarian ethics. While some readers may phrase these questions somewhat differently, my point is simply to emphasise that the question you ask depends upon your ideology.

Some EAs may be tempted to respond that all my examples are just different ways, or more specific ways, of asking the EA question ‘how can we do the most good’, but I think this is simply wrong. The EA question is the sort of question that a utilitarian would ask, and presupposes certain assumptions that are not shared by other ethical perspectives. These assumptions include things like: there is (in principle) some way of comparing the value of different causes, that it is of central importance to consider maximising the positive consequences of our actions, and that historical connections between us and those we might try to help are not of critical moral relevance in determining how to act. EAs asking this question need not necessarily explicitly believe all these assumptions, but I argue that in asking the EA question instead of other questions they could ask, they are implicitly relying upon tacit acceptance of these assumptions. To assert that these are beliefs shared by all other ideological frameworks is to simply ignore the differences between different ethical theories and the worldviews associated with them.

Particular theoretical frameworks are applied

In addition to the questions they ask, effective altruists tend to have a very particular approach to answering these questions. In particular, they tend to rely almost exclusively on experimental evidence, mathematical modelling, or highly abstract philosophical arguments. Other theoretical frameworks are generally not taken very seriously or simply ignored. Theoretical approaches that EAs tend to ignore include:

  • Sociological theory: potentially relevant to understanding the causes of global poverty, how group dynamics operate, and how social change occurs.
  • Ethnography: potentially highly useful in understanding causes of poverty, efficacy of interventions, how people make dietary choices regarding meat eating, the development of cultural norms in government or research organisations surrounding safety of new technologies, and other such questions, yet I have never heard of an EA organisation conducting this sort of analysis.
  • Phenomenology and existentialism: potentially relevant to determining the value of different types of life and what sort of society we should focus on creating.
  • Historical case studies: there is some use of these in the study of existential risk, mostly relating to nuclear war, but otherwise this method is largely ignored as a potential source of information about social movements, improving society, and assessing catastrophic risks.
  • Regression analysis: potentially highly useful for analysing effective causes in global development, methods of political reform, or even the ability to influence AI or nuclear policy formation, but largely neglected in favour of either experiments or abstract theorising.

If readers disagree with my analysis, I would invite them to investigate the work published on EA websites, particularly research organisations like the Future of Humanity Institute and the Global Priorities Institute (among many others), and see what sorts of methodologies they utilise. Regression analysis and historical case studies are relatively rare, and the other three techniques I mention are virtually unheard of. This represents a very particular set of methodological choices about how to best go about answering the core EA question of how to do the most good.

Note that I am not taking a position on whether it is correct to privilege the types of evidence or methodologies that EA typically does. Rather, my point is simply that effective altruists seem to have very strong norms about what sorts of analysis are worth doing, despite the fact that relatively little time is spent in the community discussing these issues. GiveWell does have a short discussion of their principles for assessing evidence, and there is a short section in the appendix of the GPI research agenda about harnessing and combining evidence, but overall the amount of time spent discussing these issues in the EA community is very small. I therefore contend that these methodological choices are primarily the result of ideological preconceptions about how to go about answering questions, and not of an extensive analysis of the pros and cons of different techniques.

Certain viewpoints and answers are privileged

Ostensibly, effective altruism seeks to answer the question ‘how to do the most good’ in a rigorous but open-minded way, without ruling out any possibilities at the outset or making assumptions about what is effective without proper investigation. It seems to me, however, that this is simply not an accurate description of how the movement actually investigates causes. In practice, the movement seems heavily focused on the development and impacts of emerging technologies. Though not so pertinent in the case of global poverty, this is somewhat applicable in the case of animal welfare, given the increasing focus on the development of in vitro meat and plant-based meat substitutes. This technological focus is most evident in the focus on far future causes, since all of the main far future cause areas focused on by 80,000 hours and other key organisations (nuclear weapons, artificial intelligence, biosecurity, and nanotechnology) relate to new and emerging technologies. EA discussions also commonly feature discussion and speculation about the effects that anti-aging treatments, artificial intelligence, space travel, nanotechnology, and other speculative technologies are likely to have on human society in the long term future.

By itself the fact that EAs are highly focused on new technologies doesn’t prove that they privilege certain viewpoints and answers over others – maybe a wide range of potential cause areas have been considered, and many of the most promising causes just happen to relate to emerging technologies. However, from my perspective this does not appear to be the case. As evidence for this view, I will present as an illustration the common EA argument for focusing on AI safety, and then show that much the same argument could also be used to justify work on several other cause areas that have attracted essentially no attention from the EA community.

We can summarise the EA case for working on AI safety as follows, based on articles such as those from 80,000 hours and CEA (note this is an argument sketch and not a fully-fledged syllogism):

  • Most AI experts believe that AI with superhuman intelligence is certainly possible, and has nontrivial probability of arriving within the next few decades.
  • Many experts who have considered the problem have advanced plausible arguments for thinking that superhuman AI has the potential for highly negative outcomes (potentially even human extinction), but there are current actions we can take to reduce these risks.
  • Work on reducing the risks associated with superhuman AI is highly neglected.
  • Therefore, the expected impact of working on reducing AI risks is very high.

The three key aspects of this argument are expert belief in plausibility of the problem, very large impact of the problem if it does occur, and the problem being substantively neglected. My argument is that we can adapt this argument to make parallel arguments for other cause areas. I shall present three: overthrowing global capitalism, philosophy of religion, and resource depletion.

Overthrowing global capitalism

  • Many experts on politics and sociology believe that the institutions of global capitalism are responsible for extremely large amounts of suffering, oppression, and exploitation throughout the world.
  • Although there is much work criticising capitalism, work on devising and implementing practical alternatives to global capitalism is highly neglected.
  • Therefore, the expected impact of working on devising and implementing alternatives to global capitalism is very high.

Philosophy of religion

  • A sizeable minority of philosophers believe in the existence of God, and at least some very intelligent and educated philosophers are adherents of a wide range of different religions.
  • According to many religions, humans who do not adopt the correct beliefs and/or practices will be destined to an eternity (or at least a very long period) of suffering in this life or the next.
  • Although religious institutions have extensive resources, the amount of time and money dedicated to systematically analysing the evidence and arguments for and against different religious traditions is extremely small.
  • Therefore, the expected impact of working on investigating the evidence and arguments for the various religions is very high.

Resource depletion

  • Many scientists have expressed serious concern about the likely disastrous effects of population growth, ecological degradation, and resource depletion on the wellbeing of future generations and even the sustainability of human civilization as a whole.
  • Very little work has been conducted to determine how best to respond to resource depletion or degradation of the ecosystem so as to ensure that Earth remains inhabitable and human civilization is sustainable over the very long term.
  • Therefore, the expected impact of working on investigating long-term responses to resource depletion and ecological collapse is very high.

Readers may dispute the precise way I have formulated each of these arguments or exactly how closely they all parallel the case for AI safety; however, I hope they will see the basic point I am trying to drive at. Specifically, if effective altruists are focused on AI safety essentially because of expert belief in plausibility, large scope of the problem, and neglectedness of the issue, a similar case can be made with respect to working on overthrowing global capitalism, conducting research to determine which religious belief (if any) is most likely to be correct, and efforts to develop and implement responses to resource depletion and ecological collapse.

One response that I foresee is that none of these causes are really neglected because there are plenty of people focused on overthrowing capitalism, researching religion, and working on environmentalist causes, while very few people work on AI safety. But remember, outsiders would likely say that AI safety is not really neglected because billions of dollars are invested into AI research by academics and tech companies around the world. The point is that there is a difference between working in a general area and working on the specific subset of that area that is highest impact and most neglected. In much the same way as AI safety research is neglected even if AI research more generally is not, likewise in the parallel cases I present, I argue that serious evidence-based research into the specific questions I present is highly neglected, even if the broader areas are not.

Potential alternative causes are neglected

I suspect that at this point many of my readers will be mentally marshalling additional arguments as to why AI safety research is in fact a more worthy cause than the other three I have mentioned. Doubtless there are many such arguments that one could present, and probably I could devise counterarguments to at least some of them – and so the debate would progress. My point is not that the candidate causes I have presented actually are good causes for EAs to work on, or that there aren’t any good reasons why AI safety (along with other emerging technologies) is a better cause. My point is rather that these reasons are not generally discussed by EAs. That is, the arguments generally presented for focusing on AI safety as a cause area do not uniquely pick out AI safety (and other emerging technologies like nanotechnology or bioengineered pathogens), but EAs making the case for AI safety essentially never notice this, because their ideological preconceptions bias them towards focusing on new technologies, and away from the sorts of causes I mention here. Of course EAs do go into much more detail about the risks of new technologies than I have here, but the core argument for focusing on AI safety in the first place is not applied to other potential cause areas to see whether (as I think it does) it could also apply to those causes.

Furthermore, it is not as if effective altruists have carefully considered these possible cause areas and come to the reasoned conclusion that they are not the highest priorities. Rather, they have simply not been considered. They have not even been on the radar, or at best barely on the radar. For example, I searched for ‘resource depletion’ on the EA forums and found nothing. I searched for ‘religion’ and found only the EA demographics survey and an article about whether EA and religious organisations can cooperate. A search for ‘socialism’ yielded one article discussing what is meant by ‘systemic change’, and one article (with no comments and only three upvotes) explicitly outlining an effective altruist plan for socialism.

This lack of interest in other cause areas can also be found in the major EA organisations. For example, the stated objective of the Global Priorities Institute is:

To conduct foundational research that informs the decision-making of individuals and institutions seeking to do as much good as possible. We prioritise topics which are important, neglected, and tractable, and use the tools of multiple disciplines, especially philosophy and economics, to explore the issues at stake.

On the face of it this aim is consistent with all three of the suggested alternative cause areas I outlined in the previous section. Yet the GPI research agenda focuses almost entirely on technical issues in philosophy and economics pertaining to the long-termism paradigm. While AI safety is not discussed extensively it is mentioned a number of times, and much of the research agenda appears to be developed around related questions in philosophy and economics that the long-termism paradigm gives rise to. Religion and socialism are not mentioned at all in this document, while resource depletion is only mentioned indirectly by two references in the appendix under ‘indices involving environmental capital’.

Similarly the Future of Humanity Institute focuses on AI safety, AI governance, and biotechnology. Strangely, it also pursues some work on highly obscure topics such as the aestivation solution to the Fermi paradox and the probability of Earth being destroyed by microscopic black holes or metastable vacuum states. At the same time, it has nothing to say about any of the potential new problem areas I have mentioned.

Under their problem profiles, 80,000 hours does not mention having investigated anything relating to religion or overthrowing global capitalism (or even substantially reforming global economic institutions). They do link to an article by Robert Wiblin discussing why EAs do not work on resource scarcity, however this is not a careful analysis or investigation, just his general views on the topic. Although I agree with some of the arguments he makes, the depth of analysis is very shallow relative to the potential risks and concern raised about this issue by many scientists and writers over the decades. Indeed, I would argue that there is about as much substance in this article as a rebuttal of resource depletion as a cause area as one finds in the typical article dismissing AI fears as exaggerated and hysterical.

In yet another example, the Foundational Research Institute states that:

Our mission is to identify cooperative and effective strategies to reduce involuntary suffering. We believe that in a complex world where the long-run consequences of our actions are highly uncertain, such an undertaking requires foundational research. Currently, our research focuses on reducing risks of dystopian futures in the context of emerging technologies. Together with others in the effective altruism community, we want careful ethical reflection to guide the future of our civilization to the greatest extent possible.

Hence, even though it seems that in principle socialists, Buddhists, and ecological activists (among others) are highly concerned about reducing the suffering of humans and animals, FRI ignores the topics that these groups would tend to focus on, and instead focuses their attention on the risks of emerging technologies. As in the case of FHI, they also seem to find room for some topics of highly dubious relevance to any of EA’s goals, such as this paper about the potential for correlated actions with civilizations located elsewhere in the multiverse.

Outside of the main organisations, there has been some discussion about socialism as an EA cause, for example on r/EffectiveAltruism and by Jeff Kaufman. I was able to find little else about the other two potential cause areas I outline.

Overall, on the basis of the foregoing examples I conclude that the amount of time and energy spent by the EA community investigating the three potential new cause areas that I have discussed is negligible compared to the time and energy spent investigating emerging technologies. This is despite the fact that most of these groups were not ostensibly established with the express purpose of reducing the harms of emerging technologies, but have simply chosen this cause area over other possibilities that would also potentially fulfill their broad objectives. I have not found any evidence that this choice is the result of early investigations demonstrating that emerging technologies are far superior to the cause areas I mention. Instead, it appears to be mostly the result of disinterest in the sorts of topics I identify, and a much greater ex ante interest in emerging technologies over other causes. I present this as evidence that the primary reason effective altruism focuses so extensively on emerging technologies over other speculative but potentially high impact causes, is because of the privileging of certain viewpoints and answers over others. This, in turn, is the result of the underlying ideological commitments of many effective altruists.

What is EA ideology?

If many effective altruists share a common ideology, then what is the content of this ideology? As with any social movement, this is difficult to specify with any precision and will obviously differ somewhat from person to person and from one organisation to another. That said, on the basis of my research and experiences in the movement, I would suggest the following core tenets of EA ideology:

  1. The natural world is all that exists, or at least all that should be of concern to us when deciding how to act. In particular, most EAs are highly dismissive of religious or other non-naturalistic worldviews, and tend to just assume without further discussion that views like dualism, reincarnation, or theism cannot be true. For example, the map of EA concepts lists under ‘important general features of the world’ pages on ‘possibility of an infinite universe’ and ‘the simulation argument’, yet makes no mention of the possibility that anything could exist beyond the natural world. It requires a very particular ideological framework to regard the simulation argument as more important or pressing than non-naturalism.
  2. The correct way to think about moral/ethical questions is through a utilitarian lens in which the focus is on maximising desired outcomes and minimising undesirable ones. We should focus on the effect of our actions on the margin, relative to the most likely counterfactual. There is some discussion of moral uncertainty, but beyond this, deontological, virtue-ethical, contractarian, and other approaches are rarely applied in philosophical discussion of EA issues. This marginalist, counterfactual, optimisation-based way of thinking is largely borrowed from neoclassical economics, and is not widely employed by many other disciplines or ideological perspectives (e.g. communitarianism).
  3. Rational behaviour is best understood through a Bayesian framework, incorporating key results from game theory, decision theory, and other formal approaches. Many of these concepts appear in the idealised decision making section of the map of EA concepts, and are widely applied in other EA writings.
  4. The best way to approach a problem is to think very abstractly about that problem, construct computational or mathematical models of the relevant problem area, and ultimately (if possible) test these models using experiments. This approach appears to be modelled on how research is done in physics, with some influence from analytic philosophy. The methodologies of other disciplines are largely ignored.
  5. The development and introduction of disruptive new technologies is a more fundamental and important driver of long-term change than socio-political reform or institutional change. This is clear from the overwhelming focus on technological change of top EA organisations, including 80,000 hours, the Center for Effective Altruism, the Future of Humanity Institute, the Global Priorities Project, the Future of Life Institute, the Centre for the Study of Existential Risk, and the Machine Intelligence Research Institute.

I’m sure others could devise different ways of describing EA ideology that potentially look quite different to mine, but this is my best guess based on what I have observed. I believe these tenets are generally held by EAs, particularly those working at the major EA organisations, but are generally not widely discussed or critiqued. That this set of assumptions is fairly specific to EA should be evident if one reads various criticisms of effective altruism from those outside the movement. Although they do not always express their concerns using the same language that I have, it is often clear that the fundamental reason for their disagreement is the rejection of one or more of the five points mentioned above.


Conclusion

My purpose in this article has not been to contend that effective altruists shouldn’t have an ideology, or that the current dominant EA ideology (as I have outlined it) is mistaken. In fact, my view is that we can’t really get anywhere in rational investigation without certain starting assumptions, and these starting assumptions constitute our ideology. It doesn’t follow from this that any ideology is equally justified, but how we adjudicate between different ideological frameworks is beyond the scope of this article.

Instead, all I have tried to do is argue that effective altruists do in fact have an ideology. This ideology leads them to privilege certain questions over others, to apply particular theoretical frameworks to the exclusion of others, and to focus on certain viewpoints and answers while largely ignoring others. I have attempted to substantiate my claims by showing how different ideological frameworks would ask different questions, use different theoretical frameworks, and arrive at different conclusions to those generally found within EA, especially the major EA organisations. In particular, I argued that the typical case for focusing on AI safety can be modified to serve as an argument for a number of other cause areas, all of which have been largely ignored by most EAs.

My view is that effective altruists should acknowledge that the movement as a whole does have an ideology. We should critically analyse this ideology, understand its strengths and weaknesses, and then to the extent to which we think this set of ideological beliefs is correct, defend it against rebuttals and competing ideological perspectives. This is essentially what all other ideologies do – it is how the exchange of ideas works. Effective altruists should engage critically in this ideological discussion, and not pretend they are aloof from it by resorting to the refrain that ‘EA is a question, not an ideology’.

A Case for Ethical Naturalism


In this article I will outline a brief case for ethical naturalism, which is the view that morality is real and arises purely from aspects of the natural world. My argument will proceed in three parts. First, I will attempt to provide some conceptual clarity by outlining what we mean when we talk about morality, focusing on what sort of thing morality is and what it would entail if it existed. Second, I will sketch an example of a naturalistic moral theory, specifically a theory of reductive moral naturalism outlined by Peter Railton. Third, I will consider some objections to ethical naturalism: motivational internalism, the triviality objection, and the problem of normativity. I will argue that these objections do not substantively undercut the case for ethical naturalism.

What is morality?

Before we can answer the question ‘does morality exist’, we must first determine what we are even talking about when we say ‘morality’. At its most fundamental, morality is a code of conduct for human behaviour which specifies some actions (and inactions, attitudes, motivations, etc) as appropriate or proper, and others as inappropriate or improper. Behaviours congruent with the moral code are praised, while those incongruent with it are condemned. According to moral universalism, there is one privileged code of conduct which is applicable everywhere, at all times and in all societies (at least among humans; here for simplicity I will leave aside issues of animal and machine ethics). This privileged code of conduct, which we might call the ‘correct morality’, need not involve any very specific norms, but might consist of quite general standards and principles which could then be applied differently in different societies depending on circumstances. So for example the ‘correct moral code’ might specify that it is wrong to kill human beings without a very good reason and without proper due process, but exactly what constitutes a good reason and due process may well depend on the precise social circumstances.

In asking whether morality exists, therefore, we are asking whether there is a single correct code of conduct for human behaviour that is applicable to all human societies, even if those societies were unaware of it or chose to ignore it. Traditionally many have identified this privileged code of conduct with God’s laws or commandments. Ethical naturalism, however, is the view that moral facts are natural facts about the physical world, and not the product of some divine injunction or transcendent cosmic principle. Absent some sort of creator or other privileged supernatural being, what else could give rise to, or could account for the existence of, a privileged code of human conduct? Ethical naturalists are those who believe that it is possible for such a privileged code of conduct to exist purely in the natural world. How could this be the case?

To answer this question we must first observe that there is more to morality than simply being a privileged code of conduct. Though the details vary, there is effectively universal agreement that the privileged code of conduct that is morality necessarily promotes pro-social cooperation of people within a society, proscribes various behaviours that are detrimental to the self and to others, and promotes fairness and equity. Of course there is sharp disagreement about how to understand notions like ‘harm’ and ‘fairness’, but the point here is simply that the universal code of conduct referred to by morality, if such a thing exists, must relate in some central way to reducing harm and promoting fairness and equity. We could imagine other universal codes of conduct, but I argue they would not be a moral code of conduct, since our conception of morality necessarily and intrinsically includes these notions. This constraint is important, because it provides sufficient detail to begin constructing an account as to how morality could exist in a naturalistic universe.

A theory of ethical naturalism

Armed with a basic conception of what we mean when we talk about morality, we are now ready to outline a theory as to how ethical naturalism can account for the existence of moral facts. Here I will present only one of the many theories that have been developed, that propounded by the philosopher Peter Railton. The details of his theory are not the major focus of this article, so I will offer only the briefest outline. The key idea is that moral facts derive from what would maximise the fulfilment of idealised preferences. An idealised preference is not what somebody actually wants, but what they would want themselves to want if they had access to full information and were perfectly rational. This distinction is important because people can want things that are bad for them (e.g. wanting to smoke). Railton’s account also holds that moral facts refer to what would maximally satisfy the sum of idealised preferences aggregated over all individuals, treating each person equally. Even though the idealised preferences of any given person may be purely selfish, the satisfaction of idealised preferences across all people necessarily incorporates the wellbeing of all persons, and thereby provides a basis for moral facts. Thus, according to Railton’s theory, an action is morally good to the degree to which it contributes to satisfying the sum of everyone’s idealised preferences, treating each person equally.
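The aggregation step of this account can be sketched formally. The notation below is mine, not Railton’s, and it glosses over the hard question of how degrees of preference satisfaction are to be compared across persons:

```latex
% Let P_i^*(a) denote the degree to which action (or state of affairs) a
% would satisfy the idealised preferences of person i -- that is, what i
% would want themselves to want if fully informed and perfectly rational.
% Moral goodness then aggregates over all n persons, each weighted equally:
G(a) \;=\; \sum_{i=1}^{n} P_i^*(a)
% On this sketch, an action a is morally better than an action b just in
% case G(a) > G(b), and the morally best action is the one maximising G.
```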

Railton’s account provides a naturalistic theory, because facts about idealised preferences are natural facts – they relate to things in the natural world (specifically, idealised human desires about promoting human welfare), as opposed to divine commands, logical abstractions, or metaphysical principles. It is also very clearly a moral theory, since it provides (in outline form) a code of conduct for human behaviour relating in its very essence to human welfare and action in social contexts. Though I lack the space to make the argument here, I also believe that Railton’s theory of morality as maximising the fulfilment of idealised preferences provides the account of a universal code of social conduct that best fits our antecedent notion of the code we are looking for. That is, compared to any rival accounts, Railton’s best fits what we mean when we think and talk about morality. As such, just as in science we accept the theory that best accounts for and explains the available data, so too should we regard Railton’s theory as privileged over others, and as at least approximately describing the ‘correct morality’.

Parfit argues against idealised preference theories of morality on the basis that they provide no constraints on what people’s idealised preferences could be. He uses the example of an anorexic girl who, even after full reflection and access to all relevant information, could conceivably still decide that her idealised preference is to starve herself to death. According to Railton’s theory, starving herself would then be what is non-morally good for her. I do not consider this to be a strong objection to idealised preference theories of morality, because the idea that anyone who was fully rational and had access to all pertinent information regarding ways of living would still decide that their idealised preference is to starve themselves to death is, in my view, totally absurd. This is precisely why conditions such as anorexia and depression are rightly regarded as mental disorders, involving false beliefs and a variety of cognitive distortions. If we begin by assuming that even idealised, fully rational, fully informed versions of ourselves would be subject to such defective views, then it does not surprise me that we arrive at absurd conclusions. I see this as stemming from the absurd initial assumption, and not from any particular flaw in Railton’s theory.

Motivational internalism objection

There are three major lines of objection often raised against theories of ethical naturalism. The first comes from adherents of a view called motivational internalism, who argue that motivation is an essential component of morality. According to this view, a belief in some moral fact necessarily involves a motivation to act in accordance with that fact. For example, to believe that it is wrong to eat animals would necessarily entail a motivation not to eat animals. Perhaps that motivation would be overwhelmed by a stronger motivation, but nevertheless it must exist in some form. Anyone who did not possess such a motivation would, under this view, not truly believe that it was wrong to eat animals, even though they may profess such a belief.

Motivational internalism poses a problem for ethical naturalism because it seems hard to fit into a naturalistic worldview. Under moral naturalism, moral facts are simply natural facts, and natural facts do not seem to be the sorts of things that necessarily lead to any particular motivation. We can believe all sorts of things about the solar system, the human body, societies, physics, etc, without any necessary motivational state being attached to or following from these beliefs. The idea that beliefs are never sufficient for producing motivations to act is known as the Humean theory of motivation, after its main populariser David Hume. According to Hume, for a motivation to exist we must have both some sort of antecedent desire and a belief that acting in a certain way would satisfy that desire; mere belief itself is never sufficient to produce motivation. If we accept both motivational internalism and the Humean theory of motivation, it follows that when people come to believe in a moral fact, that belief always produces a relevant desire to act. Many philosophers regard this as hard to fit into a naturalistic worldview, as there just don’t seem to be any facts about the natural world that necessarily produce desires in this way.

My response to this issue is simply to reject motivational internalism as being too strong and too demanding a view. After all, why should we think that moral beliefs necessarily imply or generate a corresponding motivation? This does not appear to be the case empirically, since there is strong evidence for the existence of psychopaths who know about morality but remain unmotivated to act in accordance with it. Furthermore, the principle also seems to fail in other fields of enquiry. We can, for instance, imagine recalcitrant persons who agree that an argument is sound and fail to identify any logical mistake in it, but nevertheless have no motivation to accept the conclusion of the argument as true (indeed, many of us have likely participated in discussions where this very behaviour has manifested!). Given these considerations, I do not see any strong reason to accept motivational internalism, and as such the failure of natural moral facts to necessarily supply any motivation to act in accordance with them does nothing to undermine the case for ethical naturalism.

The triviality objection

Parfit raises a second objection to ethical naturalism, which he calls the ‘triviality objection’. According to this argument, it is impossible for ethical naturalists to simultaneously argue that ethical facts are natural facts, and also maintain that this is a substantive, informative claim rather than a mere tautology. The example that he uses considers two properties: the natural property ‘maximises happiness’, and the moral property ‘is what we morally ought to do’. Ethical naturalists argue that a property something like ‘maximises happiness’ is the same as the property ‘is what we morally ought to do’. Yet according to Parfit, these two properties cannot simply be identical, as otherwise there would not be two properties but one, and we would essentially just be saying ‘property A is property A’, which is an uninformative tautology. To be a substantive claim, ethical naturalists must instead be saying ‘property A is property B’, but Parfit doesn’t think this makes any sense, since there must be something to distinguish the two properties for them to be different. He gives the example of water and H2O, arguing that although water and H2O are the same substance, the property ‘is comprised of two hydrogen atoms and one oxygen atom’ is not the same as the property ‘is a clear substance that falls from the sky and we need to drink to survive’. These two properties might be satisfied by the same stuff, but the properties themselves are not the same. Likewise, Parfit argues that states of affairs can possess both natural and moral properties, but these properties will always and necessarily be distinct properties, not the same property as the ethical naturalist claims. Parfit takes this to be an argument for irreducible moral properties, which cannot be reduced to or equated with natural properties.

The flaw in Parfit’s argument, in my view, is that he does not articulate what it means for two apparently distinct properties to be the same. We can address the apparent paradox by appealing to the distinction between sense (the internal psychological meaning of a phrase) and reference (the thing in the real world picked out by a phrase). The classic example is that of the ‘morning star’ (a bright star visible in the east just before sunrise) and the ‘evening star’ (a bright star visible in the west just after sunset). Although the phrases ‘the morning star’ and ‘the evening star’ do not have the same meaning, they in practice refer to the same thing, namely the planet Venus. We can apply this distinction in response to Parfit’s concern. The natural property ‘maximises happiness’ does not mean the same thing as the moral property ‘is morally good’. However, it turns out that both properties are ‘the same’, by which I mean that:

  • All pertinent states of affairs with property A also have property B, and vice versa
  • There are facts that account for this coincidence of properties, so it is not simply an accident

I argue that this is what reductive naturalists mean when they say that the property ‘maximises happiness’ is the same thing as the property ‘is morally good’. Understood in this way, there is no paradox about how two properties can be the same and yet different. They are different in that they mean different things (have different senses), but are the same in that they have the same referent (the properties are fulfilled by the same states of affairs and only those states of affairs). The reductive naturalist’s claim that the two properties are the same is therefore substantive, and not trivial. As such, I believe that Parfit’s triviality objection fails.
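The claim being made here can be stated compactly. The formalisation below is mine (not Parfit’s or Railton’s), and is only meant to make the structure of the response explicit:

```latex
% The reductive naturalist claims that the natural property A
% ('maximises happiness') and the moral property B ('is morally good')
% are coextensive across all pertinent states of affairs x:
\forall x \,\bigl( A(x) \leftrightarrow B(x) \bigr)
% together with the further claim that this coextension is grounded in
% facts about the world, rather than being a mere accident. The
% predicates 'A' and 'B' differ in sense but share a referent, just as
% 'the morning star' and 'the evening star' both denote Venus.
```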

The problem of normativity

The third objection to ethical naturalism, and probably the most important, is the problem of normativity, which concerns the ‘binding force’ of morality. Philosophers typically understand this to mean that all persons have reasons to act morally, even if they are unaware of those reasons or choose to reject them. This idea is called moral rationalism: the view that if it is wrong for somebody to do a particular act, then there is always a reason for them not to do that act. We would, for instance, typically accept the claim that smokers have a reason to quit smoking (namely the health benefits), even if they are unaware of those benefits or simply choose to ignore them. Likewise, it is argued that all persons have reasons not to kill, steal, etc, even if they fail to act in accordance with those reasons. As with motivational internalism, many thinkers have regarded normativity as a problem for ethical naturalism, on the grounds that natural facts simply are not the sorts of things that necessarily give rise to any particular reason to act. This is related to J. L. Mackie’s famous ‘argument from queerness’, in which he argued against the existence of moral facts on the basis that they would need to possess some queer property of being intrinsically motivating, or have an ‘ought-to-be-doneness’ about them. This sort of property seems hard to fit into a naturalistic worldview, because natural facts simply describe the way things are in the world. The way things are, however, doesn’t impose any obligations on us, or provide any reasons to act one way or the other. According to this argument, the ethical naturalist is therefore unable to account for the normativity of morality.

The challenge, then, comes down to what can be said in answer to the amoralist. This is a person who recognises the existence of facts about idealised human preferences (in line with Railton’s account), and perhaps even agrees that this provides the best account of a code of conduct pertinent to moral issues, but nevertheless demands to know why they have a reason to act in accordance with this code. In my view, the best response to this challenge is to argue that it is a basic, foundational principle that in any domain, rational agents have reasons to act in accordance with the privileged code of conduct (if any) pertinent to that domain. To understand how this answer works, suppose somebody were presented with a mathematical proof which they understood and followed at every stage, but then simply refused to agree that they had any reason to accept the conclusion. We could imagine them retorting that they have no internal desire or motivation to form accurate beliefs about this mathematical question, and therefore rejecting the idea that they have a reason to accept the conclusion of the proof. I would contend, however, that regardless of their particular desires, the person in question still has a reason to accept the conclusion of the proof, because doing so is consistent with the privileged norms of rational inference governing the pertinent domain (in this case mathematics). To take another example, we would typically say that anyone engaged in a game of chess has a reason to make a move that would help them win the game – even if they did not feel any desire to make the move, and even if they didn’t care about winning the game at all. The fact that the person is engaged in the domain of chess means that there is a privileged code of conduct pertinent to that domain which provides them with reasons for action, irrespective of their desires or motivations.
The recalcitrant person in such cases is unlikely to be persuaded by this argument, but it seems to me that it correctly describes the reason they have for accepting the proof or making the chess move.

I argue that much the same applies to morality. The domain of conduct pertinent to morality is that of living and interacting with other people. Unlike a game of chess or a mathematical proof, there is no real way to avoid engaging in the moral domain. Even if we became hermits and cut off contact with others, there would still be other people in the world who would come under the remit of the domain of morality (since it applies universally across all people). People may fail to be motivated to act in accordance with what will best promote the wellbeing of themselves and others, but they nevertheless have a reason to act in accordance with this code of conduct, since that is the privileged code of conduct for the domain of living and interacting with other people. This might not be a very satisfying answer to the problem of normativity, but this will be an issue in any domain, since regardless of what reasons have been given for doing or believing something, I can always still press the question ‘but why?’ Our chain of justifications has to stop somewhere, and I believe it is reasonable to affirm as fundamental the principle that everyone has a domain-specific reason to act in accordance with the privileged code of conduct pertinent to a given domain.

Thus, since everyone has a body that can be healthy or diseased, everyone has a health-related reason to stop smoking even if they don’t care about their health. We would not accept that a person who doesn’t care about dying of lung cancer actually has no reason to quit smoking; rather we would instead say that they are not motivated to act in accordance with this reason. Since (almost) everyone has some sort of money or property, (almost) everyone has a finance-related reason to save and invest their money wisely even if they don’t care about money. Since everyone lives in some sort of society with conventions of how to behave, everyone has an etiquette-related reason to obey their society’s rules of etiquette even if they don’t care about being polite. Likewise, since everyone lives on a planet where their actions potentially affect other people, everyone has a morality-related reason to act morally, even though they may sometimes fail to be motivated by these reasons. This is what provides the basis for the normativity of morality.

Glossary of some key terms

  • Moral realism: there is a privileged, universal code of conduct governing human action, which gives rise to moral facts
  • Reasons internalism: having a reason to do something implies having a motivation to do that thing
  • Humean theory of motivation: beliefs are insufficient for motivation, need antecedent desire too
  • Moral rationalism: if something is morally wrong then there must be a reason not to do it
  • Ethical naturalism: moral facts are natural facts
  • Reasons internalism + Humean theory: having a reason to do something implies a belief and an antecedent desire
  • Reasons internalism + Humean theory + moral rationalism: to believe that something is wrong must necessarily produce or elicit a desire not to do that thing
  • Reasons internalism + Humean theory + moral rationalism + ethical naturalism: there are natural facts, belief in which necessarily produces or elicits a desire to act in accordance with the universal code of conduct which governs human action = Mackie’s ‘categorically prescriptive facts’: facts that provide reasons for action independent of our desires

A Critique of Superintelligence


In this article I present a critique of Nick Bostrom’s book Superintelligence. For purposes of brevity I shall not devote much space to summarising Bostrom’s arguments or defining all the terms that he uses. Though I briefly review each key idea before discussing it, I shall also assume that readers have some general idea of Bostrom’s argument, and some of the key terms involved. Also note that to keep this piece focused, I only discuss arguments raised in this book, and not what Bostrom has written elsewhere or others who have addressed similar issues. The structure of this article is as follows. I first offer a summary of what I regard to be the core argument of Bostrom’s book, outlining a series of premises that he defends in various chapters. Following this summary, I commence a general discussion and critique of Bostrom’s concept of ‘intelligence’, arguing that his failure to adopt a single, consistent usage of this concept in his book fatally undermines his core argument. The remaining sections of this article then draw upon this discussion of the concept of intelligence in responding to each of the key premises of Bostrom’s argument. I conclude with a summary of the strengths and weaknesses of Bostrom’s argument.

Summary of Bostrom’s Argument

Throughout much of his book, Bostrom remains quite vague as to exactly what argument he is making, or indeed whether he is making a specific argument at all. In many chapters he presents what are essentially lists of various concepts, categories, or considerations, and then articulates some thoughts about them. Exactly what conclusion we are supposed to draw from his discussion is often not made explicit. Nevertheless, by my reading the book does at least implicitly present a very clear argument, which bears a strong similarity to the sorts of arguments commonly found in the Effective Altruism (EA) movement, in favour of focusing on AI research as a cause area. In order to provide structure for my review, I have therefore constructed an explicit formulation of what I take to be Bostrom’s main argument in his book. I summarise it as follows:

Premise 1: A superintelligence, defined as a system that ‘exceeds the cognitive performance of humans in virtually all domains of interest’, is likely to be developed in the foreseeable future (decades to centuries).

Premise 2: If superintelligence is developed, some superintelligent agent is likely to acquire a decisive strategic advantage, meaning that no terrestrial power or powers would be able to prevent it doing as it pleased.

Premise 3: A superintelligence with a decisive strategic advantage would be likely to capture all or most of the cosmic endowment (the total space and resources within the accessible universe), and put it to use for its own purposes.

Premise 4: A superintelligence which captures the cosmic endowment would likely put this endowment to uses incongruent with our (human) values and desires.

Preliminary conclusion: In the foreseeable future it is likely that a superintelligent agent will be created which will capture the cosmic endowment and put it to uses incongruent with our values. (I call this the AI Doom Scenario).

Premise 5: Pursuit of work on AI safety has a non-trivial chance of noticeably reducing the probability of the AI Doom Scenario occurring.

Premise 6: If pursuit of work on AI safety has at least a non-trivial chance of noticeably reducing the probability of an AI Doom Scenario, then (given the preliminary conclusion above) the expected value of such work is exceptionally high.

Premise 7: It is morally best for the EA community to preferentially direct a large fraction of its marginal resources (including money and talent) to the cause area with highest expected value.

Main conclusion: It is morally best for the EA community to direct a large fraction of its marginal resources to work on AI safety. (I call this the AI Safety Thesis.)

Bostrom discusses the first premise in chapters 1-2, the second premise in chapters 3-6, the third premise in chapters 6-7, the fourth premise in chapters 8-9, and some aspects of the fifth premise in chapters 13-14. The sixth and seventh premises are not really discussed in the book (though some aspects of them are hinted at in chapter 15), but are widely discussed in the EA community and serve as the link between the abstract argumentation and real-world action, and as such I decided also to discuss them here for completeness. Many of these premises could be articulated slightly differently, and perhaps Bostrom would prefer to rephrase them in various ways. Nevertheless I hope that they at least adequately capture the general thrust and key contours of Bostrom’s argument, as well as how it is typically appealed to and articulated within the EA community.

The nature of intelligence

In my view, the biggest problem with Bostrom’s argument in Superintelligence is his failure to devote any substantial space to discussing the nature or definition of intelligence. Indeed, throughout the book I believe Bostrom uses three quite different conceptions of intelligence:

  • Intelligence(1): Intelligence as being able to perform most or all of the cognitive tasks that humans can perform. (See page 22)
  • Intelligence(2): Intelligence as a measurable quantity along a single dimension, which represents some sort of general cognitive efficaciousness. (See pages 70, 76)
  • Intelligence(3): Intelligence as skill at prediction, planning, and means-ends reasoning in general. (See page 107)

While certainly not entirely unrelated, these three conceptions are all quite different from each other. Intelligence(1) is most naturally viewed as a multidimensional construct, since humans exhibit a wide range of cognitive abilities and it is by no means clear that they are all reducible to a single underlying phenomenon that can be meaningfully quantified with one number. It seems much more plausible to say that the range of human cognitive abilities requires many different skills which are sometimes mutually supportive, sometimes mostly unrelated, and sometimes mutually inhibitory, in varying ways and to varying degrees. This first conception of intelligence is also explicitly anthropocentric, unlike the other two conceptions, which make no reference to human abilities. Intelligence(2) is unidimensional and quantitative, and also extremely abstract, in that it does not refer directly to any particular skills or abilities. It most closely parallels the notion of IQ or other similar operational measures of human intelligence (which Bostrom even mentions in his discussion), in that it is explicitly quantitative and attempts to reduce abstract reasoning abilities to a number along a single dimension. Intelligence(3) is much more specific and grounded than either of the other two, relating only to particular types of abilities. That said, it is not obviously subject to simple quantification along a single dimension as is the case for Intelligence(2), nor is it clear that skill at prediction and planning is what is measured by the quantitative concept of Intelligence(2). Certainly Intelligence(3) and Intelligence(2) cannot be equivalent if Intelligence(2) is even somewhat analogous to IQ, since IQ mostly measures skills at mathematical, spatial, and verbal memory and reasoning, which are quite different from skills at prediction and planning (consider for example the phenomenon of autistic savants).
Intelligence(3) is also far more narrow in scope than Intelligence(1), corresponding to only one of the many human cognitive abilities.

Repeatedly throughout the book, Bostrom flips between using one or another of these conceptions of intelligence. This is a major weakness for Bostrom’s overall argument, since in order for the argument to be sound it is necessary for a single conception of intelligence to be adopted and apply in all of his premises. In the following paragraphs I outline several of the clearest examples of how Bostrom’s equivocation in the meaning of ‘intelligence’ undermines his argument.

Bostrom argues that once a machine becomes more intelligent than a human, it would far exceed human-level intelligence very rapidly, because one human cognitive ability is that of building and improving AIs, and so any superintelligence would also be better at this task than humans. This means that the superintelligence would be able to improve its own intelligence, thereby further improving its ability to improve its own intelligence, and so on, the end result being a process of exponentially increasing recursive self-improvement. Although compelling on the surface, this argument relies on switching between the concepts of Intelligence(1) and Intelligence(2). When Bostrom argues that a superintelligence would necessarily be better at improving AIs than humans because AI-building is a cognitive ability, he is appealing to Intelligence(1). However, when he argues that this would result in recursive self-improvement leading to exponential growth in intelligence, he is appealing to Intelligence(2). To see how these two arguments rest on different conceptions of intelligence, note that with respect to Intelligence(1), it is not at all clear that there is any general, single way to increase this form of intelligence, as Intelligence(1) incorporates a wide range of disparate skills and abilities that may be quite independent of each other. As such, even a superintelligence that was better than humans at improving AIs would not necessarily be able to engage in rapid recursive self-improvement of Intelligence(1), because there may well be no such thing as a single variable or quantity called ‘intelligence’ that is directly associated with AI-improving ability. Rather, there may be a host of associated but distinct abilities and capabilities, each of which needs to be enhanced and adapted in the right way (and in the right relative balance) in order to get better at designing AIs.
Only by assuming a unidimensional quantitative conception of Intelligence(2) does it make sense to talk about the rate of improvement of a superintelligence being proportional to its current level of intelligence, which then leads to exponential growth. Bostrom therefore faces a dilemma. If intelligence is a mix of a wide range of distinct abilities, as in Intelligence(1), there is no reason to think it can be ‘increased’ in the rapidly self-reinforcing way Bostrom speaks about (in mathematical terms, there is no single variable which we can differentiate and plug into the differential equation, as Bostrom does in his example on pages 75-76). On the other hand, if intelligence is a unidimensional quantitative measure of general cognitive efficaciousness, it may be meaningful to speak of self-reinforcing exponential growth, but it is not at all obvious that any arbitrary intelligent system or agent would be particularly good at designing AIs. Intelligence(2) may well help with this ability, but it’s not at all clear it is sufficient – after all, we can readily conceive of building a highly “intelligent” machine that can reason abstractly, pass IQ tests, and so on, but is useless at building better AIs.
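For readers unfamiliar with the passage, Bostrom’s takeoff model on pages 75-76 can be sketched roughly as follows. The notation here is mine, and the point is simply that the derivation only goes through if intelligence is a single scalar quantity, as in Intelligence(2):

```latex
% Bostrom models the rate of change of system intelligence I as the
% optimisation power applied to the system divided by its recalcitrance R:
\frac{dI}{dt} \;=\; \frac{\text{optimisation power}}{R}
% Once the system does most of its own design work, optimisation power is
% roughly proportional to I itself; with recalcitrance held constant,
\frac{dI}{dt} = c\,I \quad\Longrightarrow\quad I(t) = I_0\, e^{c t}
% yielding exponential growth. If intelligence is instead a bundle of
% loosely coupled abilities (Intelligence(1)), there is no single scalar
% I to differentiate, and the derivation cannot even be written down.
```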

Bostrom argues that once a machine intelligence became more intelligent than humans, it would soon be able to develop a series of ‘cognitive superpowers’ (intelligence amplification, strategising, social manipulation, hacking, technology research, and economic productivity), which would then enable it to escape whatever constraints were placed upon it and likely achieve a decisive strategic advantage. The problem is that it is unclear whether a machine endowed only with Intelligence(3) (skill at prediction and means-ends reasoning) would necessarily be able to develop skills as diverse as general scientific research ability, competent use of natural language, and social manipulation of human beings. Again, means-ends reasoning may help with these skills, but clearly they require much more besides. Only if we assume the conception of Intelligence(1), whereby the AI has already exceeded essentially all human cognitive abilities, does it become reasonable to assume that all of these ‘superpowers’ would be attainable.

According to the orthogonality thesis, there is no reason why a machine intelligence could not have extremely reductionist goals such as maximising the number of paperclips in the universe, since an AI’s level of intelligence is entirely separate from and independent of its final goals. Bostrom’s argument for this thesis, however, clearly depends on adopting Intelligence(3), whereby intelligence is regarded as general skill with prediction and means-ends reasoning. It is indeed plausible that an agent endowed only with this form of intelligence would not necessarily have the ability or inclination to question or modify its goals, even if they are extremely reductionist or what any human would regard as patently absurd. If, however, we adopt the much more expansive conception of Intelligence(1), the argument becomes much less defensible. This should become clear if one considers that ‘essentially all human cognitive abilities’ includes such activities as pondering moral dilemmas, reflecting on the meaning of life, analysing and producing sophisticated literature, formulating arguments about what constitutes a ‘good life’, interpreting and writing poetry, forming social connections with others, and critically introspecting upon one’s own goals and desires. To me it seems extraordinarily unlikely that any agent capable of performing all these tasks with a high degree of proficiency would simultaneously stand firm in its conviction that the only goal it had reasons to pursue was tiling the universe with paperclips. As such, Bostrom is driven by his cognitive superpowers argument to adopt the broad notion of intelligence seen in Intelligence(1), but is then driven back to the much narrower Intelligence(3) when he wishes to defend the orthogonality thesis. The key point here is that the goals or preferences of a rational agent are subject to rational reflection and reconsideration, while the exercise of reason is in turn shaped by the agent’s preferences and goals.
Short of radically redefining what we mean by ‘intelligence’ and ‘motivation’, this complex interaction will always hamper simplistic attempts to neatly separate them, thereby undermining Bostrom’s case for the orthogonality thesis – unless a very narrow conception of intelligence is adopted.

In the table below I summarise several of the key outcomes or developments that are critical to Bostrom’s argument, and how plausible they would be under each of the three conceptions of intelligence. Obviously such judgements are necessarily vague and subjective, but the key point I wish to make is simply that only by appealing to different conceptions of intelligence in different cases is Bostrom able to argue that all of the outcomes are reasonably likely to occur. Fatally for his argument, there is no single conception of intelligence that makes all of these outcomes simultaneously likely or plausible.

Intelligence(1): all human cognitive abilities; Intelligence(2): unidimensional measure of cognition; Intelligence(3): prediction and means-ends reasoning.

Outcome                                Intelligence(1)      Intelligence(2)      Intelligence(3)
Quick takeoff                          Highly unlikely      Likely               Unclear
Develops all cognitive superpowers     Highly likely        Highly unlikely      Highly unlikely
Absurd ‘paperclip maximising’ goals    Extremely unlikely   Unclear              Likely
Resists changes to goals               Unlikely             Unclear              Likely
Can escape confinement                 Likely               Unlikely             Unlikely

Premise 1: Superintelligence is coming soon

I have very little to say about this premise, since I am in broad agreement with Bostrom that even if it takes decades or a century, super-human artificial intelligence is quite likely to be developed. I find Bostrom’s appeals to surveys of AI researchers regarding how long it is likely to be until human level AI is developed fairly unpersuasive, given both the poor track record of such predictions and also the fact that experts on AI research are not necessarily experts on extrapolating the rate of technological and scientific progress (even in their own field). Bostrom, however, does note some of these limitations, and I do not think his argument is particularly dependent upon these sorts of appeals. I therefore will pass over premise 1 and move on to what I consider to be the more important issues.

Premise 2: Arguments against a fast takeoff

Bostrom’s major argument in favour of the contention that a superintelligence would be able to gain a decisive strategic advantage is that the ‘takeoff’ for such an intelligence would likely be very rapid. By a ‘fast takeoff’, Bostrom means that the time between when the superintelligence first approaches human-level cognition and when it achieves dramatically superhuman intelligence would be small, on the order of days or even hours. This is critical because if takeoff is as rapid as this, there will be effectively no time for any existing technologies or institutions to impede the growth of the superintelligence or check it in any meaningful way. Its rate of development would be so rapid that it would readily be able to out-think and out-maneuver all possible obstacles, and rapidly obtain a decisive strategic advantage. Once in this position, the superintelligence would possess an overwhelming advantage in technology and resources, and would therefore be effectively impossible to displace.

The main problem with all of Bostrom’s arguments for the plausibility of a fast takeoff is that they are fundamentally circular, in that the scenario or consideration they propose is only plausible or relevant under the assumption that the takeoff (or some key aspect of it) is fast. The arguments he presents are as follows:

  • Two subsystems argument: if an AI consists of two or more subsystems with one improving rapidly, but only contributing to the ability of the overall system after a certain threshold is reached, then the rate of increase in the performance of the overall system could drastically increase once that initial threshold is passed. This argument assumes what it is trying to prove, namely that the rate of progress in a critical rate-limiting subsystem could be very rapid, experiencing substantial gains on the order of days or even hours. It is hard to see what Bostrom’s scenario really adds here; all he has done is redescribed the fast takeoff scenario in a slightly more specific way. He has not given any reason for thinking that it is at all probable that progress on such a critical rate-limiting subsystem would occur at the extremely rapid pace characteristic of a fast takeoff.
  • Intelligence spectrum argument: Bostrom argues that the intelligence gap between ‘infra-idiot’ and ‘ultra-Einstein’, while appearing very large to us, may actually be quite small in the overall scheme of the spectrum of possible levels of intelligence, and as such the time taken to improve an AI through and beyond this level may be much less than it originally seems. However, even if it is the case that the range of the intelligence spectrum within which all humans fall is fairly narrow in the grand scheme of things, it does not follow that the time taken to traverse it in terms of AI development is likely to be on the order of days or weeks. Bostrom is simply making an assumption that such rapid rates of progress could occur. His intelligence spectrum argument can only ever show that the relative distance in intelligence space is small; it is silent with respect to likely development timespans.
  • Content overhang argument: an artificial intelligence could be developed with high capabilities but with little raw data or content to work with. If large quantities of raw data could be processed quickly, such an AI could rapidly expand its capabilities. The problem with this argument is that what is most important is not how long it takes a given AI to absorb some quantity of data, but rather the length of time between producing one version of the AI and the next, more capable version. This is because the key problem is that we currently don’t know how to build a superintelligence. Bostrom is arguing that if we did build a nascent superintelligence that simply needed to process lots of data to manifest its capabilities, then this learning phase could occur quickly. He gives no reason, however, to think that the rate at which we can learn how to build that nascent superintelligence (in other words, the overall rate of progress in AI research) will be anything like as fast as the rate an existing nascent superintelligence would be able to process data. Only if we assume rapid breakthroughs in AI design itself does the ability of AIs to rapidly assimilate large quantities of data become relevant.
  • Hardware overhang argument: it may be possible to increase the capabilities of a nascent superintelligence dramatically and very quickly by rapidly increasing the scale and performance of the hardware it had access to. While theoretically possible, this is an implausible scenario since any artificial intelligence showing promise would likely be operating near the peak of plausible hardware provision. This means that testing, parameter optimisation, and other such tasks will take considerable time, as hardware will be a limiting factor. Bostrom’s concept of a ‘hardware overhang’ amounts to thinking that AI researchers would be content to ‘leave money on the table’, in the sense of not making use of what hardware resources are available to them for extended periods of development. This is especially implausible in the event of groundbreaking research involving AI architectures showing substantial promise. Such systems would hardly be likely to spend years being developed on relatively primitive hardware only to be suddenly and dramatically scaled up at the precise moment when practically no further development is necessary, and they are already effectively ready to achieve superhuman intelligence.
  • ‘One key insight’ argument: Bostrom argues that ‘if human level AI is delayed because one key insight long eludes programmers, then when the final breakthrough occurs, the AI might leapfrog from below to radically above human level’. Assuming that ‘one key insight’ would be all it would take to crack the problem of superhuman intelligence is, to my mind, grossly implausible, and not consistent either with the slow but steady rate of progress in artificial intelligence research over the past 60 years, or with the immensely complex and multifaceted phenomenon that is human intelligence.
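
To make the shape of the ‘two subsystems’ scenario concrete, here is a toy numerical sketch (my own construction with arbitrary rates and threshold, not anything Bostrom provides): a gated subsystem contributes nothing to overall capability until it crosses a threshold, at which point measured capability jumps discontinuously.

```python
# Toy sketch of the 'two subsystems' scenario. The rates and the
# threshold are arbitrary illustrative assumptions, not estimates.

THRESHOLD = 50.0  # level at which subsystem B starts contributing

def overall_capability(t: float) -> float:
    a = 1.0 * t  # subsystem A: steady, visible progress
    b = 2.0 * t  # subsystem B: improving all along, but gated
    return a + (b if b >= THRESHOLD else 0.0)

# Overall capability creeps up, then jumps when B crosses the
# threshold: t=24 -> 24.0, but t=25 -> 75.0
```

The sketch also makes the circularity visible: the jump only resembles a fast takeoff if subsystem B’s own rate of improvement is assumed to be fast, which is precisely the assumption Bostrom needs to justify.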

Additional positive arguments against the plausibility of a fast takeoff include the following:

  • Speed of science: Bostrom’s assertion that artificial intelligence research could develop from clearly sub-human to obviously super-human levels of intelligence in a matter of days or hours is simply absurd. Scientific and engineering projects simply do not work over timescales that short. Perhaps to some degree this could be altered in the future if (for example) human-level intelligence could be emulated on a computer and then the simulation run at much faster than real-time. But Bostrom’s argument is that machine intelligence is likely to precede emulation, and as such all we will have to work with at least up to the point of human/machine parity being reached is human levels of cognitive ability. As such it seems patently absurd to argue that developments of this magnitude could be made on the timespan of days or weeks. We simply see no examples of anything like this from history, and Bostrom cannot argue that the existence of superintelligence would make historical parallels irrelevant, since we are precisely talking about the development of superintelligence in the context of it not already being in existence.
  • Subsystems argument: any superintelligent agent will doubtlessly require many interacting and interconnected subsystems specialised for different tasks. This is the way even much narrower AIs work, and it is certainly how human cognition works. Ensuring that all these subsystems or processes interact efficiently, without one inappropriately dominating or slowing up overall cognition, or without bottlenecks of information transfer or decision making, is likely to be something that requires a great deal of experimentation and trial-and-error. This in turn will take extensive empirical experiments, tinkering, and much clever work. All this takes time.
  • Parallelisation problems: many algorithms cannot be sped up considerably by simply adding more computational power unless an efficient way can be found to parallelise them, meaning that they can be broken down into smaller steps which can be performed in parallel across many processors at once. This is much easier to do for some types of algorithms and computations than others. It is not at all clear that the key algorithms used by a superintelligence would be susceptible to parallelisation. Even if they were, developing efficient parallelised forms of the relevant algorithms would itself be a prolonged process. The superintelligence itself would only be able to help in this development to the degree permitted by its initially limited hardware endowment. We therefore would expect to observe gradual improvement of algorithmic efficiency in parallelisation, thereby enabling more hardware to be added, thereby enabling further refinements to the algorithms used, and so on. It is therefore not at all clear that a superintelligence could be rapidly augmented simply by ‘adding more hardware’.
  • Need for experimentation: even if a superintelligence came into existence quite rapidly, it would still not be able to achieve a decisive strategic advantage in similarly short time. This is because such an advantage would almost certainly require development of new technologies (at least the examples Bostrom gives almost invariably involve the AI using technologies currently unavailable to humans), which would in turn require scientific research. Scientific research is a complex activity that requires far more than skill at ‘prediction and means-ends reasoning’. In particular, it also generally requires experimental research and (if engineering of new products is involved) the production and testing of prototypes. All of this will take time, and crucially is not susceptible to computational speedup, since the experiments would need to be performed with real physical systems (mechanical, biological, chemical, or even social). The idea that all (or even most) such testing and experimentation could be replaced by computer simulation of the relevant system is absurd, since most such simulations are completely computationally intractable, and likely to remain so for the foreseeable future (in many cases possibly forever). Therefore in the development of new technologies and scientific knowledge, the superintelligence is still fundamentally limited by the rate at which real-world tests and experiments can be performed.
  • The infrastructure problem: in addition to the issue of developing new technologies, there is the further problem of the infrastructure required to develop such technologies, or even just to carry out the core objectives of the superintelligence. In order to acquire a decisive strategic advantage, a superintelligence will require vast computational resources, energy sources to supply them, real-world maintenance of these facilities, sources of raw materials, and vast manufacturing centres to produce any physical manipulators or other devices it requires. If it needs humans to perform various tasks for it, it will likely also require training facilities and programs for its employees, as well as teams of lawyers to acquire all the needed permits and permissions, write up contracts, and lobby governments. All of this physical and social infrastructure cannot be built in the space of an afternoon, and more realistically would take many years or even decades to put in place. No amount of superintelligence can overcome physical limitations of the time required to produce and transform large quantities of matter and energy into desired forms. One might argue that improved technology certainly can reduce the time taken to move matter and energy, but the point is that it can only do so after the technology has been embodied in physical forms. The superintelligence would not have access to such hypothetical super-advanced transportation, computation, or construction technologies until it had built the factories needed to produce the machine tools which are needed to precisely refine the raw materials needed for parts in the construction of the nanofactory… and so on for many other similar examples. Nor can even vast amounts of money and intelligence allow any agent to simply brush aside the impediments of the legal system and government bureaucracy in an afternoon. 
A superintelligence would not simply be able to ignore such social restrictions on its actions until after it had gained enough power to act in defiance of world governments, which it would not be able to do until it had already acquired considerable military capabilities. All of this would take considerable time, precluding a fast takeoff.
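
The parallelisation bottleneck noted above can be quantified with Amdahl’s law (a standard result in parallel computing, not a formula Bostrom himself invokes): if only a fraction p of an algorithm’s work can be parallelised, the speedup from n processors is bounded by 1/(1 − p) no matter how much hardware is added.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Maximum speedup when a fraction p of the work can be
    parallelised across n processors (Amdahl's law)."""
    return 1.0 / ((1.0 - p) + p / n)

# Even with 95% of the work parallelisable, unlimited hardware
# buys at most a 20x speedup: 1 / (1 - 0.95) = 20.
print(amdahl_speedup(0.95, 10))         # ~6.9
print(amdahl_speedup(0.95, 1_000_000))  # ~20.0
```

This is why ‘just add more hardware’ cannot deliver unbounded gains unless the serial fraction is also driven down, which is itself the prolonged algorithmic work described above.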

Premise 3: Arguments against cosmic expansion

Critical to Bostrom’s argument about the dangers of superintelligence is that a superintelligence with a critical strategic advantage would likely capture the majority of the cosmic endowment (the sum total of the resources available within the regions of space potentially accessible to humans). This is why Bostrom presents calculations for the huge numbers of potential human lives (or at least simulations of lives) whose happiness is at stake should the cosmic endowment be captured by a rogue AI. While Bostrom does present some compelling reasons for thinking that a superintelligence with a decisive strategic advantage would have reasons and the ability to expand throughout the universe, there are also powerful considerations against the plausibility of this outcome which he fails to consider.

First, by the orthogonality thesis, a superintelligent agent could have almost any imaginable goal. It follows that a wide range of goals are possible that are inconsistent with cosmic expansion. In particular, any superintelligence with goals involving the value of unspoiled nature, or of constraining its activities to the region of the solar system, or of economising on the use of resources, would have reasons not to pursue cosmic expansion. How likely it is that a superintelligence would be produced with such self-limiting goals compared to goals favouring limitless expansion is unclear, but it is certainly a relevant outcome to consider, especially given that valuing exclusively local outcomes or conservation of resources seem like plausible goals that might be incorporated by developers into a seed AI.

Second, on a number of occasions, Bostrom briefly mentions that a superintelligence would only be able to capture the entire cosmic endowment if no other technologically advanced civilizations, or artificial intelligences produced by such civilizations, existed to impede it. Nowhere, however, does he devote any serious consideration to how likely the existence of such civilizations or intelligences is. Given the great age and immense size of the cosmos, however, the probability that humans are the first technological civilization to achieve spaceflight, or that any superintelligence we produce would be the first to spread throughout the universe, seems infinitesimally small. Of course this is an area of great uncertainty and we can therefore only speculate about the relevant probabilities. Nevertheless, it seems very plausible to me that the chances of any human-produced superintelligence successfully capturing the cosmic endowment without alien competition are very low. Of course this does not mean that an out-of-control terrestrial AI could not do great harm to life on Earth and even spread throughout neighbouring stars, but it does significantly blunt the force of the huge numbers Bostrom presents as being at stake if we think the entire cosmic endowment is at risk of being misused.

Premise 4: The nature of AI motivation

Bostrom’s main argument in defence of premise 4 is that unless we are extremely careful and/or lucky in establishing the goals and motivations of the superintelligence before it captures the cosmic endowment, it is likely to end up pursuing goals that are not in alignment with our own values. Bostrom presents a number of thought experiments as illustrations of the difficulty of specifying values or goals in a manner that would result in the sorts of behaviours we want the AI to perform. Most of these examples involve the superintelligence pursuing a goal in a single-minded, literalistic way, which no human being would regard as ‘sensible’. He gives as examples an AI tasked with maximising its output of paperclips sending out probes to harvest all the energy within the universe to make more paperclips, or an AI tasked with increasing human happiness enslaving all humans and hijacking their brains to stimulate the pleasure centres directly. One major problem I have with all such examples is that the AIs always seem to lack a critical ability in interpreting and pursuing their goals that, for want of a better term, we might describe as ‘common sense’. This issue ultimately reduces to which conception of intelligence one applies, since if we adopt Intelligence(1) then any such AIs would necessarily have ‘common sense’ (this being a human cognitive ability), while the other two conceptions of intelligence would not necessarily include this ability. However, if we do take Intelligence(1) as our standard, then it seems difficult to see why a superintelligence would lack the sort of common sense by which any human would be able to see that the simple-minded, literalistic interpretations given as examples by Bostrom are patently absurd and ridiculous things to do.

Aside from the question of ‘common sense’, it is also necessary to analyse the concept of ‘motivation’, which is a multifaceted notion that can be understood in a variety of ways. Two particularly important conceptions are motivation as some sort of internal drive to do or obtain some outcome, and motivation as a more abstract rational consideration by which an agent has a reason to act in a certain way. Given what he says about the orthogonality thesis, it seems that Bostrom thinks of motivation as being some sort of internal drive to act in a particular way. In the first few pages of the chapter on the intelligent will, however, he switches from talking about motivation to talking about goals, without any discussion about the relationship between these two concepts. Indeed, it seems that these are quite different things, and can exist independently of each other. For example, humans can have goals (to quit smoking, or to exercise more) without necessarily having any motivation to take actions to achieve those goals. Conversely, humans can be motivated to do something without having any obvious associated goal. Many instances of collective behaviour in crowds and riots may be examples of this, where people act based on situational factors without any clear reason or objectives. Human drives such as curiosity and novelty-seeking can also be highly motivating without necessarily having any particular goal associated with them. Given the plausibility that motivation and goals are different and distinct concepts, it is important for Bostrom to explain what he thinks the relationship between them is, and how they would operate in an artificial agent. 
This seems all the more relevant since we would readily say that many intelligent artificial systems possess goals (such as the common examples of a heat-seeking missile or a chess playing program), but it is not at all clear that these systems are in any way ‘motivated’ to perform these actions – they are simply designed to work towards these goals, and motivations simply don’t come into it. What then would it take to build an artificial agent that had both goals and motivations? How would an artificial agent act with respect to these goals and/or motivations? Bostrom simply cannot ignore these questions if he is to provide a compelling argument concerning what AIs would be motivated to do.

The problems inherent in Bostrom’s failure to analyse these concepts in sufficient detail become evident in the context of his discussion of something that he calls ‘final goals’. While he does not define these, presumably he means goals that are not pursued in order to achieve some further goal, but simply for their own sake. This raises several additional questions: can an agent have more than one final goal? Need they have any final goals at all? Might goals always be infinitely resolvable in terms of fulfilling some more fundamental or more abstract underlying goal? Or might multiple goals form an inter-connected self-sustaining network, such that all support each other but no single goal can be considered most fundamental or final? These questions might seem arcane, but addressing them is crucial for conducting a thorough and useful analysis of the likely behaviour of intelligent agents. Bostrom often speaks as if a superintelligence will necessarily act in single-minded devotion to achieve its one final goal. This assumes, however, that a superintelligence would be motivated to achieve its goal, that it would have one and only one final goal, and that its goal and its motivation to achieve it are totally independent from and not receptive to rational reflection or any other considerations. As I have argued here and previously, however, these are all quite problematic and dubious notions. In particular, as I noted in the discussion about the nature of intelligence, a human’s goals are subject to rational reflection and critique, and can be altered or rejected if they are determined to be irrational or incongruent with other goals, preferences, or knowledge that the person has. It therefore seems highly implausible that a superintelligence would hold so tenaciously to its goals, and pursue them so single-mindedly. 
Only a superintelligence possessing a much more minimal form of intelligence, such as the skills at prediction and means-ends reasoning of Intelligence(3), would be a plausible candidate for acting in such a myopic and mindless way. Yet as I argued previously, a superintelligence possessing only this much more limited form of intelligence would not be able to acquire all of the ‘cognitive superpowers’ necessary to establish a decisive strategic advantage.

Bostrom would likely contend that such reasoning is anthropomorphising, applying human experiences and examples in cases where they simply do not apply, given how different AIs could be to human beings. Yet how can we avoid anthropomorphising when we are using words like ‘motivation’, ‘goal’, and ‘will’, which acquire their meaning and usage largely through application to humans or other animals (as well as anthropomorphised supernatural agents)? If we insist on using human-centred concepts in our analysis, drawing anthropocentric analogies in our reasoning is unavoidable. This places Bostrom in a dilemma, as he wants to simultaneously affirm that AIs would possess motivations and goals, but also somehow shear these concepts of their anthropocentric basis, saying that they could work totally differently to how these concepts are applied in humans and other known agents. If these concepts work totally differently, then how are we justified in even using the same words in the two different cases? It seems that if this were so, Bostrom would need to stop using words like ‘goal’ and ‘motivation’ and instead start using some entirely different concept that would apply to artificial agents. On the other hand if these concepts work sufficiently similarly in human and AI cases to justify using common words to describe both cases, then there seems nothing obviously inappropriate in appealing to the operation of goals in humans in order to understand how they would operate in artificial agents. Perhaps one might contend that we do not really know whether artificial agents would have human analogues of desires and goals, or whether they would have something distinctively different. If this is the case, however, then our level of ignorance is even more profound than we had realised (since we don’t even know what words we can use to talk about the issue), and therefore much of Bostrom’s argument on these subjects would be grossly premature and under-theorised.

Bostrom also argues that once a superintelligence comes into being, it would resist any changes to its goals, since its current goals are (nearly always) better achieved by refraining from changing them to some other goal. There is an obvious flaw to this argument, namely that humans change their goals all the time, and indeed whole subdisciplines of philosophy are dedicated to pursuing the question of what we should value and how we should go about modifying our goals or pursuing different things to what we currently do. Humans can even change their ‘final goals’ (insomuch as any such things exist), such as when they convert religions or change between radically opposed political ideologies. Bostrom mentions this briefly but does not present any particularly convincing explanation for this phenomenon, nor does he explain why we should assume that this clear willingness to countenance (and even pursue) goal changes is not something that would affect AIs as it affects humans. One potential response could be that the ‘final goal’ pursued by all humans is really something very basic such as ‘happiness’ or ‘wellbeing’ or ‘pleasure’, and that this never changes even though the means of achieving it can vary dramatically. I am not convinced by this analysis, since many people (religious and political ideologues being obvious examples) seem motivated by causes to perform actions that cannot readily be regarded as contributing to their own happiness or wellbeing, unless these concepts are stretched to become implausibly broad. Even if we accept that people always act to promote their own happiness or wellbeing, however, it is certainly the case that they can dramatically change their beliefs about what sort of things will improve their happiness or wellbeing, thus effectively changing their goals. 
It is unclear to me why we should expect that a superintelligence able to reflect upon its goals could not similarly change its mind about the meaning of its goals, or dramatically alter its views on how to best achieve them.

Premise 5: The tractability of the AI alignment problem

Critical to the question of artificial intelligence research as a cause for effective altruists is the argument that there are things which can be done in the present to reduce the risk of misaligned AI attaining a critical strategic advantage. In particular, it is argued that AI safety research and work on the goal alignment problem has the potential, given the application of sufficient creativity and intelligence, to significantly assist our efforts in constructing an AI which is ‘safe’, and has goals aligned with our best interests. This is often presented as quite an urgent matter, something which must be substantively ‘solved’ before a superintelligent AI comes into existence if catastrophe is to be averted. This possibility, however, seems grossly implausible considering the history of science and technology. I know of not a single example of any significant technological or scientific advance whose behaviour we have been able to accurately predict, and whose safety we have been able to ensure, before it has been developed. In all cases, new technologies are only understood gradually as they are developed and put to use in practice, and their problems and limitations progressively become evident.

In order to ensure that an artificial intelligence would be safe, we would first need to understand a great deal about how artificially intelligent agents work, how their motivations and goals are formed and evolve (if at all), and how artificially intelligent agents would behave in society in their interactions with humans. It seems to me that, to use Bostrom’s language, this constitutes an AI-complete problem, meaning that there is no realistic hope of substantively resolving these issues before human-level artificial intelligence itself is developed. To assert the contrary is to contend that we can understand how an artificial intelligence would work well enough to control it and wisely plan with respect to possible outcomes, before we actually know how to build one. It is to assert that a detailed knowledge about how the AI’s intellect, goals, drives, and beliefs would operate in a wide range of possible scenarios, and also the ability to control its behaviours and motivations in accordance with our values, would still not include essential knowledge needed to actually build such an AI. Yet what exactly is it that such knowledge would leave out? How could we know so much about AIs without being able to actually build one? This possibility seems deeply implausible, and not comparable to any past experiences in the history of technology.

Another major activity advocated by Bostrom is to attempt to alter the relative timing of different technological developments. This rests on the principle of what he calls differential technological development, that it is possible to retard the development of some technologies relative to the arrival time of others. In my view this principle is highly suspect. Throughout the history of science and technology the simultaneous discovery or development of new inventions or discoveries is not only extremely common, but appears to be the norm of how scientific research progresses rather than the exception (see ‘list of multiple discoveries’ on Wikipedia for examples of this). The preponderance of such simultaneous discoveries lends strong support to the notion that the relative arrival of different scientific and technological breakthroughs depends mostly upon the existing state of scientific knowledge and technology – that when a particular discovery or invention has the requisite groundwork to occur, then and only then will it occur. If on the other hand individual genius or funding initiatives were the major drivers of when particular developments occur, we would not expect the same special type of genius or the same sort of funding program to exist in multiple locations leading to the same discovery at the same time. The simultaneous discovery of so many new inventions or discoveries would under this explanation be an inexplicable coincidence. If discoveries come about shortly after all the necessary preconditions are available, however, then we would expect that multiple persons in different settings would take advantage of the common set of prerequisite conditions existing around the same time, leading to many simultaneous discoveries and developments.

If this analysis is correct, then it follows that the principle of differential technological development is unlikely to be applicable in practice. If the timing and order of discoveries and developments largely depend upon the necessary prerequisite discoveries and developments having been made, then simply devoting more resources to a particular emerging technology would do little to accelerate its maturation. These extra resources may help to some degree, but the major bottleneck on research is likely to be the development of the right set of prerequisite technologies and discoveries. Increased funding can increase the number of researchers, which in turn leads to a larger range of applications of existing techniques to slightly new uses, and to minor incremental improvements of existing tools and methods. Such activities, however, are distinct from the development of innovative new technologies and substantively new knowledge. These sorts of fundamental breakthroughs are essential for the development of major new branches of technology such as geoengineering, whole brain emulation, artificial intelligence, and nanotechnology. If this analysis is correct, however, they cannot simply be purchased with additional research money, but must await the development of essential prerequisite concepts and techniques. Nor can we simply devote research funding to the prerequisite areas, since these fields would in turn have their own set of prerequisite technologies and discoveries upon which they depend. In essence, science and technology is a strongly inter-dependent enterprise, and we can seldom predict what ideas or technologies will be needed for a particular future breakthrough to be possible.
Increased funding for scientific research overall can potentially increase the general rate of scientific progress (though even this is somewhat unclear), but changing the relative order of arrival of different major new technologies is not something that we have any good reason to think is feasible. Any attempts therefore to strategically manipulate research funding or agendas to alter the relative order of arrival of nanotechnology, whole brain emulation, artificial intelligence, and other such technologies, are very unlikely to succeed.

Premises 6-7: The high expected value of AI research

Essential to the argument that we (society at large or the EA community specifically) should devote considerable resources to solving the AI alignment problem is the claim that even if the probability of actually solving the problem is very low, the size of the outcome in question (according to Bostrom, the entire cosmic endowment) is so large that its expected value still dominates most other possible causes. This also provides a ready riposte to all of my foregoing rebuttals of Bostrom’s argument – namely that even if each premise of Bostrom’s argument is very improbable, and even if as a result the conclusion is most implausible indeed, nevertheless the AI Doom Scenario outcome is so catastrophically terrible that in expectation it might still be worthwhile to focus much of our attention on trying to prevent it. Of course, at one level this is entirely an argument about the relative size of the numbers – just how implausible are the premises, and just how large would the cosmic endowment have to be in order to offset this? I do not believe it is possible to provide any non-question begging answers to this question, and so I will not attempt to provide any numbers here. I will simply note that even if we accept the logic of the expected value argument, it is still necessary to actually establish with some plausibility that the expected value is in fact very large, and not merely assume that it must be large because the hypothetical outcome is large. There are, however, more fundamental conceptual problems with the application of expected value reasoning to problems of this sort, problems which I believe weigh heavily against the validity of applying such reasoning to this issue.

First is a problem sometimes called Pascal’s mugging. It is based upon Blaise Pascal’s argument that (crudely put) one should convert to Christianity even if it is unlikely that Christianity is true. The reason is that if God exists, then being a Christian will yield an arbitrarily large reward in heaven, while if God does not exist, there is no great downside to being a Christian. On the other hand, if God does exist, then not being a Christian will yield an arbitrarily large negative reward in hell. On the basis of the extreme magnitude of the possible outcomes, therefore, it is rational to become a Christian even if the probability of God existing is small. Whatever one thinks of this as a philosophical argument for belief in God, the problem with this line of reasoning is that it can readily be applied to a very wide range of possible claims. For instance, a similar case can be made for different religions, and even for different forms of Christianity. A fringe apocalyptic cult member could claim that Cthulhu is about to awaken and will torture a trillion trillion souls for all eternity unless you donate your life savings to their cult, which will help to placate him. Clearly this person is not to be taken seriously, but unless we can assign exactly zero probability to his claim being true, there will always be some negative outcome large enough to make complying with his demand the rational thing to do.
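The arithmetic behind the mugging can be made explicit. The following sketch uses purely hypothetical, illustrative numbers of my own choosing (the particular credence and stake are not drawn from Bostrom or anyone else): however small the credence, a sufficiently extravagant claimed stake keeps the expected value high.

```python
from fractions import Fraction

def expected_value(probability, outcome):
    """Standard expected value of a single binary prospect."""
    return probability * outcome

# Hypothetical numbers: grant the cult member's claim a credence of
# just one in a trillion trillion (10^-24)...
p_threat = Fraction(1, 10**24)
# ...but let the claimed stake be a trillion trillion (10^24) souls.
souls_at_stake = 10**24

# The expected number of souls saved by complying is still 1, so an
# extravagant enough threat can outbid any ordinary-scale cause.
print(expected_value(p_threat, souls_at_stake))  # prints 1
```

The mugger can always rescale: whatever credence we assign, a claim involving a larger stake restores the product, which is precisely why such arguments prove too much.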

The same argument could be applied in more plausible cases to argue that, for example, some environmental or social cause has the highest expected value, since if we do not act now to shape outcomes in the right way then Earth will become completely uninhabitable, and mankind thus unable to spread throughout the galaxy. Or perhaps some neo-Fascist, Islamic fundamentalist, Communist revolutionary, anarcho-primitivist, or other such ideology could establish a hegemonic social and political system that locks humanity into a downward spiral that forever precludes cosmic expansion, unless we undertake appropriate political or social reforms to prevent this. Again, the point is not how plausible such scenarios are – though doubtless with sufficient time and imagination they could be made to sound somewhat plausible to those people with the right ideological predilections. Rather, the point is that in line with the idea of Pascal’s mugging, if the outcome is sufficiently bad, then the expected value of preventing the outcome could still be high in spite of a very low probability of the outcome occurring. If we accept this line of reasoning, we therefore find ourselves vulnerable to being ‘mugged’ by any kind of argument which posits an absurdly implausible speculative scenario, so long as it has a sufficiently large outcome. This possibility effectively constitutes a reductio ad absurdum for these types of very low probability, very high impact arguments.

The second major problem with applying expected value reasoning to this sort of problem is that it is not clear that the conceptual apparatus is properly aligned to the nature of human beliefs. Expected value theory holds that human beliefs can be assigned a probability which fully describes the degree of credence with which we hold that belief. Many philosophers have argued, however, that human beliefs cannot be adequately described this way. In particular, it is not clear that we can identify a single specific number that precisely describes our degree of credence in such amorphous, abstract propositions as those concerning the nature and likely trajectory of artificial intelligence. The possibilities of incomplete preferences, incomparable outcomes, and suspension of judgement are also very difficult to incorporate into standard expected value theory, which assumes complete preferences and that all outcomes are comparable. Finally, it is particularly unclear why we should expect or require that our degrees of credence should adhere to the axioms of standard probability theory. So-called ‘Dutch book arguments’ are sometimes used to demonstrate that sets of beliefs that do not accord with the axioms of probability theory are susceptible to betting strategies whereby the person in question would be guaranteed to lose money. Such arguments, however, only seem relevant to beliefs which are liable to be the subject of bets. For example, of what relevance is it whether one’s beliefs about the behaviour of a hypothetical superintelligent agent in the distant future are susceptible to Dutch book arguments, when the events in question are so far in the future that it is impossible that any enforceable bet could actually be made concerning them? Perhaps beliefs which violate the axioms of probability, though useless for betting, are valuable or justifiable for other purposes or in other domains. 
Much more has been written about these issues (see, for example, the Stanford Encyclopedia of Philosophy article on Imprecise Probabilities); however, for our purposes it is sufficient to establish that powerful objections can be and have been raised concerning the adequacy of expected value arguments, particularly in applications involving low probabilities and high potential impact. These issues require careful consideration before premises 6 and 7 of the argument can be justified.
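To make the Dutch book idea concrete, here is a minimal sketch (the scenario, numbers, and function are hypothetical illustrations of the standard argument, not from any particular source): an agent whose credences in ‘rain’ and ‘no rain’ sum to more than 1, and who prices bets at credence times stake, will buy a pair of bets that lose money however the weather turns out.

```python
def dutch_book_loss(cr_a, cr_not_a, stake=1.0):
    """Guaranteed net loss for an agent who regards a bet paying
    `stake` if X occurs as fairly priced at cr(X) * stake, and who
    buys one bet on A and one on not-A. Exactly one of the two bets
    pays out, so the result is the same in every possible outcome."""
    total_price = (cr_a + cr_not_a) * stake  # what the agent pays
    payout = stake                           # what the agent collects
    return total_price - payout              # positive iff credences sum > 1

# Credence 0.6 in rain and 0.6 in no rain (summing to 1.2, violating
# the probability axioms) yields a sure loss of about $0.20 per dollar
# staked, whatever the weather does.
print(round(dutch_book_loss(0.6, 0.6), 2))  # prints 0.2
```

Coherent credences (e.g. 0.5 and 0.5) make the guaranteed loss zero, which is the point of the theorem – but, as argued above, this only bites for beliefs on which enforceable bets could actually be made.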


In concluding, I would just like to say a final word about the manner in which I believe AI is likely to present the greatest danger in the future. On the basis of the arguments I have presented above, I believe that the most dangerous AI risk scenario is not that of the paperclip maximiser or some out-of-control AI with a very simplistic goal. Such examples feature very prominently in Bostrom’s argument, but as I have said I do not find them very plausible. Rather, in my view the most dangerous scenario is one in which a much more sophisticated, broadly intelligent AI comes into being which, after some time interacting with the world, acquires a set of goals and motivations which we might broadly describe as those of a psychopath. Perhaps it would have little or no regard for human wellbeing, instead becoming obsessed with particular notions of ecological harmony, or cosmic order, or some abstracted notion of purity, or something else beyond our understanding. Whatever the details, the AI need not have an aversion to changing its ‘final goals’ (or indeed have any such things at all). Nor need it pursue a simple goal single-mindedly without stopping to reflect or being able to be persuaded by conversing with other intelligent agents. Nor need such an AI experience a very rapid ‘takeoff’, since I believe its goals and values could very plausibly alter considerably after its initial creation. Essentially all that is required would be a set of values substantially at odds with those of most or all of humanity. If it were sufficiently intelligent and capable, such an entity could cause considerable harm and disruption. In my view, therefore, AI safety research should focus not only on how to solve the problem of value learning or how to promote differential technological development. 
It should also focus on how the motivations of artificial agents develop, how these motivations interact with beliefs, and how they can change over time as a result of both internal and external forces. The manner in which an artificial agent would interact with existing human society is also an area which, in my view, warrants considerable further study, since the manner in which such interactions proceed plays a central role in many of Bostrom’s arguments.

Bostrom’s book has much to offer those interested in this topic, and although my critique has been almost exclusively negative, I do not wish to come across as implying that I think Bostrom’s book is not worth reading or presents no important ideas. My key contention is simply that Bostrom fails to provide compelling reasons to accept the key premises in the argument that he develops over the course of his book. It does not, of course, follow that the conclusion of his argument (that AI constitutes a major existential threat worthy of considerable effort and attention) is false, only that Bostrom has failed to establish its plausibility. That is, even if Bostrom’s argument is fallacious, it does not follow that AI safety is a completely spurious issue that should be ignored. On the contrary, I believe it is an important issue that deserves more attention in mainstream society and policy. At the same time, I also believe that relative to other issues, AI safety receives too much attention in EA circles. Fully defending this view would require additional arguments beyond the scope of this article. Nevertheless, I hope this piece contributes to the debate surrounding AI and its likely impact in the near future.

Massive Content Update!

Hi everyone, just wanted to announce the release of a huge amount of new content on my blog. You can see the links to the pages along the top menu bar. I have uploaded a bunch of old projects that I produced in previous years, including a science fiction novella and some presentations about world history and the solar system. I’ve also uploaded about 50 slide show presentations on topics spanning philosophy, science, and statistics, for talks that I’ve given over the years. Finally, I’ve uploaded close to 100 pdfs of my typed-up notes for university courses that I’ve taken or attended over the years, which are organised by category. Hopefully people will find things here of interest and use to them, so check it out!

Levels of Scepticism: How Even Rational People Have Sceptical Blind Spots

Most of my readers doubtless recognise the importance of being sceptical about the information, arguments, and ideas that we encounter, be it dietary advice, political opinions, science news articles, or whatever else. There are, however, different levels of scepticism, corresponding to the varying degrees of sophistication we can attain in the manner in which we respond to new ideas or arguments. In this piece I wish to outline a brief typology of these different levels of scepticism. I do not pretend to offer any sort of definitive classification, nor do I claim that these levels are in any way based upon empirical psychological research. Their purpose rather is to serve as a conceptual tool to help us think about the ways in which we can improve our own thinking, and work to eliminate residual biases and blind spots that hamper our efforts to form beliefs that are best justified by strong argument and quality evidence. The hierarchy that I shall outline has four levels, ranging from least sceptical at level 0 to most sceptical at level 3. I want to emphasise that the purpose of these levels is not to create a ranking of particular people as better or worse sceptics, as most people operate at multiple different levels depending on the circumstance and the topic in question. Rather, the purpose is to rank particular types of thinking, so that we may better recognise when we are thinking in a better or worse mode of scepticism.

I will begin my discussion at the bottom of the hierarchy, level 0. When we think at this level, we do not think particularly critically or sceptically about much of anything. Though we may have opinions about various matters of political, ethical, or philosophical import, when operating at level 0 we are typically unable to clearly articulate these views to others, or explain why we hold them. Most such views are informed primarily by our upbringing, socialisation, and the attitudes of the people around us as we go about our lives. Many people who operate at this level have little to no ability to critically analyse evidence or evaluate an abstract logical argument, having never been taught such skills or found it necessary to learn them. Even those who do have such skills, however, can sometimes be remarkably compartmentalised in the manner in which they apply them, for example being able to hold forth with a detailed analytical argument about topic A, but doing little more than spouting catch-phrases that resonate with them when discussing topic B. When we operate at level 0, we tend to think that our viewpoint is ‘obvious’, and react with surprise when we find that others think differently, or that any sensible person could hold a different view. It is likely that the majority of humanity operate at level 0 most of the time, as this is the type of thinking that comes most naturally and easily to most humans. That is, we typically form beliefs about the world not on the basis of careful examination of evidence, logical analysis, or in-depth comparison of alternative perspectives, but unconsciously and reflexively as we go about our lives, drawing largely upon what we know and are familiar with. I do not want to claim that this is inappropriate in all contexts, as certainly we cannot always subject everything to detailed critical analysis. 
However, I do think that making a habit of thinking in this way is liable to lead us into error and confusion about a great many of our beliefs. Scepticism, logic, and science are valuable tools, and neglecting these tools leaves us intellectually impoverished and prone to biased and mistaken reasoning.

This leads me on to the next level of the scepticism hierarchy, level 1. When operating at this level, we are able to articulate clear opinions on a variety of subjects, marshalling various arguments and evidence in favour of our views. We recognise the distinctiveness of different viewpoints and are able to employ the tools of scepticism and rationality to make arguments for what we regard as the correct view. However, when thinking at this level we also tend to identify strongly with one particular perspective, be it religious, political, scientific, or whatever else, and employ these sceptical tools selectively against arguments or information coming from the opposing ‘side’. We are able to spot logical fallacies, faulty reasoning, and inadequate evidence in the arguments of our ideological opponents, but are much less able to apply the same skills to arguments made by those of our own ideological persuasion. When operating at level 1, we tend to respond to new claims by ‘pattern matching’ how the claim is framed and who is making it, and on that basis classify it as ‘for’ or ‘against’ our side. We thus do not judge arguments fairly on their own merits, but subject them to an initial, largely unconscious ‘screening process’, whereby if an argument ‘sounds like’ the sort of thing someone we disagree with would say, then we subject it to closer sceptical examination. On the other hand, if it sounds like the sort of thing somebody who agrees with us would say, then it typically avoids any in-depth examination. This sort of self-serving, pro in-group bias comes very naturally to humans, and thus is very difficult to overcome. It is also very difficult to notice in ourselves, because when operating at level 1 we typically are only conscious of the times when we are being sceptical and critical, not the times when we aren’t. 
To us it feels as though we take arguments only on their merits, when in reality we are very selective about how our scepticism is applied, and make little effort to subject views that accord with our beliefs or biases to the same rigorous critical examination that we apply to those that do not. When operating at level 1 we are also liable to be misled by framing effects, slogans, buzzwords, and other irrelevancies relating not to the substance of an argument, but to how it is packaged. Selective scepticism of this sort is very common among those heavily involved in some sort of social movement or organisation, and is not always bad, since it can save us time – after all, we can’t critically examine every single claim we come across. At the same time, it can become all too easy to become accustomed to operating at this level, and in doing so we fail to make proper or full use of the tools of rationality and scepticism.

When operating at the next level up in the hierarchy, level 2, we are able to apply critical thinking skills and sceptical analysis consistently and fairly both to arguments that we find agreeable and those that we find disagreeable. We allow the arguments and evidence to be persuasive in their own right, with minimal influence based on who has made them, or how they have been formulated. We consciously recognise our tendencies to favour ‘our side’ over the ‘other side’, and make efforts to circumvent this by deliberately taking time to critique arguments made by those who agree with us, and likewise by finding the strongest, most able defenders of ideas we disagree with. This, of course, is not easy to do, and requires careful attention and genuine effort to fairly engage with different perspectives and ideas. There is, however, one significant failing that we still commonly experience when operating at level 2. Namely, we instinctively and reflexively retain an unreasonable overconfidence in our own reasoning abilities. We tend to believe that our perspectives or conclusions on some issue are the ‘right’ ones, and everyone else has got it ‘wrong’. Taken to extremes, this type of thinking can lead to habitual contrarianism and even conspiratorial thinking. In such cases, we may think that both sides of some major dispute have it wrong, and we are the ‘lone genius’ able to see the correct answer. While most people do not reach such extremes, what those operating at level 2 have in common is their inability or unwillingness to apply the same sceptical attitude and critical examination to their own thought processes that they do to the arguments of others. We thus do not properly appreciate the many limitations of memory, rationality, and knowledge that we ourselves are subject to, and which hamper our efforts to draw correct conclusions. We are sceptical of everyone else, but not sufficiently sceptical of ourselves, of our own biases and limitations.

The highest level of my hierarchy is level 3, and it is the level I believe we should all aspire to use as regularly as possible. When operating at level 3, we properly apply scepticism and critical analysis not only to everyone else, but also to ourselves and our own beliefs, preconceptions, and thought processes. We are often hesitant to attach strong credence to the conclusions we reach, because we know that our rationality is grossly imperfect and our knowledge and perspectives sorely limited. This of course should not lead us to radical scepticism or keep us from forming opinions about anything, but it should temper our confidence considerably and keep us from becoming dogmatically attached to our conclusions and perspectives. At level 3 we are also much more self-critical, actively setting out to uncover our own biases and doing our best to compensate for them, and not just criticising the biases and errors of others. Likewise, we actively seek out the viewpoints of other informed persons to critique our opinions and point out our cognitive ‘blind spots’, helping us to apply scepticism to our own thought processes and reasoning. Level 3 is often an uncomfortable state to operate in, for it robs us of the overconfidence in our beliefs that is reassuring to most people, and also requires a degree of active self-criticism which is unnatural and effortful to maintain. We must also make an effort to find the right balance between appropriate self-criticism and scepticism on the one hand, and paralysing self-doubt, apathy, or total mistrust of reason on the other. Operating at level 3 is neither easy nor natural, but I do believe it is the highest form of ‘true scepticism’, and the ideal to which we should all aspire. 
Operating at this level may not always be possible, but it is nevertheless worth striving for, since it allows us to take the fullest advantage of the tools afforded by logic, rationality, and scepticism, thereby giving us the best chance of ultimately forming accurate beliefs free from error, bias, and distortion.