One of the cleanest ideas in modern cosmology is also one of the easiest to overlook. According to the standard ΛCDM model, structure in the Universe forms hierarchically, with dark matter collapsing under gravity into bound halos over an enormous range of masses. Massive halos, capable of hosting large galaxies or clusters of galaxies, are comparatively rare. As one moves to lower and lower masses, however, the number of halos increases rapidly. In fact, the theoretical abundance of halos rises so steeply toward small masses that low-mass halos should dominate the cosmic population by sheer numbers.

Taken at face value, this has a striking and somewhat unsettling implication. If every dark matter halo were to host a visible galaxy, the night sky would look very different from what we observe. Instead of a relatively sparse distribution of galaxies, we would expect to see an overwhelming number of faint systems, tracing the enormous population of small halos predicted by theory. The absence of such a population in observational surveys is not a minor detail. It is a central clue that tells us something fundamental about how galaxy formation works, and, just as importantly, about how it fails.

The natural conclusion is that most dark matter halos do not become galaxies in any ordinary sense of the word. They either never manage to form stars, or they form so few that their stellar populations are effectively invisible with current observational techniques. In this way, the prediction that the Universe should contain far more halos than galaxies is not a problem to be fixed, but a feature to be understood. It points toward a vast, largely unseen population of dark matter structures, quietly shaping the cosmic web without ever lighting up.
This conclusion is not controversial, and it is not new. What makes it interesting is that it forces us to confront an uncomfortable mismatch between theory and observation. We see galaxies, not halos. If the theoretical prediction is right, then most halos must fail to produce anything that looks like a galaxy. They either form no stars at all, or form so few that their light is effectively undetectable. In that sense, the existence of dark halos is not a speculative idea; it is almost required by the success of the model.
The real difficulty is empirical. A halo without stars is, by design, hard to see. Dark matter emits no light, and a system that lacks stars does not glow in the usual optical or infrared bands. If such objects exist, we should not expect them to announce themselves clearly. Instead, they must be found indirectly, through whatever faint traces of ordinary matter they manage to retain. This is where the physics of reionization enters the story. When the Universe was reionized, the intergalactic medium was heated by ultraviolet radiation to temperatures of order ten thousand kelvin. Gas at these temperatures develops significant pressure support, and only sufficiently deep gravitational potential wells can confine it for long periods of time. The result is a kind of mass threshold for galaxy formation. Above this threshold, halos can hold onto gas, allow it to cool, and eventually form stars. Below it, gas is easily lost or remains too warm and diffuse to collapse.
It is useful to summarize this complicated physics with a single number: a critical halo mass $M_{\mathrm{crit}}$. At the present epoch, this scale is around $10^{9.7} M_\odot$. The precise value is not especially important for what follows. What matters is that galaxy formation becomes sharply inefficient near this mass. A small change in halo mass, or in the details of its assembly history, can mean the difference between forming a faint dwarf galaxy and forming no stars at all.

This sharp transition opens up an intriguing possibility. Consider halos that lie just below the critical mass. They are not massive enough to form stars today, but they may still be massive enough to retain some of their gas. In such systems, the gas does not collapse into a rotating disk or fragment into stars. Instead, it can settle into a relatively simple configuration, supported by pressure rather than rotation, and held in place by the dark matter potential. In this regime, the behavior of the gas is governed by equilibrium rather than by violent astrophysical processes. The temperature is regulated by a balance between photoheating from the ultraviolet background and radiative cooling. Gravity tries to compress the gas inward, while pressure pushes back. When these effects balance, the gas settles into a quasi-static state, roughly spherical, dynamically cold, and largely free of the complications associated with star formation and feedback.

Objects of this type have come to be known as reionization-limited H I clouds. The name is less important than the idea behind it. These systems are not arbitrary oddities, but a natural prediction of the ΛCDM framework combined with reionization physics. They are expected to be rare, because they occupy a narrow window in halo mass, but they are also expected to have distinctive observational signatures, particularly in neutral hydrogen.
From a theoretical point of view, such objects are unusually attractive. Ordinary galaxies are messy. Their gas and stars are shaped by cooling, star formation, feedback, and environmental effects, all of which complicate any attempt to infer the underlying dark matter distribution. A gas-rich but starless halo, by contrast, is closer to a clean physics problem. If one could identify such a system and measure its gas properties, one would have a direct window into the structure of a low-mass dark matter halo, largely uncontaminated by the usual astrophysical uncertainties. The question, then, is not whether such objects are allowed by theory. It is whether any of them can be found in the real Universe.

With this theoretical picture in mind, it is natural to ask whether any concrete example of such a system has actually been observed. A particularly intriguing candidate has emerged in the vicinity of the nearby spiral galaxy M94: a compact neutral hydrogen cloud commonly referred to as Cloud-9. The object was first identified in 21 cm emission and subsequently confirmed with higher-resolution radio observations. What immediately sets Cloud-9 apart is that it looks like a coherent, self-contained system in H I, yet it does not obviously resemble a conventional gas-rich dwarf galaxy. One of the key observational facts is kinematic. Cloud-9 has a recession velocity of approximately $v \simeq 304\,\mathrm{km\,s^{-1}}$, essentially identical to that of M94. This makes a chance alignment with a foreground Milky Way high-velocity cloud unlikely, and strongly suggests that Cloud-9 lies at roughly the same distance as M94, about $D \simeq 4.4\,\mathrm{Mpc}$. Interpreted at this distance, Cloud-9 is compact, with an angular extent of order an arcminute, corresponding to a physical scale of roughly a kiloparsec.
The velocity structure of the neutral hydrogen is equally important. The observed 21 cm line profile is narrow, with a reported width of $W_{50} \approx 12\,\mathrm{km\,s^{-1}}$. Such a small line width immediately distinguishes Cloud-9 from most gas-rich dwarf galaxies, which typically show broader profiles due to rotation or turbulence. Here there is no clear evidence for a rotating disk. Instead, the kinematics are consistent with a dynamically cold, pressure-supported gas cloud. From the total integrated 21 cm flux, one can estimate the neutral hydrogen mass using the standard relation
$$
M_{\mathrm{HI}} = 2.36\times 10^5\, D^2 \int S(v)\,dv \, M_\odot,
$$
where $D$ is the distance in megaparsecs and $\int S(v)\,dv$ is the velocity-integrated flux in $\mathrm{Jy\,km\,s^{-1}}$. Substituting the observed values and adopting the distance of M94 yields
$$
M_{\mathrm{HI}} \sim 10^6\, M_\odot.
$$
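This arithmetic is simple enough to sketch in a few lines. In the snippet below, the distance is that of M94, but the integrated flux is a hypothetical value chosen only so that the result lands near the $\sim 10^6\,M_\odot$ quoted; it is not a measured quantity from the source.

```python
def hi_mass(distance_mpc: float, integrated_flux_jy_kms: float) -> float:
    """Neutral hydrogen mass in solar masses from the standard 21 cm relation:
    M_HI = 2.36e5 * D^2 * Int[S dv], with D in Mpc and the flux in Jy km/s."""
    return 2.36e5 * distance_mpc**2 * integrated_flux_jy_kms

D = 4.4      # Mpc, distance of M94 (from the text)
flux = 0.22  # Jy km/s, hypothetical integrated flux chosen for illustration

print(f"M_HI ~ {hi_mass(D, flux):.2e} M_sun")  # of order 1e6 M_sun
```

The quadratic dependence on $D$ is worth noting: if Cloud-9 were instead a nearby Galactic cloud, the same flux would imply a vastly smaller H I mass.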
This places Cloud-9 squarely in the regime of low-mass, gas-rich systems, comparable in H I content to some of the faintest known dwarf galaxies. At this point, one might reasonably wonder whether Cloud-9 could simply be an extreme but otherwise ordinary dwarf galaxy. However, modeling the gas as a pressure-supported system reveals a further puzzle. If the observed neutral hydrogen were the only source of gravity, the cloud would not be able to confine itself. The internal motions implied by the line width would cause the gas to disperse on relatively short timescales. The fact that Cloud-9 appears compact and long-lived implies the presence of an additional gravitational component. Interpreting this missing mass as dark matter leads to a striking inference. Simple equilibrium models, in which the gas sits in hydrostatic balance within a dark matter potential, suggest a total halo mass of order
$$
M_{\mathrm{halo}} \sim 5\times 10^9\, M_\odot.
$$
This value is not arbitrary. It lies remarkably close to the critical mass scale associated with reionization and the suppression of galaxy formation. In other words, Cloud-9 appears to sit precisely where theory predicts the transition between halos that form stars and halos that do not. If this interpretation is correct, then Cloud-9 is not merely an odd cloud of gas. It is a potential example of a dark matter halo whose mass is large enough to retain neutral hydrogen, yet small enough to have largely failed at forming stars. This makes it an unusually direct and concrete realization of the ideas discussed earlier, and immediately raises the most important question of all: if Cloud-9 really inhabits such a halo, where are its stars?
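The confinement argument can be made concrete with a back-of-envelope comparison. The sketch below assumes a Gaussian line profile (so the 1D dispersion is $W_{50}/2.355$), an illustrative cloud radius of $0.5$ kpc, and a crude point-mass escape velocity; only $W_{50} = 12\,\mathrm{km\,s^{-1}}$ and $M_{\mathrm{HI}} \sim 10^6\,M_\odot$ come from the text.

```python
import math

G = 4.301e-6  # gravitational constant in kpc * (km/s)^2 / M_sun

def escape_velocity(mass_msun: float, radius_kpc: float) -> float:
    """Escape velocity (km/s) from a point-mass potential; crude but sufficient
    for an order-of-magnitude confinement check."""
    return math.sqrt(2.0 * G * mass_msun / radius_kpc)

sigma = 12.0 / 2.355                      # ~5.1 km/s from W50 = 12 km/s (Gaussian line)
v_esc_gas = escape_velocity(1e6, 0.5)     # gas mass only, hypothetical R ~ 0.5 kpc

print(f"sigma ~ {sigma:.1f} km/s, v_esc(gas only) ~ {v_esc_gas:.1f} km/s")
# sigma exceeds the gas-only escape velocity: the H I cannot bind itself,
# so an additional (dark) mass component is required for confinement.
```

The numbers land close to each other by construction of the example, but the qualitative point survives any reasonable choice of radius: the gas alone sits at or beyond the edge of self-binding, while even a modest dark halo binds it comfortably.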
At this point the discussion becomes a detection problem in the literal statistical sense. Let us fix, as a working hypothesis, that Cloud-9 sits at the distance of M94, and consider a putative stellar component with total stellar mass $M_\star$. The observational data consist of a set of detected point sources in a small region on the sky centered near the H I maximum, together with their magnitudes in two bands (so that each source corresponds to a point in a color–magnitude diagram). The question is: for a given $M_\star$, what is the probability that a dataset of this depth would produce no statistically significant stellar overdensity at the Cloud-9 position?
To turn this into something quantitative, one needs three ingredients.
First, a model for the underlying stellar population. Concretely, one chooses an isochrone family and an initial mass function, and thereby obtains a distribution of intrinsic stellar luminosities in the observed filters, conditional on an assumed age and metallicity. In the most conservative case for detection, one takes an old, metal-poor population (for instance, age $\sim 10\,\mathrm{Gyr}$ and ${\rm [Fe/H]}\sim -2$), because younger populations would produce brighter, more easily detected stars for the same $M_\star$.
Second, one needs a model of the observational selection function. This consists of a completeness function $c(m)$ and an error model for the measured magnitudes, both of which can be calibrated by artificial-star injection and recovery tests. One may think of the selection function as defining, for any intrinsic magnitude $m$, a detection probability $c(m)\in[0,1]$ and a conditional distribution for the observed magnitude given that the star is detected.
Third, one needs a background model: even if Cloud-9 has no stars, the chosen sky region will contain some number of contaminating sources (foreground stars, unresolved background galaxies, and substructure within galaxies) that pass the point-source and quality cuts. The key point is that this background is measurable from control regions, so it can be treated as an empirically determined nuisance distribution rather than an arbitrary theoretical prior.
Once these ingredients are in place, the inference can be phrased in a way that is reasonably close to a standard hypothesis test. Fix a spatial aperture $A$ (for example, a circle of radius $r$ centered at the H I peak), and define a test statistic $N$ to be the number of detected sources within $A$ that survive the photometric quality cuts. For the purposes of obtaining a conservative upper limit, it is often enough to work with $N$ rather than with the full two-dimensional CMD distribution, because a genuine stellar population would typically increase $N$ as well as concentrate sources along the expected RGB locus.
Let $N_{\rm obs}$ be the observed value of this statistic in the Cloud-9 aperture. Let $B$ be the random variable representing the background counts in such an aperture, estimated by placing many apertures of the same size in control regions. Finally, let $S(M_\star)$ be the random variable representing the number of detected stars contributed by a stellar population of total mass $M_\star$ after applying completeness and photometric uncertainties. Then, under the hypothesis that Cloud-9 hosts a stellar population of mass $M_\star$, the total detected count is
$$
N(M_\star) \;=\; B \;+\; S(M_\star),
$$
where $B$ and $S(M_\star)$ are (to a good approximation) independent. The object of interest is then the tail probability
$$
p(M_\star) \;=\; \mathbb{P}\!\left( N(M_\star) \le N_{\rm obs} \right),
$$
namely the probability that one would observe a count no larger than $N_{\rm obs}$ if the true stellar mass were $M_\star$. If $p(M_\star)$ is very small, then $M_\star$ is inconsistent with the data at high confidence. This is the mathematically cleanest way to state the problem: we are trying to find the largest $M_\star$ for which the observation remains plausible once one accounts for both observational incompleteness and background contamination.
There is a subtlety here that is easy to miss if one thinks only in terms of integrated light. For small stellar masses, the number of luminous tracer stars (such as RGB stars above a given magnitude limit) is a small integer, and therefore highly stochastic. Two stellar systems with the same $M_\star$ can produce noticeably different numbers of detectable bright stars simply because of Poisson and IMF sampling fluctuations. In the notation above, this is precisely the statement that $S(M_\star)$ is not well approximated by its mean. One really must treat $S(M_\star)$ as a full distribution, typically obtained by Monte Carlo sampling of the stellar population followed by application of the selection function. With this probabilistic framing, the “where are the stars?” question becomes sharply posed: determine the range of $M_\star$ for which $p(M_\star)$ remains non-negligible. If even $M_\star \sim 10^4 M_\odot$ yields $p(M_\star)\ll 1$ after properly accounting for background and incompleteness, then Cloud-9 cannot plausibly hide a Leo T–like stellar component. Conversely, if the data only rule out $M_\star \gtrsim 10^5 M_\odot$, then the object could still be an ultra-faint dwarf in disguise. The rest of the analysis is, essentially, an implementation of this inequality with realistic inputs.
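A minimal Monte Carlo sketch of this tail probability is given below, with deliberately simplified stand-ins: the detected-star count $S(M_\star)$ is modeled as Poisson with a hypothetical normalization, whereas the real analysis samples an IMF, applies isochrones, and folds in the measured selection function; the background sample is likewise synthetic. The structure of the calculation, resampling $B$ empirically and estimating $p(M_\star)$ as a Monte Carlo tail fraction, is the point.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_S(m_star: float, n: int) -> np.ndarray:
    """Stand-in for population synthesis + selection: detected-star counts for a
    population of mass m_star. A Poisson draw with mean proportional to m_star is
    a crude placeholder; the real analysis samples an IMF and applies isochrones,
    completeness, and photometric scatter."""
    mean_detected = 5.3 * (m_star / 1e4)  # hypothetical normalization
    return rng.poisson(mean_detected, size=n)

def p_value(m_star: float, n_obs: int, background: np.ndarray, n: int = 100_000) -> float:
    """Monte Carlo estimate of p(M*) = P(B + S(M*) <= N_obs)."""
    B = rng.choice(background, size=n)    # resample the empirical background counts
    S = sample_S(m_star, n)
    return float(np.mean(B + S <= n_obs))

# Synthetic background sample with mean ~3.7 per aperture, as in the text.
bg = rng.poisson(3.7, size=5000)
print(f"p(1e4 Msun) ~ {p_value(1e4, 3, bg):.3f}")
```

Note that the full distribution of $S$ is used, not its mean: in the small-integer regime this distinction is exactly the subtlety discussed above.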
Let us now connect the abstract tail probability
$$
p(M_\star)=\mathbb{P}\!\left(B+S(M_\star)\le N_{\rm obs}\right)
$$
to what is actually measured in the Cloud-9 field. Fix an aperture $A$ consisting of a circle of radius $r=8.4''$, chosen because it corresponds to the effective radius of a Leo T analog placed at the distance of M94. Within this aperture, one can count the number of detected sources that survive a strict set of photometric quality cuts. Denote by $N_{\rm obs}$ the observed count in the Cloud-9 aperture. The raw observed number is $3$, but the H I centroid has a positional uncertainty comparable to the aperture size, so one should not treat the aperture center as exact. If one shifts the aperture center over the allowed centroid uncertainty and repeats the count, one obtains an empirical distribution of $N_{\rm obs}$ values with mean approximately $3.5$ and a dispersion of about $1$.
Next, one needs a background model. Define $B$ to be the random variable describing the number of contaminating sources per aperture. Rather than postulating a parametric form for $B$, one can estimate it empirically by placing a large number of apertures of identical size on a control region of the same dataset, processed with the same photometric pipeline and quality cuts. Operationally, this yields a background count distribution with mean approximately $3.7$ and dispersion about $2$ per aperture. The point is not the exact numbers, but the fact that the background level is measured directly and is comparable to the on-target count.
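The control-aperture procedure is easy to illustrate on a toy field. In the sketch below, the field size, aperture radius, and source density are all illustrative, tuned only so that the mean count per aperture comes out near the $\sim 3.7$ quoted above; for a uniform contaminant field the per-aperture counts are close to Poisson, so the dispersion lands near $\sqrt{3.7} \approx 1.9$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy control field: contaminating sources scattered uniformly over a patch of sky.
field_size = 100.0                       # arbitrary units on a side
aperture_r = 5.0
density = 3.7 / (np.pi * aperture_r**2)  # sources per unit area -> mean ~3.7/aperture
n_src = rng.poisson(density * field_size**2)
xy = rng.uniform(0, field_size, size=(n_src, 2))

def aperture_count(center: np.ndarray) -> int:
    """Number of sources inside a circular aperture centered at `center`."""
    return int(np.sum(np.hypot(*(xy - center).T) < aperture_r))

# Place many control apertures (away from the field edge) and record the counts.
centers = rng.uniform(aperture_r, field_size - aperture_r, size=(2000, 2))
counts = np.array([aperture_count(c) for c in centers])
print(f"background: mean ~ {counts.mean():.1f}, std ~ {counts.std():.1f} per aperture")
```

In the real analysis the contaminant field is not uniform and the counts need not be Poisson, which is precisely why the distribution is measured rather than assumed.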
The relevant quantity is therefore not $N_{\rm obs}$ by itself, but the excess count
$$
\Delta \equiv N_{\rm obs}-B.
$$
At the Cloud-9 location, using the shifted-aperture procedure for $N_{\rm obs}$ and the control-aperture procedure for $B$, the inferred excess is
$$
\Delta \approx -0.2 \pm 2.2,
$$
which is consistent with $\Delta=0$ and, in particular, provides no evidence for a positive overdensity of point sources at the Cloud-9 position. Interpreted probabilistically, this means that any allowed stellar population must be one whose detectable contribution $S(M_\star)$ is typically of order a few stars or less, and even that only in the high tail of its stochastic distribution.
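The quoted excess follows from simple error propagation, assuming (as the text implies) that the on-target and background uncertainties are independent and therefore add in quadrature:

```python
import math

n_obs, sig_n = 3.5, 1.0  # shifted-aperture mean and dispersion (from the text)
bkg,   sig_b = 3.7, 2.0  # control-aperture mean and dispersion (from the text)

delta = n_obs - bkg                     # central value of the excess
sigma_delta = math.hypot(sig_n, sig_b)  # independent errors add in quadrature

print(f"Delta = {delta:+.1f} +/- {sigma_delta:.1f}")  # -0.2 +/- 2.2
```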
Now we incorporate the forward model for the stellar population. For each candidate stellar mass $M_\star$, one generates many Monte Carlo realizations of a stellar population (e.g., an old, metal-poor population), converts to the observed bands, and then applies the empirically measured selection function (completeness and photometric scatter) to obtain the induced distribution of the detected-star count $S(M_\star)$. Crucially, because we are in the low-mass regime where the number of bright tracer stars is a small integer, the distribution of $S(M_\star)$ must be treated directly; it is not well described by its mean alone.
With these pieces in hand, the test becomes sharp. Consider the hypothesis $H(M_\star)$ that Cloud-9 hosts a stellar population of mass $M_\star$ within the aperture. Under $H(M_\star)$, the observed count is modeled as $N=B+S(M_\star)$, and one asks whether the realized $N$ is unusually small compared to what $H(M_\star)$ predicts. For a concrete example that is astrophysically meaningful, take $M_\star=10^4\,M_\odot$. Under this hypothesis, the Monte Carlo population synthesis plus selection function yields the following strong statement: in $99.5\%$ of realizations, at least one detectable star is recovered in the aperture. Equivalently,
$$
\mathbb{P}\!\left(S(10^4\,M_\odot)\ge 1\right)=0.995,
\quad\text{so}\quad
\mathbb{P}\!\left(S(10^4\,M_\odot)=0\right)=0.005.
$$
If one were in a zero-background world, the conclusion would already be immediate: a non-detection at the Cloud-9 position would exclude $M_\star=10^4 M_\odot$ at $99.5\%$ confidence.
But we are not in a zero-background world, and that is exactly why the excess variable $\Delta$ matters. The background count is not only nonzero, it is comparable to the observed count. The correct question is therefore: can the background fluctuations plausibly mask the additional stars predicted by $M_\star=10^4 M_\odot$? The excess estimate $\Delta=-0.2\pm 2.2$ implies that, even allowing for statistical uncertainty, the maximal plausible positive overdensity in the aperture is only a small integer. If one takes a deliberately conservative upper excursion consistent with this uncertainty, one obtains an upper bound of roughly $\Delta_{\max}\simeq 2$ stars attributable to a real counterpart. Thus, a stringent (and conservative) way to state consistency is
$$
S(M_\star)\le 2,
$$
because any model that would typically produce three or more detectable stars above background would tend to generate a positive excess inconsistent with what is observed.
Under $M_\star=10^4 M_\odot$, the Monte Carlo distribution for $S(M_\star)$ places most probability mass above this conservative threshold. Concretely, only about $8.7\%$ of realizations yield $S(10^4M_\odot)\le 2$, so
$$
\mathbb{P}\!\left(S(10^4\,M_\odot)\le 2\right)\approx 0.087.
$$
This means that even after giving the model the benefit of (i) centroid uncertainty, (ii) empirically measured background fluctuations, and (iii) a conservative tolerance for a small positive excess, a $10^4M_\odot$ stellar population remains strongly disfavored. One can summarize this as an exclusion at approximately the $1-0.087\approx 91.3\%$ level under the most conservative excess allowance, and at the $99.5\%$ level at the nominal center where the effective excess is consistent with zero and even negative.
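As a rough consistency check, not part of the actual analysis, one can ask what a pure Poisson model for $S(10^4\,M_\odot)$ would give. Fixing $\mathbb{P}(S=0)=0.005$ pins down the Poisson mean, and the implied $\mathbb{P}(S\le 2)$ can then be compared to the Monte Carlo value:

```python
import math

# If S(1e4 Msun) were exactly Poisson, P(S=0) = 0.005 would fix its mean:
lam = -math.log(0.005)                          # ~5.3 detectable stars on average
p_le_2 = math.exp(-lam) * (1 + lam + lam**2 / 2)  # P(S = 0) + P(S = 1) + P(S = 2)

print(f"lambda ~ {lam:.2f}, P(S<=2) ~ {p_le_2:.3f}")
# ~0.10: close to, though not identical to, the Monte Carlo value of 0.087
# quoted in the text, a reminder that IMF sampling makes the true
# distribution of S non-Poisson.
```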
At this stage it is also important to explain why these choices are conservative rather than aggressive. The assumed stellar population is taken to be old and metal-poor, which minimizes the number of bright, easily detected stars for a given $M_\star$. Any younger or intermediate-age component would increase detectability and therefore strengthen the exclusion. Similarly, the use of strict quality cuts and an empirically calibrated completeness function protects against over-claiming detections, but also means that some genuine stars (if present) would be lost by the pipeline, again making the inferred upper limit conservative. Finally, the background is not modeled by a convenient distribution chosen to yield a strong result; it is measured directly from the same dataset, meaning the comparison is intrinsically like-for-like.
Putting the argument in its cleanest form, the data constrain the stellar mass to be so low that the expected number of detectable RGB stars is at most of order unity. In practice, the forward modeling indicates that the largest stellar mass compatible with producing on average no more than one detectable RGB star, after incompleteness and photometric scatter, is approximately
$$
M_\star \lesssim 10^{3.5}\,M_\odot.
$$
This is far below the stellar masses of canonical gas-rich dwarfs with similar neutral hydrogen masses, and it is precisely the sort of bound one would want if the goal is to distinguish “a faint dwarf galaxy” from “a gas-bearing halo that largely failed to form stars.”
In short, once one phrases the problem as a hypothesis test for $M_\star$ using an empirically calibrated selection function and background distribution, the result is not merely “we did not see a galaxy.” The result is a quantitative inequality: any stellar counterpart must be so small that, in a dataset capable of resolving red giant branch stars at M94’s distance, the induced detectable-star count is forced into the small-integer regime. That is the sense in which Cloud-9 behaves, statistically, like a starless system.