Searching for extraterrestrial intelligence in our own genome

What?

In 2017, after narrowly missing out on a job at a genomic research company, a curious idea popped into my mind: what if someone had left us a signal in our own genome? I assume that the idea was a remnant of my favorite movie, 2001: A Space Odyssey, which (spoiler alert!) posits an alien intelligence modifying early humans to make them smarter, and then leaving us a signal that we could not discover until we managed to colonize the Moon (predicted, in that year before the first Moon landing, to happen by 2001).

Sally and I had never watched Ancient Aliens at that time, and I wasn’t even aware of a 2013 paper that had claimed to find an intelligent signal in our genome. It just seemed like an interesting idea.

I decided that the logic of some of the early SETI searches should also apply to this hypothesis: the simplest and most unambiguous way for an alien intelligence to leave us such a signal would be for them to encode the universal constant π into our genome somewhere, mapping each of the four bases to one of the four dibits (two-bit integers). There are 4! = 24 possible ways to perform this mapping, and you can read the genome in either direction, so there are 48 possible mappings to check.
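The counting above can be illustrated with a minimal C99 sketch. It is not the code from the repository: the names enumerate_mappings and bases_to_bits, the A, C, G, T ordering, and the choice to pack the most significant dibit first are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The four bases in an arbitrary fixed order (an assumption of this sketch). */
static const char BASES[4] = {'A', 'C', 'G', 'T'};

/* Fill maps[i][b] with the dibit (0..3) that the i-th mapping assigns to
   BASES[b]. Returns the number of mappings written, which is 4! = 24. */
int enumerate_mappings(uint8_t maps[24][4])
{
    int n = 0;
    for (int a = 0; a < 4; a++)
        for (int c = 0; c < 4; c++)
            for (int g = 0; g < 4; g++)
                for (int t = 0; t < 4; t++) {
                    /* Keep only assignments that use each dibit exactly once. */
                    if (a == c || a == g || a == t ||
                        c == g || c == t || g == t)
                        continue;
                    maps[n][0] = (uint8_t)a;
                    maps[n][1] = (uint8_t)c;
                    maps[n][2] = (uint8_t)g;
                    maps[n][3] = (uint8_t)t;
                    n++;
                }
    assert(n == 24);
    return n;
}

/* Pack up to 32 bases into a 64-bit word, most significant dibit first.
   If reverse is nonzero the string is read right to left.
   Returns 0 on success, -1 if len > 32 or a non-ACGT character is met. */
int bases_to_bits(const char *seq, size_t len, const uint8_t map[4],
                  int reverse, uint64_t *out)
{
    uint64_t word = 0;
    if (len > 32)
        return -1;
    for (size_t i = 0; i < len; i++) {
        char ch = seq[reverse ? len - 1 - i : i];
        const char *p = memchr(BASES, ch, 4);
        if (p == NULL)
            return -1;
        word = (word << 2) | map[p - BASES];
    }
    *out = word;
    return 0;
}
```

Calling bases_to_bits once per mapping and once per direction gives the 48 candidate bit strings for any stretch of bases. Reading the complementary strand needs no extra cases in this sketch, since complementing the bases (A↔T, C↔G) only permutes the four symbols and is therefore already covered by one of the 24 mappings.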

I wrote some rudimentary programs to search for π in the reference human genome, and satisfied myself that the results were no more statistically significant than they would be for searching for a random string of bits rather than π. A nice idea, but no cigar.

In 2024, for reasons we have not yet divulged publicly, I was again thinking about this idea. I decided that my 2017 search was indicative, but not really comprehensive, for two reasons. First, by searching for as many successive bits of π as I could, I was not taking into account the possibility that some of the bases may have been corrupted in the reference human genome, either through mutation or because the reference genome is in some sense an “average” over all humans. That could be particularly problematic if the signal were contained in the “junk” (now called “non-coding”) parts of the genome. Second, I hadn’t even considered the real possibility that such an alien intelligence might have stripped off the leading 3 of π and just encoded the fractional part, which I would have missed completely.

I decided to relaunch this personal project and fix both of these flaws. To this end, in September 2024 Sally and I each purchased 100X Whole Genome Sequencing from Nebula Genomics, so that I could run my search against relatively accurate genomic sequences for at least two particular humans, rather than just on the reference genome. We received our data about six weeks later. I also decided to figure out a way to search for π that would be robust against a moderate number of leading bits being missing, and robust against some of the bits being corrupted by substitution mutations, but would still allow me to make an analysis of the likelihood that the results were statistically significant.
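One common way to get both kinds of robustness is to compare fixed-width windows by Hamming distance and to try needles that start at different bit offsets into π. The C99 sketch below shows that idea only; it is not necessarily the method used in this project, and the names hamming64 and pi_needle, the 64-bit window width, and the limit of 64 skipped bits are assumptions. The hexadecimal constants are the first 128 bits of the fractional part of π.

```c
#include <assert.h>
#include <stdint.h>

/* Count set bits portably (Kernighan's method). */
static int popcount64(uint64_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Number of bit positions at which a 64-bit genome window differs from
   the needle. A few substitution mutations give a small nonzero distance
   instead of an outright miss. */
int hamming64(uint64_t window, uint64_t needle)
{
    return popcount64(window ^ needle);
}

/* First 128 bits of the fractional part of pi:
   hex 243F6A88 85A308D3 13198A2E 03707344. */
static const uint64_t PI_FRAC[2] = {
    0x243F6A8885A308D3ULL, 0x13198A2E03707344ULL
};

/* 64-bit needle that starts `skip` bits into the fractional part of pi,
   for 0 <= skip <= 64. Trying several values of skip allows for a signal
   whose leading bits are missing. */
uint64_t pi_needle(unsigned skip)
{
    assert(skip <= 64);
    if (skip == 0)
        return PI_FRAC[0];
    if (skip == 64)
        return PI_FRAC[1];
    return (PI_FRAC[0] << skip) | (PI_FRAC[1] >> (64 - skip));
}
```

A search along these lines would record, for each window, mapping, and direction, the smallest hamming64 distance over the chosen range of skip values. Because each base carries two bits, an odd skip shifts the dibit alignment, so a real search would need to decide whether odd offsets are meaningful.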

This project sat on the back burner until I had some time to kick it off in April 2025, after Meta laid me off. Then, in June 2025, Sally and I happened to watch the 2018 episode of Ancient Aliens where the 2013 paper was discussed. In that instant I realized that my back-burner project was not just a personal academic curiosity, but could in fact provide an important test for an open and contentious scientific claim. I decided to immediately ramp up the project, and to release my results publicly, no matter what I found. I therefore wrote up almost all of the paper below, and almost all of this page, before actually performing the experiment.

To cut to the chase: I again found no evidence for such a signal in the human genome.

Paper

I had nearly finished writing up the paper below, but on August 17 I decided that I needed to take a short diversion from refactoring my code (I realized I needed to exclude the large blocks of N codes in the reference genome), and so I started using the same ideas to look at the raw data itself. That ended up taking all my attention. I therefore put up the incomplete version of this paper on September 8, to give some sort of coherent explanation for the predecessor of that other project.

Code

I performed the above experiments using simple code that I wrote in ANSI C. Again, I was refactoring that code when the other project took over. I therefore put up the code repository here on September 8 with the caveat that the code doesn’t actually work right now:

There are 13 programs included that probably don’t work right now:

ref_fasta_to_genome_file
Write the unambiguous bases from the reference human genome to my own GenomeFile format created just for this experiment.
nebula_fastq_to_genome_file
Write the unambiguous bases from a pair of Nebula files to my GenomeFile format.
ref_fasta_dump
Dump the unambiguous bases from the reference human genome into a text file to compare with that from the GenomeFile.
nebula_fastq_dump
Dump the unambiguous bases in a pair of Nebula files in the same way.
genome_file_dump
Dump the bases in one of my GenomeFile files in the same way.
genome_file_metadata
Print the metadata stored at the start of a GenomeFile, and optionally the number of bases in each fragment.
genome_file_head
Like the head utility, create a subset of a given GenomeFile that contains just the first n bases.
throw_needles_at_genome
Throw random needles at a GenomeFile to determine its statistical properties. Run it multiple times to increase coverage.
genome_stats
Compute some summary statistics from the results of throw_needles_at_genome and write them to a CSV file.
random_needle
Generate a random 64-bit needle, for test purposes.
cheat_needle
“Cheat” by choosing a 64-bit needle from within a GenomeFile from a random fragment, base, mapping, and direction.
find_needle_in_genome
Search for a specific needle in a GenomeFile.
needle_significance
Join the results of find_needle_in_genome to those of throw_needles_at_genome to assess statistical significance.
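To give a feel for what comparing one needle against many random needles can look like, here is a minimal C99 sketch of an empirical p-value. It is an illustration under assumptions, not the calculation performed by needle_significance: the function name empirical_p_value, the use of a "best (smallest) distance" score per needle, and the add-one correction are all choices made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Empirical p-value for an observed best (smallest) distance.
   random_best[i] is the best distance found for the i-th random needle.
   The result is the fraction of random needles that did at least as well
   as the observed needle, with 1 added to numerator and denominator so
   that the estimate is never exactly zero. */
double empirical_p_value(const int *random_best, size_t n, int observed_best)
{
    size_t hits = 0;
    assert(random_best != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (random_best[i] <= observed_best)
            hits++;
    }
    return (double)(hits + 1) / (double)(n + 1);
}
```

With this convention, a needle for π that scores no better than typical random needles gets a p-value near 0.5 or higher, which would be read as no evidence of a signal.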

The archive also provides 91 unit-test and death-test executables.

Instructions for building the code are contained in the _README file within the archive.

Disclaimers

This page describes personal hobby research that I have undertaken since 2017. All opinions expressed herein are mine alone. All code provided here is from my personal codebase, and is supplied under the MIT-0 License.