LOGISTAR
HyenaDNA: A Large Language Model for the Human Genome
date: 2023-10-09

 

AI researchers from Stanford and Turing Award winner Yoshua Bengio have trained a large language model on human genome data to better predict DNA profiles.

Researchers created HyenaDNA by taking the Hyena large language model and pre-training it on human reference genome sequences of up to one million tokens. Prior models have typically used context lengths of 512 to 4,000 tokens, or less than 0.001% of the human genome.
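
To put those context lengths in perspective, here is a rough back-of-the-envelope sketch. It assumes a genome length of about 3.2 billion nucleotides and one token per nucleotide. Neither figure comes from the article, and the function name is only illustrative.

```python
# Assumption (not from the article): the human genome is roughly
# 3.2 billion nucleotides long, and one token equals one nucleotide.
GENOME_LENGTH = 3_200_000_000


def context_share_percent(context_tokens: int,
                          genome_length: int = GENOME_LENGTH) -> float:
    """Return the percentage of the genome that fits in one context window."""
    return 100.0 * context_tokens / genome_length


# Compare the prior context lengths with the one-million-token window.
for tokens in (512, 4_000, 1_000_000):
    share = context_share_percent(tokens)
    print(f"{tokens:>9,} tokens -> {share:.6f}% of the genome")
```

Under these assumptions, a 4,000-token window covers about 0.000125% of the genome, consistent with the "less than 0.001%" figure, while a one-million-token window covers roughly 0.03%.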

Stanford’s researchers contend that most of the work on long-context models has focused on natural language and code, and that biology is “inherently made of ultralong sequences.”

https://aibusiness.com/ml/hyenadna-a-large-language-model-trained-on-human-genome-sequences