Introduction
An Expandable End-to-end Antiviral Drug Repurposing
Framework by Multi-Modal Deep Embeddings
Our manuscript presents an expandable, ensemble-based, end-to-end
(directly from sequence to drug) software framework for antiviral drug
repurposing, built on multi-modal embeddings and transfer learning.
Given a viral sequence FASTA file (DNA sequences) and/or a description of
the virus, the software generates the corresponding features and embeddings
and then predicts potential drug candidates using an ensemble machine learning
algorithm over multiple feature-view domains. (In the first stage, we focus on
three kinds of drug features/embeddings: molecular-graph features generated
from SMILES, semantic context-based features, and network-based features.)
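As an illustration of the input step only, the sketch below reads FASTA-formatted text into a dictionary of sequences using the Python standard library. The function name `parse_fasta` and the example records are hypothetical and are not taken from the software itself; the actual feature and embedding generation is not shown.

```python
from typing import Dict, Iterable, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {record_id: sequence}.

    Hypothetical helper for illustration; not the software's own loader.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # The record id is the first whitespace-delimited token after '>'.
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines may be wrapped; join them and normalize the case.
            records[current_id] += line.upper()
    return records


# Made-up example records (not real viral sequences).
example = """>virus_A hypothetical example record
ATGCGT
acgtta
>virus_B
GGGCCC
"""

sequences = parse_fasta(example.splitlines())
print(sequences)
```

To read from disk instead, an open file handle can be passed in place of `example.splitlines()`, since the function accepts any iterable of lines.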
Pre-trained deep-learning models, such as ResNet-50 and ALBERT, are leveraged
for image-based and corpus-based embedding generation.
We use two mechanisms when training the ensemble models. The Double Anchor
fixes the random seeds of the two random processes (negative sampling and the
cross-validation dataset split) to reduce the influence of the random seed on
the selection of negative samples. The Feature Pool selects the best
feature/embedding extractors: it chooses the best combination from the
different feature-view domains and keeps only one for each virus-drug
feature-view-domain combination pair (vdkey).
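The Double Anchor and Feature Pool ideas can be sketched roughly as follows, using only the Python standard library. This is a minimal sketch under stated assumptions, not the software's implementation: all function names, the seed values, the toy virus/drug names, the vdkey strings, and the scores are hypothetical, and a higher score is assumed to be better.

```python
import random
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]  # (virus, drug)


def sample_negatives(viruses: Sequence[str], drugs: Sequence[str],
                     positives: Sequence[Pair], n: int, seed: int) -> List[Pair]:
    """Anchor 1: draw negative virus-drug pairs with a fixed seed."""
    rng = random.Random(seed)
    known = set(positives)
    # Sort the candidates so the draw does not depend on input ordering.
    candidates = sorted((v, d) for v in viruses for d in drugs
                        if (v, d) not in known)
    return rng.sample(candidates, min(n, len(candidates)))


def kfold_indices(n_samples: int, k: int, seed: int) -> List[List[int]]:
    """Anchor 2: split sample indices into k folds with a fixed seed."""
    rng = random.Random(seed)
    order = list(range(n_samples))
    rng.shuffle(order)
    return [order[i::k] for i in range(k)]


def best_per_vdkey(scores: Dict[Tuple[str, str], float]) -> Dict[str, str]:
    """Feature Pool: keep only the best-scoring extractor for each vdkey.

    `scores` maps (vdkey, extractor_name) to a validation score.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for (vdkey, extractor), score in scores.items():
        if vdkey not in best or score > best[vdkey][1]:
            best[vdkey] = (extractor, score)
    return {vdkey: name for vdkey, (name, _) in best.items()}


# Toy data (hypothetical).
viruses = ["virus_A", "virus_B"]
drugs = ["drug_1", "drug_2", "drug_3"]
positives = [("virus_A", "drug_1"), ("virus_B", "drug_3")]

negatives = sample_negatives(viruses, drugs, positives, n=2, seed=42)
folds = kfold_indices(n_samples=10, k=5, seed=7)

pool = best_per_vdkey({
    ("vdkey_1", "extractor_a"): 0.71,
    ("vdkey_1", "extractor_b"): 0.78,
    ("vdkey_2", "extractor_c"): 0.65,
})
print(negatives, folds, pool)
```

Because each random process owns its own seeded generator, rerunning with the same two seeds reproduces both the negative samples and the fold assignment, which is the property the Double Anchor relies on.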
The software, features, and code are available.