Code

Preprocessing:

Because GPU memory is limited, we preprocess the collected abstracts; with more computational resources, this step might not be necessary.
The preprocessing pipeline uses classical natural language processing (NLP) techniques: lemmatization, stemming, and stopword removal. The NLTK Python package was used to preprocess the collected abstract data.

A preprocessing example can be downloaded at:

https://github.com/Xshelton/DeepSeq2Drug/blob/main/PP_example.zip
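The preprocessing steps named above (stopword removal, lemmatization, stemming) can be sketched roughly as follows. This is only an illustrative sketch, not the code shipped in PP_example.zip: the function names `preprocess_abstract` and `nltk_components`, the regex tokenizer, and the order of the steps are assumptions. The NLTK helper also assumes the `stopwords` and `wordnet` corpora have already been fetched with `nltk.download`.

```python
import re
from typing import Callable, Iterable, List


def preprocess_abstract(
    text: str,
    stop_words: Iterable[str],
    lemmatize: Callable[[str], str],
    stem: Callable[[str], str],
) -> List[str]:
    """Lowercase, tokenize, drop stopwords, then lemmatize and stem each token."""
    stop = set(stop_words)
    # Simple regex tokenizer (an assumption): keep runs of letters and digits.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(lemmatize(tok)) for tok in tokens if tok not in stop]


def nltk_components():
    """Return (stop_words, lemmatize, stem) backed by NLTK.

    Hypothetical helper. Assumes nltk.download("stopwords") and
    nltk.download("wordnet") have been run once beforehand.
    """
    # Imported lazily so the cleaning logic above works without NLTK data.
    from nltk.corpus import stopwords
    from nltk.stem import PorterStemmer, WordNetLemmatizer

    return (
        set(stopwords.words("english")),
        WordNetLemmatizer().lemmatize,
        PorterStemmer().stem,
    )
```

With NLTK and its corpora installed, an abstract could then be processed with `preprocess_abstract(abstract, *nltk_components())`. Shortening the token list this way reduces the amount of text passed to the embedding step, which is presumably where the GPU memory saving comes from.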

—————————————————-

Feature/Embedding Extractors and Demos:

Viral Feature-view domain: Sequence-Features Extractors

Viral Feature-view domain: Sequence-Embedding Extractors

Viral Feature-view domain: Corpus-based viral embedding Extractors

Drug Feature-view domain: Corpus-based embedding

Drug Feature-view domain: Graph-based (image-based) embedding

Drug Feature-view domain: Network-based embedding

For Corpus-Based Embedding, the following versions were used:

tensorflow-gpu==2.4.0

keras==2.4.3

CUDA 11.2

cuDNN 8.1.1
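As a convenience, the Python package pins listed above can be compared against the local environment before running the corpus-based embedding code. The sketch below is not part of the project: `PINNED` and `check_pins` are hypothetical names, and only the two pip packages are checked. CUDA and cuDNN are system libraries, so they are not visible to this check.

```python
from importlib import metadata
from typing import Dict

# Versions copied from the list above; the dictionary name is hypothetical.
PINNED = {"tensorflow-gpu": "2.4.0", "keras": "2.4.3"}


def check_pins(pins: Dict[str, str]) -> Dict[str, str]:
    """Return {package: problem} for every pin that is missing or mismatched."""
    problems: Dict[str, str] = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = f"not installed (expected {wanted})"
            continue
        if found != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems
```

Calling `check_pins(PINNED)` returns an empty dictionary when both packages match the pinned versions, and a short message per package otherwise.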

For the validation sample:

This contains the double-anchor policy.

validation_example.zip

For the Feature Pool sample:

By aggregating the validation results, we can then run the Feature Pool to further select vdkeys.

Feature Pool code Example
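The aggregation step described above might look roughly like the sketch below. It rests on several assumptions that the page does not confirm: that each validation run yields (vdkey, score) pairs, that a vdkey identifies one combination of viral and drug feature views, that higher scores are better, and that scores are aggregated by averaging. The function name `select_vdkeys` and the `top_k` parameter are hypothetical, and the actual Feature Pool code may select vdkeys differently.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def select_vdkeys(results: Iterable[Tuple[str, float]], top_k: int) -> List[str]:
    """Average the validation scores per vdkey and return the top_k keys."""
    scores: Dict[str, List[float]] = defaultdict(list)
    for vdkey, score in results:
        scores[vdkey].append(score)
    # Rank vdkeys by mean score, best first (assumes higher is better).
    ranked = sorted(scores, key=lambda k: mean(scores[k]), reverse=True)
    return ranked[:top_k]
```

Averaging is only one possible aggregation; a median or a minimum across validation runs would fit the same interface.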

For Case Study code examples:

Case study prediction

Not comfortable with coding? We also provide software that goes directly from a viral FASTA file to drug candidates.

For more information, please visit our GitHub repository:

https://github.com/Xshelton/DeepSeq2Drug