Time complexity & Time – DeepSeq2Drug

Time complexity & Time

On this page, we provide a detailed table outlining the time complexity of the algorithm and report the running time of the important parts of DeepSeq2Drug.

Table 1. Time complexity

Code	Time complexity
Main Program AUC/AUPR generation	O(nmr)
Feature Pool for vdkey Selection	O(nm)

The main program, which generates the AUC/AUPR for all vdkey pairs, traverses all viral embeddings and all drug embeddings. The time complexity of this step is O(nm), where n is the number of viral feature/embedding files and m is the number of drug embedding files. Because the experiments are repeated, the overall time complexity is O(nmr), where r is the number of repetitions. This bound only counts the train/test runs: the cost of each run also depends on the embedding length, the size of the dataset, and the model, so the actual cost can be far higher than O(nmr) suggests. For example, experiments on an extremely unbalanced dataset can take an extremely long time.
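The O(nmr) count can be illustrated with a minimal Python sketch. This is not the DeepSeq2Drug code: `evaluate_pair`, `run_all_pairs`, and the toy embeddings are hypothetical names introduced only to show that n viral embeddings, m drug embeddings, and r repetitions lead to n * m * r train/test runs.

```python
from itertools import product


def evaluate_pair(virus_embedding, drug_embedding, repetition):
    """Hypothetical stand-in for one train/test run; returns dummy (AUC, AUPR)."""
    return 0.0, 0.0


def run_all_pairs(virus_embeddings, drug_embeddings, repetitions):
    """Evaluate every (virus, drug) embedding pair `repetitions` times."""
    results = {}
    calls = 0
    pairs = product(virus_embeddings.items(), drug_embeddings.items())
    for (v_name, v_emb), (d_name, d_emb) in pairs:  # n * m pairs
        scores = []
        for rep in range(repetitions):  # r repetitions per pair
            scores.append(evaluate_pair(v_emb, d_emb, rep))
            calls += 1
        results[(v_name, d_name)] = scores
    return results, calls


# Toy inputs (assumed, not real embeddings): n = 2 viral and m = 3 drug embeddings.
virus_embeddings = {"5mer": [0.1, 0.2], "PseKNC": [0.3, 0.4]}
drug_embeddings = {"bioGPT": [0.5], "role2vec": [0.6], "GPT2": [0.7]}
results, calls = run_all_pairs(virus_embeddings, drug_embeddings, repetitions=5)
print(len(results), calls)  # number of pairs and number of train/test runs
```

In this sketch the cost of `evaluate_pair` is treated as constant; in practice it grows with the dataset size and the model, which is why the measured time can exceed what O(nmr) alone suggests.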

The purpose of the feature pool is to select the best vdkey pairs from each modality (feature-view domain). Its cost is dominated by the loop that traverses all vdkeys and p-values, so the time complexity of this step is O(nm), where n here is the number of vdkeys and m is the number of p-values.
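The nested traversal behind this O(nm) bound might look like the sketch below. The selection rule is an assumption made for illustration (counting p-values under a threshold `alpha` and ranking by that count); the actual criterion used by the feature pool is not described here, and `rank_vdkeys` and the toy p-values are hypothetical.

```python
def rank_vdkeys(p_values_by_vdkey, alpha=0.05):
    """Rank vdkeys by how many of their p-values fall below alpha (assumed rule)."""
    scored = []
    for vdkey, p_values in p_values_by_vdkey.items():  # n vdkeys
        significant = 0
        for p in p_values:  # m p-values per vdkey
            if p < alpha:
                significant += 1
        scored.append((vdkey, significant))
    # Sorting adds O(n log n), which is small next to the O(nm) traversal
    # whenever m is at least on the order of log n.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


# Toy p-values (assumed, not measured results).
p_values_by_vdkey = {
    "5mer+bioGPT": [0.01, 0.20, 0.03],
    "PseKNC+role2vec": [0.04, 0.50, 0.60],
    "4mer+GPT2": [0.30, 0.40, 0.70],
}
ranking = rank_vdkeys(p_values_by_vdkey)
print(ranking)
```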

Besides these main parts, the project contains other components. We also recorded the actual running time of each process.
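An average running time of this kind can be measured with a small wall-clock helper. The sketch below assumes the average is taken per sample; `average_running_time` and `toy_kmer_count` are hypothetical names, and the toy k-mer counter only stands in for a real preprocessing step.

```python
import time


def average_running_time(func, samples):
    """Return the mean wall-clock time in seconds of func over all samples."""
    start = time.perf_counter()
    for sample in samples:
        func(sample)
    elapsed = time.perf_counter() - start
    return elapsed / len(samples)


def toy_kmer_count(sequence, k=4):
    """Count overlapping k-mers in a sequence (illustrative workload only)."""
    counts = {}
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        counts[kmer] = counts.get(kmer, 0) + 1
    return counts


samples = ["ACGTACGTAGCTAGCTAGGATCC"] * 1000  # 1000 toy sequences
avg = average_running_time(toy_kmer_count, samples)
print(f"{avg:.6f} s per sample")
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock; the printed value depends on the machine, so no expected output is given.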

The platforms used for measuring the running time are shown below:

Platform 1:
RTX 2080 Ti GPU
Intel i7-9700
64 GB DDR4 memory
Platform 2:
RTX 4080 Laptop GPU
Intel i9-13900HX
64 GB DDR5 memory

Table 2. Running time of different parts of DeepSeq2Drug

Part	Description	Samples	Platform	Average running time (s)
Preprocess	Extraction of bio-GPT-Drugs	1000	2	1.3087
Preprocess	Extraction of 3D-resnet50		2	0.0402
Preprocess	Extraction of GPT2-Drugs	1000	1	0.0825
Preprocess	Extraction of Albert-virus	89	1	0.0865
Preprocess	Doc2vec	1000	1	0.0441
Preprocess	5mer	1000	1	0.0099
Preprocess	4mer	1000	1	0.0065
Preprocess	PseKNC	1000	1	0.7216
Feature pool	P-values calculation	N/A	1	2.1084
Feature pool	Vdkey Rank	N/A	1	0.0210
Train-Validation	5mer+bioGPT (nr=0.5)	N/A	2	267.3627
Train-Validation	5mer+bioGPT (nr=1)	N/A	2	410.1965
Train-Validation	5mer+bioGPT (nr=2)	N/A	2	733.2629
Train-Validation	5mer+bioGPT (nr=4)	N/A	2	1424.035
Train-Validation	5mer+bioGPT (nr=8)	N/A	2	2967.333
Train-Validation	5mer+bioGPT (nr=16)	N/A	2	4508.527
Train-Validation	PseKNC+role2vec (nr=0.5)	N/A	2	536.3279
Train-Validation	PseKNC+role2vec (nr=1)	N/A	2	812.8335
Train-Validation	PseKNC+role2vec (nr=2)	N/A	2	1408.047
Train-Validation	PseKNC+role2vec (nr=4)	N/A	2	2758.145
Train-Validation	PseKNC+role2vec (nr=8)	N/A	2	5502.763
Train-Validation	PseKNC+role2vec (nr=18)	N/A	2	6688.011

As the table above shows, the running time varies with the value of nr, which controls the size of the unbalanced dataset, and with the feature combination. Training and validating the models is by far the most time-consuming part of DeepSeq2Drug.
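To make the trend concrete, the following sketch computes how much the Train-Validation time grows from one nr setting to the next, using the values reported in the table above. The helper name `growth_ratios` and the dictionary layout are assumptions introduced for this illustration.

```python
# Average Train-Validation times in seconds, copied from the table above.
train_times = {
    "5mer+bioGPT": {
        0.5: 267.3627, 1: 410.1965, 2: 733.2629,
        4: 1424.035, 8: 2967.333, 16: 4508.527,
    },
    "PseKNC+role2vec": {
        0.5: 536.3279, 1: 812.8335, 2: 1408.047,
        4: 2758.145, 8: 5502.763, 18: 6688.011,
    },
}


def growth_ratios(times_by_nr):
    """Return (nr_low, nr_high, time_high / time_low) for consecutive nr values."""
    nrs = sorted(times_by_nr)
    return [
        (low, high, times_by_nr[high] / times_by_nr[low])
        for low, high in zip(nrs, nrs[1:])
    ]


for combination, times_by_nr in train_times.items():
    for low, high, ratio in growth_ratios(times_by_nr):
        print(f"{combination}: nr {low} -> {high}, time x{ratio:.2f}")
```

A ratio near 2 when nr doubles indicates roughly linear growth of the running time with nr over that interval.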