Time complexity & running time
On this page, we provide a detailed table outlining the time complexity of the algorithm, along with measured running times for the important parts of DeepSeq2drug.
Table 1 Time complexity
Code | Time complexity |
Main Program AUC/AUPR generation | O(nmr) |
Feature Pool for vdkey Selection | O(nm) |
The main program, which generates AUC/AUPR for all vdkey pairs, traverses all viral embeddings and all drug embeddings. The time complexity of this step is O(nm), where n is the number of viral feature/embedding files and m is the number of drug embedding files. Because the experiments are repeated, the overall time complexity of the code is O(nmr), where r is the number of repetitions. This bound counts only the loop iterations: the cost of each iteration also depends on the embedding length, the size of the dataset, and the model being trained and tested, so the actual cost can be far greater than O(nmr). For example, on an extremely unbalanced dataset, the running time can also be extremely long.
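The nested traversal behind the O(nmr) bound can be sketched as follows. This is a minimal illustration, not the DeepSeq2drug code: the function name `count_evaluations` and the example lists are hypothetical, and the train/test step is replaced by a simple counter.

```python
from itertools import product


def count_evaluations(viral_embeddings, drug_embeddings, repetitions):
    """Count how many train/test runs the nested traversal performs."""
    runs = 0
    for _ in range(repetitions):  # r repetitions
        # n * m (viral, drug) embedding pairs
        for viral, drug in product(viral_embeddings, drug_embeddings):
            # A real pipeline would train and evaluate a model on
            # (viral, drug) here; that per-pair cost is NOT included
            # in the O(nmr) bound.
            runs += 1
    return runs


# Hypothetical example: n = 3 viral embeddings, m = 2 drug embeddings, r = 5
viral = ["4mer", "5mer", "PseKNC"]
drug = ["bioGPT", "GPT2"]
total_runs = count_evaluations(viral, drug, repetitions=5)
print(total_runs)  # 3 * 2 * 5 = 30
```

The counter grows as n × m × r, while the work hidden behind the comment is what dominates the wall-clock time in practice.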
The purpose of the feature pool is to select the best vdkey pairs from each modality (feature-view domain). Its time complexity is determined by the loop that traverses all vdkeys and p-values, so this step is O(nm), where here n is the number of vdkeys and m is the number of p-values.
Besides these main parts, the project contains other components. We also recorded the actual running time of each process.
The platforms used to measure running time are listed below:
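As a rough sketch of why the scan over vdkeys and p-values is O(nm), consider the loop below. The scoring rule (counting p-values below a threshold `alpha`), the function name `rank_vdkeys`, and the input values are all assumptions made for illustration; the actual selection criterion used by the feature pool may differ.

```python
def rank_vdkeys(p_values_by_vdkey, alpha=0.05):
    """Rank vdkeys by how many of their p-values fall below alpha.

    Hypothetical scoring rule, used only to illustrate the loop shape.
    """
    scores = {}
    for vdkey, p_values in p_values_by_vdkey.items():  # n vdkeys
        significant = 0
        for p in p_values:  # m p-values per vdkey
            if p < alpha:
                significant += 1
        scores[vdkey] = significant
    # Sorting adds O(n log n) on top of the O(nm) scan.
    return sorted(scores, key=scores.get, reverse=True)


# Made-up p-values for three placeholder vdkeys
pool = {
    "vdkey_a": [0.01, 0.20, 0.03],
    "vdkey_b": [0.04, 0.01, 0.02],
    "vdkey_c": [0.50, 0.60, 0.70],
}
ranking = rank_vdkeys(pool)
print(ranking)
```

Each p-value is visited exactly once, so the scan costs n × m comparisons regardless of which scoring rule is plugged in.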
- Platform 1:
  - RTX 2080 Ti GPU
  - Intel i7-9700
  - 64GB DDR4 memory
- Platform 2:
  - RTX 4080 Laptop GPU
  - Intel i9-13900HX
  - 64GB DDR5 memory
Table 2 Running time for different parts of DeepSeq2drug
Part | Description | Samples | Platform | Average running time (s) |
Preprocess | Extraction of bio-GPT-Drugs | 1000 | 2 | 1.3087 |
Preprocess | Extraction of 3D-resnet50 | | 2 | 0.0402 |
Preprocess | Extraction of GPT2-Drugs | 1000 | 1 | 0.0825 |
Preprocess | Extraction of Albert-virus | 89 | 1 | 0.0865 |
Preprocess | Doc2vec | 1000 | 1 | 0.0441 |
Preprocess | 5mer | 1000 | 1 | 0.0099 |
Preprocess | 4mer | 1000 | 1 | 0.0065 |
Preprocess | PseKNC | 1000 | 1 | 0.7216 |
Feature pool | P-values calculation | N/A | 1 | 2.1084 |
Feature pool | Vdkey Rank | N/A | 1 | 0.0210 |
Train-Validation | 5mer+bioGPT (nr=0.5) | N/A | 2 | 267.3627 |
Train-Validation | 5mer+bioGPT (nr=1) | N/A | 2 | 410.1965 |
Train-Validation | 5mer+bioGPT (nr=2) | N/A | 2 | 733.2629 |
Train-Validation | 5mer+bioGPT (nr=4) | N/A | 2 | 1424.035 |
Train-Validation | 5mer+bioGPT (nr=8) | N/A | 2 | 2967.333 |
Train-Validation | 5mer+bioGPT (nr=16) | N/A | 2 | 4508.527 |
Train-Validation | PseKNC+role2vec (nr=0.5) | N/A | 2 | 536.3279 |
Train-Validation | PseKNC+role2vec (nr=1) | N/A | 2 | 812.8335 |
Train-Validation | PseKNC+role2vec (nr=2) | N/A | 2 | 1408.047 |
Train-Validation | PseKNC+role2vec (nr=4) | N/A | 2 | 2758.145 |
Train-Validation | PseKNC+role2vec (nr=8) | N/A | 2 | 5502.763 |
Train-Validation | PseKNC+role2vec (nr=18) | N/A | 2 | 6688.011 |
As the table above shows, the running time varies with the nr value, which controls the size of the unbalanced dataset, and with the feature combination used. Training and validating the models is the most time-consuming part of DeepSeq2drug.
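To make the trend concrete, the short script below computes how much the running time grows each time nr doubles, using the 5mer+bioGPT timings from the table. The variable names are illustrative only.

```python
# Train-Validation timings for 5mer+bioGPT on Platform 2 (seconds),
# copied from the running time table.
nr_values = [0.5, 1, 2, 4, 8, 16]
seconds = [267.3627, 410.1965, 733.2629, 1424.035, 2967.333, 4508.527]

# Ratio between each timing and the previous one (nr doubles each step).
ratios = [later / earlier for earlier, later in zip(seconds, seconds[1:])]

for (nr_from, nr_to), ratio in zip(zip(nr_values, nr_values[1:]), ratios):
    print(f"nr {nr_from} -> {nr_to}: x{ratio:.2f}")
```

For this feature combination, each doubling of nr multiplies the running time by a factor between roughly 1.5 and 2.1, which suggests that the training cost grows close to linearly with nr.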