Considering the dependence between protein properties and their sub-structures, we devise an unsupervised task to learn the relationships between sub-structures. Furthermore, to learn from the in vivo process, we adopt a single stream that feeds the protein and drug information into the model and encodes them together. We name this framework SubBERT; it uses bidirectional Transformers as its backbone. Comprehensive experimental results show that SubBERT achieves the best Pearson correlation coefficient (r) and comparable RMSE on three of five datasets. In addition, the ablation experiments demonstrate that both the unsupervised pre-training and the sub-structure encoding improve the performance of SubBERT, and these components could also be applied to other deep learning models. To the best of our knowledge, SubBERT is the first method to encode proteins and drugs in a unified way.
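The single-stream idea can be illustrated with a minimal sketch. It assumes a BERT-style sentence-pair layout: drug sub-structure tokens and protein sub-structure tokens are concatenated into one sequence with `[CLS]`/`[SEP]` markers and segment ids, so one bidirectional Transformer attends across both. The masked-token objective shown here is only an assumed stand-in for the unsupervised sub-structure task; the token names, the example SMILES fragments and protein k-mers, and the helper functions `build_single_stream` and `mask_positions` are all hypothetical and not taken from the SubBERT code.

```python
from typing import Dict, List, Tuple

# Hypothetical special tokens, borrowed from BERT's sentence-pair input format.
CLS, SEP, MASK = "[CLS]", "[SEP]", "[MASK]"


def build_single_stream(
    drug_subs: List[str], protein_subs: List[str]
) -> Tuple[List[str], List[int]]:
    """Concatenate drug and protein sub-structure tokens into one sequence.

    Returns the token sequence and segment ids (0 = drug, 1 = protein), so a
    single bidirectional Transformer can attend across both molecules.
    """
    tokens = [CLS] + drug_subs + [SEP] + protein_subs + [SEP]
    # [CLS], drug tokens and the first [SEP] belong to segment 0;
    # protein tokens and the final [SEP] belong to segment 1.
    segments = [0] * (len(drug_subs) + 2) + [1] * (len(protein_subs) + 1)
    return tokens, segments


def mask_positions(
    tokens: List[str], positions: List[int]
) -> Tuple[List[str], Dict[int, str]]:
    """Replace the chosen sub-structure tokens with [MASK] (assumed objective).

    Returns the masked sequence and a map from position to the original token,
    which a pre-training head would learn to predict. Special tokens are skipped.
    """
    masked = list(tokens)
    targets: Dict[int, str] = {}
    for i in positions:
        if masked[i] in (CLS, SEP):
            continue  # never mask special tokens
        targets[i] = masked[i]
        masked[i] = MASK
    return masked, targets


if __name__ == "__main__":
    # Hypothetical sub-structures: SMILES fragments for the drug, 3-mers for the protein.
    tokens, segments = build_single_stream(["c1ccccc1", "C(=O)O"], ["MKT", "AYI", "AKQ"])
    print(tokens)
    print(segments)
    masked, targets = mask_positions(tokens, [0, 2, 5])
    print(masked)
    print(targets)
```

Positions are passed in explicitly here to keep the sketch deterministic; a real pre-training loop would typically sample them at random. Keeping both molecules in one stream, rather than encoding them with two separate towers, is what lets every self-attention layer relate drug and protein sub-structures directly.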
#### The details of the project are to be continued... ####