Single Concatenated Input is Better than Independent Multiple-input for CNNs to Predict Chemical-induced Disease Relation from Literature

Pham Thi Quynh Trang; Bui Manh Thang; Dang Thanh Hai

doi:10.25073/2588-1086/vnucsce.237

Published May 30, 2020

How to Cite

TRANG, Pham Thi Quynh; THANG, Bui Manh; HAI, Dang Thanh. Single Concatenated Input is Better than Independent Multiple-input for CNNs to Predict Chemical-induced Disease Relation from Literature. VNU Journal of Science: Computer Science and Communication Engineering, [S.l.], v. 36, n. 1, may 2020. ISSN 2588-1086. Available at: <//jcsce.vnu.edu.vn/index.php/jcsce/article/view/237>. Date accessed: 19 feb. 2026. doi: https://doi.org/10.25073/2588-1086/vnucsce.237.

Issue

Vol 36 No 1 (2020)

Section

Original Articles

Abstract

Chemical compounds (drugs) and diseases are among the most searched keywords on PubMed, the database of biomedical literature, by biomedical researchers worldwide (according to a 2009 study). Working with PubMed is essential for researchers to gain insight into drugs' side effects, that is, chemical-induced disease relations (CDR), which are essential for drug safety and toxicity. It is, however, a heavy burden for them, because PubMed is a huge and rapidly growing database of unstructured text (currently about 28 million scientific articles, with approximately two deposited per minute). As a result, biomedical text mining has empirically demonstrated its great value to biomedical research communities. Biomedical text has its own distinct, challenging properties, which have attracted much attention from natural language processing communities. A recent large-scale study (2018) showed that, for a biLSTM, feeding information into independent multiple-input layers outperforms concatenating it into a single input layer, yielding better performance than state-of-the-art CDR classification models. This paper demonstrates that the opposite holds for a CNN: concatenation is better for CDR classification. To this end, we develop a CNN-based model with multiple inputs concatenated for CDR classification. Experimental results on the benchmark dataset show that it outperforms other recent state-of-the-art CDR classification models.
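The difference between the two input strategies contrasted in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' model: the abstract does not say which inputs are used, so the input names (`word`, `position`, `pos_tag`), their dimensions, the filter sizes, and the helper `conv1d_maxpool` are all assumptions chosen for illustration, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_maxpool(x, filters):
    """Convolve over the token axis, apply ReLU, then max-pool over time.

    x:       (seq_len, dim) feature matrix for one sentence
    filters: (n_filters, width, dim) convolution kernels
    returns: (n_filters,) pooled feature vector
    """
    n_filters, width, _ = filters.shape
    seq_len = x.shape[0]
    # Collect every window of `width` consecutive tokens: (n_windows, width, dim)
    windows = np.stack([x[i:i + width] for i in range(seq_len - width + 1)])
    # Dot each window with each filter: (n_windows, n_filters)
    scores = np.einsum("nwd,fwd->nf", windows, filters)
    return np.maximum(scores, 0.0).max(axis=0)

seq_len, width, n_filters = 12, 3, 4

# Hypothetical per-token inputs; names and sizes are illustrative assumptions.
inputs = {
    "word": rng.normal(size=(seq_len, 8)),
    "position": rng.normal(size=(seq_len, 2)),
    "pos_tag": rng.normal(size=(seq_len, 3)),
}

# (a) Single concatenated input: one convolution sees all features jointly.
concatenated = np.concatenate(list(inputs.values()), axis=1)
filters_joint = rng.normal(size=(n_filters, width, concatenated.shape[1]))
single_repr = conv1d_maxpool(concatenated, filters_joint)

# (b) Independent multiple inputs: one convolution per input, merged afterwards.
multi_repr = np.concatenate([
    conv1d_maxpool(x, rng.normal(size=(n_filters, width, x.shape[1])))
    for x in inputs.values()
])
```

In variant (a) each filter spans all 13 feature dimensions of a token window, so interactions between input types can be captured inside the convolution. In variant (b) each input is convolved separately and the pooled vectors are only combined afterwards.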

Keywords:

Chemical disease relation prediction, Convolutional neural network, Biomedical text mining
