On Rectifying the Mapping between Articles and Institutions in Bibliometric Databases

Ngo Kien Tuan; Vo Dinh Hieu; Bui Ngoc Thang; Pham Le Viet Anh; Pham Khanh Ly; Phan Hai

doi:10.25073/2588-1086/vnucsce.242

Published Oct 5, 2020

How to Cite

TUAN, Ngo Kien et al. On Rectifying the Mapping between Articles and Institutions in Bibliometric Databases. VNU Journal of Science: Computer Science and Communication Engineering, [S.l.], v. 36, n. 2, oct. 2020. ISSN 2588-1086. Available at: <//jcsce.vnu.edu.vn/index.php/jcsce/article/view/242>. Date accessed: 28 july 2026. doi: https://doi.org/10.25073/2588-1086/vnucsce.242.

Issue

Vol 36 No 2 (2020)

Section

Articles

Abstract

Bibliometric databases are indispensable sources for researchers and research institutions. They are used mainly to find research articles and to estimate the performance of researchers and institutions. When evaluating the research performance of an organization, correctly identifying the institutions of each article's authors is decisive. However, popular bibliometric databases such as Scopus and Web of Science do not yet handle this point efficiently. We therefore propose an approach for correcting the author affiliation information of articles in bibliometric databases. We build a model that classifies articles to institutions with high accuracy by combining bag-of-words and n-gram techniques to extract features from affiliation strings. These features are then weighted to determine their importance to each institution. Affiliation strings are transformed into a new feature space by integrating the feature weights with local characteristics of the words and phrases that make up the strings. Finally, a support vector classifier is trained on this feature space to learn a predictive model. Our experimental results show that the proposed model achieves an accuracy of about 99.1%.

Keywords:
Affiliation, Disambiguation, Data cleaning, Classification, Supervised learning, if-iif, Support vector machine, Support vector classifier
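
The first steps of the pipeline described in the abstract (bag-of-words and n-gram features, then a per-institution weight for each feature) can be illustrated with a small standard-library sketch. Everything here is an assumption made for illustration: the function names, the tf-idf-style weighting formula, and the four made-up affiliation strings are not taken from the paper, and the final step uses a simple highest-score rule as a stand-in for the support vector classifier the authors actually train.

```python
import math
import re
from collections import defaultdict


def extract_features(affiliation, max_n=2):
    """Lowercase word unigrams followed by word n-grams up to max_n."""
    tokens = [t for t in re.split(r"[^a-z0-9]+", affiliation.lower()) if t]
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def fit_weights(samples, max_n=2):
    """Weight each feature per institution (hypothetical tf-idf-style scheme).

    weight = share of the institution's strings containing the feature
             * log(1 + K / number of institutions that use the feature),
    where K is the number of institutions in the training samples.
    """
    strings_per_inst = defaultdict(int)
    containing = defaultdict(lambda: defaultdict(int))  # inst -> feature -> string count
    for text, inst in samples:
        strings_per_inst[inst] += 1
        for feat in set(extract_features(text, max_n)):
            containing[inst][feat] += 1

    inst_count = defaultdict(int)  # feature -> number of institutions using it
    for feats in containing.values():
        for feat in feats:
            inst_count[feat] += 1

    k = len(strings_per_inst)
    return {
        inst: {
            feat: (count / strings_per_inst[inst]) * math.log(1 + k / inst_count[feat])
            for feat, count in feats.items()
        }
        for inst, feats in containing.items()
    }


def predict(weights, affiliation, max_n=2):
    """Stand-in for the SVC: pick the institution with the highest summed weight."""
    feats = set(extract_features(affiliation, max_n))
    return max(weights, key=lambda inst: sum(weights[inst].get(f, 0.0) for f in feats))


# Made-up training strings, for illustration only.
samples = [
    ("VNU University of Engineering and Technology, Hanoi, Vietnam", "VNU-UET"),
    ("Univ. of Engineering and Technology, Vietnam National University, Hanoi", "VNU-UET"),
    ("Hanoi University of Science and Technology, Vietnam", "HUST"),
    ("School of ICT, Hanoi Univ. of Science and Technology", "HUST"),
]
weights = fit_weights(samples)
guess = predict(weights, "Faculty of IT, University of Engineering and Technology, VNU Hanoi")
```

The weighting gives features that are frequent within one institution but rare across institutions (such as "of engineering") more influence than features shared by many institutions (such as "hanoi"), which is the intuition behind weighting features by their importance to each institution.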
