Referencing of Document Content Using Similarity Measures
Keywords:
Referencing system, Syntactic similarity, Semantic similarity, WordNet, Natural Language Processing
Abstract
One of the main open challenges in automating scientific writing is locating relevant references and inserting them into scholarly papers automatically. This paper addresses the problem with an automatic referencing system based on semantic similarity measures. The system works in three phases: preprocessing (tokenization, stop-word removal, morphosyntactic tagging, and lemmatization), semantic similarity computation, and reference insertion. Our experimental results show that, by relying on semantic similarity, the system can automatically identify and insert relevant references. The Resnik measure achieved the best performance (57% accuracy, 58% precision, and 64% F1-score), outperforming the Mihalcea measure (43% accuracy, 50% precision, and 59% F1-score).
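The abstract names the three phases and the Resnik and Mihalcea measures but does not say how they are implemented. The Python sketch below shows one way the phases could fit together. It rests on several stated assumptions:

- A hypothetical toy taxonomy (`PARENT`) and made-up concept frequencies (`FREQ`) stand in for WordNet and a corpus.
- A small lemma table (`LEMMAS`) stands in for a real lemmatizer, and part-of-speech tagging is skipped.
- `CANDIDATES` and the 0.5 threshold are invented for illustration.

Resnik similarity is computed as the information content, -log P(c), of the most informative common ancestor of two concepts. The text-level score follows the idf-weighted best-match formula of Mihalcea et al., with Resnik plugged in as the word measure. The abstract evaluates the two measures as alternatives, so this pairing is illustrative only and is not the paper's method.

```python
import math
import re
from collections import Counter

# --- Hypothetical toy resources (a real system would use WordNet and a large corpus) ---
# Taxonomy as child -> parent links; "entity" is the single root.
PARENT = {
    "entity": None,
    "document": "entity", "paper": "document", "article": "document", "reference": "document",
    "measure": "entity", "similarity": "measure", "distance": "measure",
}
# Made-up concept frequencies; every concept needs a positive count.
FREQ = {"entity": 1, "document": 2, "paper": 5, "article": 3, "reference": 4,
        "measure": 2, "similarity": 6, "distance": 3}
STOP_WORDS = {"a", "an", "the", "of", "and", "in", "is", "are", "to", "for", "with", "on"}
# Toy lemma table standing in for a real lemmatizer (POS tagging is omitted here).
LEMMAS = {"papers": "paper", "articles": "article", "references": "reference",
          "measures": "measure", "distances": "distance"}


def preprocess(text):
    """Phase 1: tokenize, drop stop words, lemmatize."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [LEMMAS.get(t, t) for t in tokens if t not in STOP_WORDS]


def ancestors(concept):
    """The concept itself followed by every ancestor up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def information_content():
    """IC(c) = -log P(c), where P(c) counts c and all of its descendants."""
    counts = Counter()
    for concept, freq in FREQ.items():
        for anc in ancestors(concept):
            counts[anc] += freq
    total = counts["entity"]
    return {c: math.log(total / n) for c, n in counts.items()}


IC = information_content()


def resnik(c1, c2):
    """Resnik similarity: IC of the most informative common subsumer (0 if unknown)."""
    if c1 not in PARENT or c2 not in PARENT:
        return 0.0
    common = set(ancestors(c1)) & set(ancestors(c2))
    return max(IC[c] for c in common)


def idf_table(docs):
    """Smoothed inverse document frequency over a list of token lists."""
    df = Counter(w for d in docs for w in set(d))
    return {w: math.log(len(docs) / n) + 1.0 for w, n in df.items()}


def directional(src, tgt, idf):
    """idf-weighted mean of each source word's best Resnik match in the target."""
    num = sum(max(resnik(w, v) for v in tgt) * idf.get(w, 1.0) for w in src)
    den = sum(idf.get(w, 1.0) for w in src)
    return num / den


def text_similarity(t1, t2, idf):
    """Phase 2: Mihalcea et al.'s symmetric text similarity, Resnik as word measure."""
    if not t1 or not t2:
        return 0.0
    return 0.5 * (directional(t1, t2, idf) + directional(t2, t1, idf))


def best_reference(sentence, candidates, idf, threshold=0.5):
    """Phase 3: key of the most similar candidate, or None if below the threshold."""
    s = preprocess(sentence)
    score, key = max((text_similarity(s, preprocess(text), idf), key)
                     for key, text in candidates.items())
    return key if score >= threshold else None


# Hypothetical candidate references (key -> title or abstract text).
CANDIDATES = {"R1": "A survey of similarity and distance measures",
              "R2": "Digital articles and papers"}
IDF = idf_table([preprocess(t) for t in CANDIDATES.values()])
print(best_reference("Similarity measures compare papers", CANDIDATES, IDF))
```

Raw Resnik scores are not bounded to [0, 1] and depend on the corpus counts. In practice, a threshold like the hypothetical 0.5 would have to be tuned on held-out data.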
References
J. Beel, B. Gipp, S. Langer, and C. Breitinger, “Research-paper recommender systems: A literature survey,” International Journal on Digital Libraries, vol. 17, no. 4, pp. 305–338, 2016. https://doi.org/10.1007/s00799-015-0156-0.
D. Kotkov, S. Wang, and J. Veijalainen, “A survey of serendipity in recommender systems,” Knowledge-Based Systems, vol. 111, pp. 180–192, 2016. https://doi.org/10.1016/j.knosys.2016.08.014.
R. Mihalcea, C. Corley, and C. Strapparava, “Corpus-based and knowledge-based measures of text semantic similarity,” Proceedings of the American Association for Artificial Intelligence, 2006, pp. 775–780.
R. Armstrong, D. Freitag, T. Joachims, T. Mitchell, et al., “WebWatcher: A learning apprentice for the World Wide Web,” Proceedings of the AAAI Spring Symposium on Information Gathering from Heterogeneous, Distributed Environments, 1996, pp. 6–12. https://doi.org/10.21236/ADA640219.
S. Gauch, J. Chaffee, and A. Pretschner, “Ontology-based personalized search and browsing,” Web Intelligence and Agent Systems: An International Journal, vol. 1, no. 3, pp. 219–234, 2003.
F. O. Isinkaye, Y. O. Folajimi, and A. Ojokoh, “Recommendation systems: Principles, methods and evaluation,” Egyptian Informatics Journal, pp. 261–273, 2015. https://doi.org/10.1016/j.eij.2015.06.005.
L. Zhang, X.-Y. Li, J. Lei, J. Sun, and Y. Liu, “Mechanism design for finding experts using locally constructed social referral web,” 2014. [Online]. Available at: http://www.cs.iit.edu/~xli/paper/Journal/peoplesearch-TPDS.pdf.
J. Beel, B. Gipp, S. Langer, and C. Breitinger, “Research paper recommender systems: A literature survey,” International Journal on Digital Libraries, pp. 1–34, 2015. https://doi.org/10.1007/s00799-015-0156-0.
J. Beel, S. Langer, and M. Genzmehr, “Sponsored vs. organic (research paper) recommendations and the impact of labeling,” Proceedings of the 17th International Conference on Theory and Practice of Digital Libraries, 2013, pp. 395–399. https://doi.org/10.1007/978-3-642-40501-3_44.
S. Gottwald and T. Koch, “Recommender systems for libraries,” Proceedings of the ACM International Conference on Recommender Systems, 2011, pp. 1–5.
M. Sridevi, R. Rajeshwara Rao, and M. Varaprasad Rao, “A survey of recommender systems,” International Journal of Computer Science and Information Security (IJCSIS), vol. 14, no. 5, pp. 265–272, 2016.
A. F. Smeaton and J. Callan, “Personalization and recommender systems in digital libraries,” International Journal on Digital Libraries, vol. 5, no. 4, pp. 299–308, 2005. https://doi.org/10.1007/s00799-004-0100-1.
J. Beel, S. Langer, M. Genzmehr, B. Gipp, C. Breitinger, and A. Nürnberger, “Research paper recommender system evaluation: A quantitative literature survey,” Proceedings of the Workshop on Reproducibility and Replication in Recommender Systems Evaluation (RepSys) at the ACM Recommender Systems Conference (RecSys), 2013, pp. 15–22. https://doi.org/10.1145/2532508.2532512.
J. Martinez-Romo, L. Araujo, J. Borge-Holthoefer, A. Arenas, J. A. Capitán, and J. A. Cuesta, “Disentangling categorical relationships through a graph of co-occurrences,” Physical Review E, vol. 84, 046108, 2011. https://doi.org/10.1103/PhysRevE.84.046108.
E. Vargiu and M. Urru, “Exploiting web scraping in a collaborative filtering-based approach to web advertising,” Artificial Intelligence Research, vol. 2, no. 1, 2013. https://doi.org/10.5430/air.v2n1p44.
D. McLeod and A. Y.-A. Chen, “Collaborative filtering for information recommendation systems,” Non-published Research Reports, Paper 103, 2009. [Online]. Available at: http://research.create.usc.edu/nonpublished_reports/103.
A. Tejeda-Lorente, C. Porcel, J. Bernabé-Moreno, and E. Herrera-Viedma, “Refore: A recommender system for researchers based on bibliometrics,” Applied Soft Computing, vol. 30, pp. 778–791, 2015. https://doi.org/10.1016/j.asoc.2015.02.024.
H. J. Kim, Y. K. Jeong, and M. Song, “Content- and proximity-based author co-citation analysis using citation sentences,” Journal of Informetrics, vol. 10, no. 4, pp. 954–966, 2016. https://doi.org/10.1016/j.joi.2016.07.007.
M. Eto, “Rough co-citation as a measure of relationship to expand co-citation networks for scientific paper searches,” Proceedings of the Association for Information Science and Technology, vol. 53, no. 1, pp. 1–4, 2016. https://doi.org/10.1002/pra2.2016.14505301131.
J. Wang and Y. Dong, “Measurement of text similarity: A survey,” Information, vol. 11, no. 9, 421, 2020. https://doi.org/10.3390/info11090421.
E. Negre, “Comparison of texts: some approaches,” April 2013. [Online]. Available at: https://hal.science/hal-00874280.
M. Deza and E. Deza, Encyclopedia of Distances, Springer: Berlin/Heidelberg, Germany, 2009, p. 583. https://doi.org/10.1007/978-3-642-00234-2_1.
M. Norouzi, D. J. Fleet, and R. R. Salakhutdinov, “Hamming distance metric learning,” Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 2012, pp. 1061–1069.
T. Slimani, “Description and evaluation of semantic similarity measures approaches,” International Journal of Computer Applications, vol. 80, no. 10, pp. 25–33, 2013. https://doi.org/10.5120/13897-1851.
W. H. Gomaa and A. A. Fahmy, “A survey of text similarity approaches,” International Journal of Computer Applications, vol. 68, pp. 3–4, 2013. https://doi.org/10.5120/11638-7118.
S. Patwardhan, S. Banerjee, and T. Pedersen, “Using measures of semantic relatedness for word sense disambiguation,” Proceedings of the Fourth International Conference on Intelligent Text Processing and Computational Linguistics, Mexico City, 2003, pp. 241–257. https://doi.org/10.1007/3-540-36456-0_24.
X. Aime, F. Furst, P. Kuntz, and F. Trichet, “SEMIOSE: A measure of conceptual similarity based on a semiotic approach,” OTM Workshops, LNCS 5872, 2009, pp. 584–593. https://doi.org/10.1007/978-3-642-05290-3_72.
H. Zargayouna and S. Salotti, “Measure of similarity in an ontology for semantic indexing of XML documents,” Proceedings of the 15th Francophone Knowledge Engineering Days, Lyon, France, 2009, pp. 249–260.
Y. Li, Z. A. Bandar, and D. McLean, “An approach for measuring semantic similarity between words using multiple information sources,” IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 4, pp. 871–882, 2003. https://doi.org/10.1109/TKDE.2003.1209005.
T. Pedersen, S. Patwardhan, and J. Michelizzi, “WordNet::Similarity - Measuring the relatedness of concepts,” Demonstration Papers at HLT-NAACL 2004, Boston, Massachusetts, USA, Association for Computational Linguistics, 2004, pp. 38–41. https://doi.org/10.3115/1614025.1614037.
J. Rahnama and E. Hüllermeier, “Learning Tversky similarity,” in M.-J. Lesot et al. (eds.), Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU 2020), Communications in Computer and Information Science, vol. 1238, Springer, Cham, 2020. https://doi.org/10.1007/978-3-030-50143-3_21.
S. Torres and A. Gelbukh, “Comparing similarity measures for original WSD Lesk algorithm,” Advances in Computer Science and Applications, Research in Computing Science, vol. 43, pp. 155–166, 2009.
A. W. Qurashi and V. Holmes, “Document processing: Methods for semantic text similarity analysis,” Proceedings of the International Conference on Innovations in Intelligent Systems and Applications (INISTA), 2020, vol. 6, pp. 2–4. https://doi.org/10.1109/INISTA49547.2020.9194665.
License
International Journal of Computing is an open access journal. Authors who publish with this journal agree to the following terms:
• Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
• Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
• Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.