Historical newspapers play a significant role in humanities research. These data aim to provide resources about places and individuals from a historical perspective. In this case, the authors assumed that there is only a single “Joseph Lyons” in any given year in order to apply named-entity recognition (NER), as argued by Mac Kim and Cassidy (61). If this were not the case, it would be difficult to distinguish the news articles in Trove that bear this name. It is for this reason that the researchers treat each year's “Joseph Lyons” as one person with a single, specific identity in the published news articles. Without such an assumption, it would be very hard to tell whether a given mention of “Joseph Lyons” refers to the Australian Prime Minister or to someone else (62). The assumption thus arose from constraints in the data from which the names are extracted. In practice, many individuals share the same name, hence the need for NER. In addition, the Trove collection contained 152 million articles, which prompted the use of NER to help cluster personal names and so identify individual people. Nevertheless, it was not the size of the collection that led the authors to make this assumption but rather the need to distinguish one person from another.
In addition, the authors used a vector-space representation to establish whether the “Joseph Lyons” referenced in a particular year was the Prime Minister and not some other person. Using this representation, the researchers manually examined all the newspaper articles that mentioned “Joseph Lyons” and tallied those that referred to the PM (63). It is also important to note that the count does not recognize the political titles he held before becoming PM. In short, the first instance could yield a false positive, especially if the analyst did not carry out an extensive and accurate clustering procedure.
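To make the idea of a vector-space comparison concrete, the following is a minimal sketch in Python, not the procedure reported by the authors. It assumes a simple bag-of-words representation and cosine similarity. The reference profile `pm_profile`, the two article snippets, and the helper names `bag_of_words` and `cosine_similarity` are all invented for illustration and are not drawn from Trove or from the paper.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Map a text to a term-frequency vector (a Counter of lowercase word tokens)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical context words for the Prime Minister (an assumption for illustration).
pm_profile = bag_of_words("prime minister government parliament cabinet canberra federal")

# Hypothetical article snippets, not taken from Trove.
articles = {
    "a1": "Mr. Joseph Lyons, the Prime Minister, addressed Parliament in Canberra.",
    "a2": "Joseph Lyons, a farmer, sold cattle at the local market on Tuesday.",
}

# Score each article by how closely its vocabulary matches the PM profile.
scores = {
    key: cosine_similarity(bag_of_words(text), pm_profile)
    for key, text in articles.items()
}

for key, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(key, round(score, 3))
```

Under these assumptions the first snippet scores higher than the second, which shares no terms with the profile. Such a score could only support, not replace, the manual tally described above, since a low score does not prove that a mention refers to a different person.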
- Mac Kim, Sunghwan, and Steve Cassidy. “Finding Names in Trove: Named Entity Recognition for Australian Historical Newspapers.” Proceedings of the Australasian Language Technology Association Workshop 2015, 2015, https://aclweb.org/anthology/U/U15/U15-1007.pdf. Accessed 24 Nov. 2017.