Nishikawa-Pacher, A. (2022). Who are the 100 largest scientific publishers by journal count? A webscraping approach. Journal of Documentation, 78(7), 450–463. https://doi.org/10.1108/JD-04-2022-0083
E040-03-3 - Fachgruppe Szientometrie und Datenvisualisierung
-
Journal:
Journal of Documentation
-
ISSN:
0022-0418
-
Date (published):
21-Sep-2022
-
Extent (pages):
14
-
Publisher:
EMERALD GROUP PUBLISHING LTD
-
Peer Reviewed:
Yes
-
Keywords:
predatory publishers; journals; bibliographic systems
Abstract:
Purpose: How can one obtain a list of the 100 largest scientific publishers sorted by journal count? Existing databases are unhelpful, as each of them suffers from biased omissions and data quality flaws. This paper tries to fill this gap with an alternative approach.
Design/methodology/approach: First, the content coverage of Scopus, Publons, DOAJ and SherpaRomeo was used to extract a preliminary list of publishers that supposedly hold at least 15 journals. Second, the publishers' websites were scraped to fetch their portfolios and, thus, their “true” journal counts.
Findings: The outcome is a list of the 100 largest publishers, comprising 28,060 scholarly journals, with the largest publishing 3,763 journals and the smallest carrying 76 titles. The usual “oligopoly” of major publishing companies leads the list, but it also contains 17 university presses from the Global South and, surprisingly, 31 predatory publishers that together publish 4,606 journals.
Research limitations/implications: Additional data sources could be used to mitigate remaining biases; it is difficult to disambiguate publisher names and their imprints; and the dataset has a non-uniform distribution, which risks the omission of data points in the lower range.
Practical implications: The dataset can serve as a useful basis for comprehensive meta-scientific surveys at the publisher level.
Originality/value: The catalogue can be deemed more inclusive and diverse than others because many of the publishers would have been overlooked if one had drawn from merely one or two sources. The list is freely accessible and invites regular updates. The approach used here (webscraping) has seldom been used in meta-scientific surveys.
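The two-step procedure described under Design/methodology/approach can be illustrated with a minimal Python sketch. It is not the author's code: the publisher names, the per-source counts, the sample HTML and the `class="journal"` link convention are all hypothetical, and the rule "a publisher qualifies if any one source reports at least 15 journals" is an assumption, since the abstract does not say how the four sources were combined. The sketch parses in-memory HTML rather than fetching live pages.

```python
from html.parser import HTMLParser

# Hypothetical journal counts per publisher as reported by each source.
SOURCE_COUNTS = {
    "Scopus": {"Publisher A": 40, "Publisher B": 9},
    "Publons": {"Publisher A": 35, "Publisher C": 20},
    "DOAJ": {"Publisher B": 16, "Publisher D": 3},
    "SherpaRomeo": {"Publisher C": 12, "Publisher D": 5},
}

MIN_JOURNALS = 15  # threshold named in the abstract


def preliminary_publishers(source_counts, threshold=MIN_JOURNALS):
    """Step 1: keep publishers whose highest reported count in any
    source reaches the threshold (assumed combination rule)."""
    best = {}
    for counts in source_counts.values():
        for publisher, n in counts.items():
            best[publisher] = max(best.get(publisher, 0), n)
    return sorted(p for p, n in best.items() if n >= threshold)


class JournalLinkCounter(HTMLParser):
    """Collect distinct hrefs of <a class="journal"> elements
    (a hypothetical markup convention; real sites differ)."""

    def __init__(self):
        super().__init__()
        self.journal_urls = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "journal" in classes and attrs.get("href"):
            self.journal_urls.add(attrs["href"])


def count_journals(html):
    """Step 2: count distinct journal links on a portfolio page."""
    parser = JournalLinkCounter()
    parser.feed(html)
    return len(parser.journal_urls)


def rank_publishers(pages, top_n=100):
    """Sort publishers by scraped journal count, largest first;
    ties are broken alphabetically."""
    counts = {name: count_journals(html) for name, html in pages.items()}
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical portfolio pages standing in for scraped websites.
SAMPLE_PAGES = {
    "Publisher A": (
        '<a class="journal" href="/j/1">J1</a>'
        '<a class="journal" href="/j/2">J2</a>'
        '<a class="journal" href="/j/1">J1 (duplicate link)</a>'
    ),
    "Publisher B": '<a class="journal" href="/b/1">B1</a><a href="/about">About</a>',
    "Publisher C": (
        '<a class="journal" href="/c/1">C1</a>'
        '<a class="journal" href="/c/2">C2</a>'
        '<a class="journal" href="/c/3">C3</a>'
    ),
}

if __name__ == "__main__":
    print(preliminary_publishers(SOURCE_COUNTS))
    print(rank_publishers(SAMPLE_PAGES))
```

In practice each publisher's site would need its own extraction rule, and imprints would have to be mapped to their parent publisher before counting, which is the disambiguation difficulty noted under Research limitations/implications.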