Exporting data from SCOPUS and WOS for Bibliometric Analysis
Based on My Research Guide's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their channel.
Briefing
Bibliometric analysis depends on getting the right dataset first—especially when working with large, journal-based citation databases like Scopus and Web of Science. The core workflow starts with defining a clear research question and then translating it into reliable keywords. Keyword selection is treated as the most difficult and most consequential step: without targeted, dependable terms, the resulting manuscript and analysis can’t be trusted.
Once keywords are finalized, the next step is running structured searches inside Scopus (and then repeating the same logic in Web of Science). The search process begins by entering the keywords exactly as intended, then using Boolean logic to narrow results. When multiple terms are involved, the transcript emphasizes using AND (every concept must appear) and OR (any of several related terms may appear) to control subject scope, and using double quotation marks so that a multi-word term is matched as a phrase in the searched fields, such as the title, abstract, or author keywords. (In Scopus, which fields are searched is set by the field selector, for example “Article title, Abstract, Keywords”; the quotation marks control phrase matching within those fields.) Without quotation marks, the words of a term can match separately, and searches can return an unmanageably large volume of irrelevant records.
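The AND/OR and quotation-mark logic above can be sketched as a small Python helper that assembles a Scopus advanced-search string. This is a minimal sketch, assuming Scopus's `TITLE-ABS-KEY(...)` field code (article title, abstract, keywords); the function names and the example term “urban growth” are hypothetical illustrations, not taken from the video.

```python
def quote(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are matched as a phrase."""
    return f'"{term}"' if " " in term else term


def build_scopus_query(concept_groups: list[list[str]]) -> str:
    """Join synonyms with OR inside each group, then join the groups with AND.

    Each group is one concept; AND between groups means every concept must
    appear, while OR inside a group accepts any of its alternatives.
    """
    clauses = []
    for group in concept_groups:
        alternatives = " OR ".join(quote(t) for t in group)
        clauses.append(f"({alternatives})")
    return "TITLE-ABS-KEY(" + " AND ".join(clauses) + ")"


if __name__ == "__main__":
    groups = [
        ["urbanization", "urban growth"],  # hypothetical synonyms, concept 1
        ["flooding", "flood risk"],        # hypothetical synonyms, concept 2
    ]
    print(build_scopus_query(groups))
```

The printed string can be pasted into Scopus's Advanced search box; keeping each concept in its own bracket makes it easy to add or drop synonyms without disturbing the AND structure.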
A practical example is given for narrowing a broad topic into a specific subfield: “urban flooding” is treated as an intersection where “urbanization” is the main theme, but the search must also require flooding-related terminology. The same idea applies to other domains (e.g., supply chain), where the database may treat related phrases differently (such as “supply chain” versus “supply-chain”), so the search strategy should account for variations. The transcript also mentions using wildcards (the asterisk, or “star”, so that flood* matches flood, floods, and flooding) and grouping alternative terms in brackets, so that similar concepts are captured while the query stays focused.
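Variant handling for the urban flooding and supply chain examples can be sketched as follows, assuming Scopus advanced-search syntax: phrases are quoted, a bare word ending in the asterisk wildcard is left unquoted, and each set of alternatives is wrapped in brackets so OR is evaluated inside the group before AND joins the groups. The helper names and the extra spelling “urbanisation” are illustrative assumptions. Listing both the spaced and hyphenated forms is harmless even if a database already treats them as equivalent.

```python
def spelling_variants(phrase: str) -> list[str]:
    """Return the spaced and hyphenated forms of a multi-word phrase.

    A single word comes back unchanged, in a one-element list.
    """
    words = phrase.replace("-", " ").split()
    spaced, hyphenated = " ".join(words), "-".join(words)
    return [spaced] if spaced == hyphenated else [spaced, hyphenated]


def render(term: str) -> str:
    """Quote a term, except a bare wildcard word such as flood*."""
    if term.endswith("*") and " " not in term:
        return term  # asterisk truncation: flood* also matches floods, flooding
    return f'"{term}"'


def group(terms: list[str]) -> str:
    """Bracket a set of alternatives joined by OR."""
    return "(" + " OR ".join(render(t) for t in terms) + ")"


if __name__ == "__main__":
    # Subfield query: urbanization is the main theme, flooding terms are required.
    urban = group(["urbanization", "urbanisation"])
    flood = group(["flood*"])
    print(f"TITLE-ABS-KEY({urban} AND {flood})")
    # Hyphen/space variants of one concept.
    print(f"TITLE-ABS-KEY({group(spelling_variants('supply chain'))})")
```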
After the initial search returns thousands of documents, filtering becomes essential. The transcript describes narrowing by time window (such as limiting to the last 10 years), restricting document types (e.g., focusing on journal articles rather than conference proceedings, books, or book series), and limiting language to English when the research context requires it. These constraints reduce noise and make downstream bibliometric analysis more coherent.
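The same filters can also be written directly into a Scopus advanced-search string instead of being clicked in the results sidebar. This is a minimal sketch, assuming the Scopus field codes `PUBYEAR`, `DOCTYPE(ar)` (article), `SRCTYPE(j)` (journal) and `LANGUAGE(...)`; check them against the current Scopus search help before relying on them. Whether “the last 10 years” should include the current, still incomplete, year is a judgment call; this sketch includes it, and `add_limits` is a hypothetical helper name.

```python
from datetime import date
from typing import Optional


def add_limits(query: str, years: int = 10, current_year: Optional[int] = None,
               language: str = "english") -> str:
    """Append year, document-type, source-type and language limits to a query."""
    if current_year is None:
        current_year = date.today().year
    first_year = current_year - years + 1  # inclusive start of the time window
    limits = [
        f"PUBYEAR > {first_year - 1}",
        f"PUBYEAR < {current_year + 1}",
        "DOCTYPE(ar)",  # articles, not conference papers or book chapters
        "SRCTYPE(j)",   # journals, not proceedings, books or book series
        f"LANGUAGE({language})",
    ]
    return query + " AND " + " AND ".join(limits)


if __name__ == "__main__":
    print(add_limits('TITLE-ABS-KEY("urban flooding")', current_year=2024))
```

Passing `current_year` explicitly keeps a saved query reproducible; leaving it out uses today's date.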
Finally, exporting the dataset is presented as a key step in the data acquisition pipeline. In Scopus, export options include CSV (recommended because it fits Excel-style workflows), plain text, and bibliographic formats such as RIS and BibTeX. The export selection should include the fields needed for analysis, such as citation information, bibliographical information, abstracts, and author keywords, while funding details can optionally be excluded. The same general approach is then applied to Web of Science, using equivalent selection and export choices (including plain text options). The overall message is straightforward: careful keyword engineering plus disciplined filtering and structured export is what turns massive database results into usable bibliometric data.
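To see what a Web of Science plain-text export contains, it can be read into per-record fields with a few lines of Python. This is a minimal sketch, assuming the usual layout of that export: a two-letter tag at the start of each field (for example `AU` authors, `TI` title, `AB` abstract, `DE` author keywords, `CR` cited references, `TC` times cited), continuation lines indented by three spaces, `ER` closing each record, and `FN`/`VR`/`EF` header and footer lines. The function name is hypothetical.

```python
def parse_wos_plain_text(text: str) -> list[dict[str, list[str]]]:
    """Parse a Web of Science plain-text export into one dict per record.

    Each dict maps a two-letter field tag to the list of its lines.
    """
    records: list[dict[str, list[str]]] = []
    current: dict[str, list[str]] = {}
    tag = None
    for line in text.splitlines():
        line = line.lstrip("\ufeff")  # drop a byte-order mark if present
        if not line.strip():
            continue  # skip blank lines between records
        if line.startswith("ER"):
            records.append(current)  # end of one record
            current, tag = {}, None
        elif line[:2] in ("FN", "VR", "EF"):
            continue  # file header and footer lines
        elif line.startswith("   "):
            if tag is not None:
                current[tag].append(line.strip())  # continuation of previous tag
        else:
            tag = line[:2]
            current.setdefault(tag, []).append(line[3:].strip())
    return records
```

Wrapped single-value fields such as `TI` come back as several list items and can be joined with spaces; for `AU` and `CR`, each item is a separate author or cited reference.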
Cornell Notes
Bibliometric work using Scopus and Web of Science starts with a research question and ends with an exportable dataset. The transcript stresses that keyword selection is the hardest and most important step: targeted keywords determine whether the analysis will be meaningful. Searches should use Boolean operators (AND/OR), quotation marks to match multi-word terms as phrases in title/abstract/keywords, and sometimes wildcards or variant strings to capture phrase variations. After retrieving results, filtering by time range, document type (e.g., journals only), and language (often English) reduces irrelevant records. Exporting in a structured format like CSV, along with citation/bibliographic fields and abstracts/keywords, prepares the data for bibliometric analysis.
Why is keyword selection treated as the most critical step in bibliometric data acquisition?
How do Boolean operators and quotation marks change what Scopus returns?
What’s the purpose of narrowing results after an initial keyword search returns thousands of documents?
How can a search be designed to focus on a subfield rather than a broad topic?
Why is CSV recommended for exporting Scopus results, and what fields should be included?
How does the workflow in Web of Science relate to Scopus?
Review Questions
- When would using quotation marks in Scopus materially change your results compared with not using them?
- Describe a filtering strategy (time window, document type, language) that would reduce noise in a bibliometric dataset.
- What combination of query techniques (Boolean logic, brackets, phrase variants) would you use to target a subfield like “urban flooding” rather than “urbanization” alone?
Key Points
1. Start with a clear research question and translate it into reliable keywords before searching Scopus or Web of Science.
2. Use Boolean operators: AND when records must include several concepts, OR when any of several related terms is acceptable.
3. Use double quotation marks so multi-word terms are matched as phrases in the searched fields (such as title, abstract, and author keywords), preventing overly broad results.
4. Account for phrase variations (e.g., “supply chain” vs “supply-chain”) by listing variants, using wildcards such as the asterisk (*), and grouping alternatives in brackets.
5. After retrieving results, narrow by time range (e.g., last 10 years), document type (journal articles), and language (often English).
6. Export only the fields needed for analysis (citation and bibliographic data, plus abstracts and keywords); funding details can optionally be excluded.
7. Prefer CSV export for Scopus when the next step involves Excel-style screening and processing.
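For the Excel-style screening step, an exported Scopus CSV can also be re-checked programmatically. This is a minimal sketch, assuming the export was made with abstracts and author keywords selected and uses the column headers `Title`, `Year`, `Abstract`, `Author Keywords` and `Cited by` (confirm these against your own file). It keeps a row only if every concept group has at least one term appearing, as a lowercase substring, in the title, abstract, or author keywords. The function name and any concept groups passed to it are hypothetical.

```python
import csv
import io

# Assumed Scopus CSV headers; confirm against your own export.
REQUIRED_COLUMNS = ["Title", "Year", "Abstract", "Author Keywords", "Cited by"]
SEARCHED_COLUMNS = ["Title", "Abstract", "Author Keywords"]


def screen_scopus_csv(csv_text: str, concept_groups: list[list[str]]) -> list[dict[str, str]]:
    """Return rows that mention at least one term from every concept group."""
    reader = csv.DictReader(io.StringIO(csv_text.lstrip("\ufeff")))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"export is missing expected columns: {missing}")
    kept = []
    for row in reader:
        # Lowercased text from the searched fields; empty cells count as "".
        text = " ".join(row.get(c) or "" for c in SEARCHED_COLUMNS).lower()
        if all(any(term.lower() in text for term in group) for group in concept_groups):
            kept.append(row)
    return kept
```

To run it on a real export, read the file with `encoding="utf-8-sig"` in case it begins with a byte-order mark, for example `with open("scopus.csv", encoding="utf-8-sig", newline="") as f: rows = screen_scopus_csv(f.read(), groups)`, where `scopus.csv` is a placeholder filename. Because matching is by substring, a stem such as “flood” also matches “flooding”.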