What Is a Systematic Search? Keyword Validation and Noise Checks
- Mayta
- 2 days ago
🔍 Definition: What Is a Systematic Search?
A systematic search is a methodologically rigorous, comprehensive, and transparent process of identifying all relevant literature on a specific research question. It is a cornerstone of evidence-based reviews, like systematic reviews or meta-analyses, and is designed to be reproducible, exhaustive, and bias-minimized. This approach contrasts sharply with informal or narrative literature searches.
📌 Key Features
Focused Research Question: Structured via frameworks like PICO (clinical) or DDO (determinant-driven).
Protocol-Based: Search strategy is predefined and registered or documented before execution.
Comprehensive: Searches multiple databases using both free-text keywords and controlled vocabulary (e.g., MeSH, Emtree), and includes synonyms and variant terms.
Boolean Logic: Uses OR, AND, NOT to build structured search queries.
Reproducible: Must be fully documented for replication.
Bias-Reduction: Reduces selection bias by predefining inclusion/exclusion criteria and using exhaustive strategies.
📊 Pattern of a Query Table: Systematic Search Format
In systematic reviews, the search query documentation typically includes three structured components:
✅ 1. Search Term List (Concept Table)
Component | Core Concepts & Variants |
Domain | Clinical trial, Kidney, Nephrology |
Determinant | Normality test, Skewness, Shapiro-Wilk, QQ plot |
Outcome | Parametric test, ANOVA, Wilcoxon, Regression |
Study Design | Systematic review, Meta-analysis |
✅ 2. Detailed Database Search Histories (Syntax Tables)
Each database (PubMed, EMBASE, Scopus) gets a structured table:
Example: PubMed Search History Table
Each row builds up Boolean logic blocks and uses database-specific syntax:
"[tw]" for text word in PubMed
"/exp" and ":ti,ab,kw,de" for Embase
TITLE-ABS-KEY() for Scopus
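As a rough illustration of how the concept table turns into database-specific Boolean blocks, here is a minimal Python sketch. It assumes the common pattern of OR within a concept and AND between concepts, and it uses only the syntax elements named above. The function names are hypothetical, the Study Design row is left out for brevity, and controlled vocabulary (MeSH, Emtree "/exp") is not handled, so treat the output as a starting point to check in each database, not a finished strategy.

```python
# Concept table from the example above (free-text terms only).
CONCEPTS = {
    "Domain": ["clinical trial", "kidney", "nephrology"],
    "Determinant": ["normality test", "skewness", "Shapiro-Wilk", "QQ plot"],
    "Outcome": ["parametric test", "ANOVA", "Wilcoxon", "regression"],
}


def pubmed_term(term: str) -> str:
    """Quote the term and add PubMed's text-word tag."""
    return f'"{term}"[tw]'


def embase_term(term: str) -> str:
    """Quote the term and add the Embase field codes listed above."""
    return f"'{term}':ti,ab,kw,de"


def or_block(terms, fmt) -> str:
    """Join all variants of ONE concept with OR."""
    return "(" + " OR ".join(fmt(t) for t in terms) + ")"


def build_query(concepts, fmt) -> str:
    """Join the concept blocks with AND (PubMed / Embase style)."""
    return " AND ".join(or_block(terms, fmt) for terms in concepts.values())


def build_scopus_query(concepts) -> str:
    """Scopus wraps each OR block in TITLE-ABS-KEY(...)."""
    blocks = []
    for terms in concepts.values():
        inner = " OR ".join(f'"{t}"' for t in terms)
        blocks.append(f"TITLE-ABS-KEY({inner})")
    return " AND ".join(blocks)


if __name__ == "__main__":
    print("PubMed:", build_query(CONCEPTS, pubmed_term))
    print("Embase:", build_query(CONCEPTS, embase_term))
    print("Scopus:", build_scopus_query(CONCEPTS))
```

Each printed string corresponds to the final combined row of a search history table; the intermediate rows are the individual OR blocks.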
✅ 3. Summary Table
A final table is used to summarize how many hits each search yields:
Database | Result |
EMBASE | |
SCOPUS | |
PUBMED | |
Total (before deduplication) | |
Total (after deduplication) | |
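To show where the two totals come from, here is a small Python sketch that counts records per database and then deduplicates across databases. The record structure (dicts with "doi" and "title") and the matching rule (DOI if present, otherwise a normalized title) are assumptions for illustration. Reference managers use more careful matching, and this simple rule will miss a duplicate when one copy has a DOI and the other does not.

```python
import re


def record_key(record: dict) -> str:
    """Build a comparison key: the DOI if present, else a normalized title."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return "doi:" + doi
    title = (record.get("title") or "").lower()
    # Collapse punctuation and whitespace so minor formatting differences match.
    title = re.sub(r"[^a-z0-9]+", " ", title).strip()
    return "title:" + title


def summarize(results_by_db: dict) -> dict:
    """Return hits per database plus totals before and after deduplication."""
    summary = {db: len(records) for db, records in results_by_db.items()}
    all_records = [r for records in results_by_db.values() for r in records]
    summary["Total (before deduplication)"] = len(all_records)
    summary["Total (after deduplication)"] = len({record_key(r) for r in all_records})
    return summary


if __name__ == "__main__":
    # Hypothetical exported records, not real search results.
    demo = {
        "PUBMED": [{"doi": "10.1000/A", "title": "Paper one"},
                   {"doi": "", "title": "Paper Two!"}],
        "EMBASE": [{"doi": "10.1000/a", "title": "Paper One"}],
        "SCOPUS": [{"title": "paper two"}],
    }
    for label, count in summarize(demo).items():
        print(f"{label} | {count} |")
```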
Keyword Validation
✅ 1. Check if the Search Strategy Captures the Key Papers
(Does the current search string retrieve known relevant studies?)
This is called "backward validation" or inclusion verification.
🔎 How to do it:
Take a few relevant papers you already know are important.
Run your full search string in each target database (e.g., PubMed).
Check if those known papers appear in the results.
A quick way to check: append each paper's author name and publication year to your search string with AND, then see whether the paper is still returned.
If not, examine what terms those papers use.
Are they using synonyms or acronyms that your search missed?
Are you missing a MeSH or Emtree term?
🛠 Tools:
Use the article's PubMed ID (PMID) or title to test if it gets retrieved.
Use "Search within results" or filter by date or author to pinpoint it.
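The PMID check can be sketched in a few lines of Python. The first helper builds a PubMed query that intersects your search string with the known PMIDs (using the [pmid] field tag), and the second compares the PMIDs you expected against the ones you actually got back. The function names and the PMIDs in the demo are placeholders, and the sketch assumes you paste the query into PubMed yourself and copy the retrieved PMIDs back in.

```python
def pmid_check_query(search_string: str, pmids: list[str]) -> str:
    """Intersect the full search string with the known papers' PMIDs."""
    pmid_block = " OR ".join(f"{p}[pmid]" for p in pmids)
    return f"({search_string}) AND ({pmid_block})"


def validate(known: set[str], retrieved: set[str]) -> dict:
    """Report which known papers were retrieved and which were missed."""
    found = known & retrieved
    missed = known - retrieved
    recall = len(found) / len(known) if known else None
    return {"found": sorted(found), "missed": sorted(missed), "recall": recall}


if __name__ == "__main__":
    # Placeholder PMIDs for illustration only.
    known = {"11111111", "22222222", "33333333"}
    print(pmid_check_query('"normality test"[tw]', sorted(known)))

    # Suppose PubMed returned only two of the three known papers.
    report = validate(known, retrieved={"11111111", "33333333"})
    print(report["missed"])  # papers whose terms you should inspect next
```

Any PMID in "missed" is a paper to open and read for synonyms, acronyms, or MeSH/Emtree terms that your strategy does not yet cover.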
⚠️ 2. Check for Noisy or Overbroad Keywords
(Are some terms too generic and pulling in irrelevant results?)
This is part of precision testing — avoiding a flood of irrelevant results.
🔎 How to do it:
Run individual keyword blocks (one at a time).
Check the volume of results.
Skim the first few pages:
Are many irrelevant to your topic?
Are there domains (e.g., engineering, veterinary, finance) unrelated to your scope?
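One way to make the skim less subjective is to record a relevant/irrelevant judgment for each record you look at and turn that into a noise ratio. The Python sketch below assumes you screen a sample by hand; the cut-offs for "Low", "Medium", and "High" are arbitrary placeholders, not established thresholds.

```python
def noise_ratio(screened: list[bool]) -> float:
    """Share of sampled records judged irrelevant.

    screened[i] is True if the i-th sampled record was relevant.
    """
    if not screened:
        raise ValueError("screen at least one record")
    irrelevant = sum(1 for is_relevant in screened if not is_relevant)
    return irrelevant / len(screened)


def noise_label(ratio: float, low: float = 0.3, high: float = 0.7) -> str:
    """Map a noise ratio to a coarse label (cut-offs are assumptions)."""
    if ratio < low:
        return "Low"
    if ratio > high:
        return "High"
    return "Medium"


if __name__ == "__main__":
    # Hypothetical judgments for the first ten records of one keyword block.
    sample = [True, False, False, True, False, False, False, True, False, False]
    ratio = noise_ratio(sample)
    print(f"noise ratio = {ratio:.2f} -> {noise_label(ratio)}")
```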
📌 Tip: Problematic terms often include:
Abbreviations: e.g., "SD", "AI"
Overbroad terms: e.g., "mean", "regression", "normality"
Truncated roots: e.g., nephro* retrieves "nephrology" but also every other word that begins with the same root, some of which may fall outside your scope
✅ Refinement Strategies:
Use phrase searching (quotes) to avoid word scattering.
Combine with field tags: mean[tw] instead of just mean
Add contextual anchors: pair broad terms with another concept using AND
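The three refinements above are simple string transformations, sketched here in Python with PubMed-style syntax. The helper names are hypothetical, and the default [tw] tag simply mirrors the example in this post; another field tag may suit your question better.

```python
def as_phrase(term: str) -> str:
    """Wrap a term in quotes so the words are searched as one phrase."""
    return f'"{term}"'


def with_field(term: str, field: str = "tw") -> str:
    """Attach a field tag; multi-word terms are quoted first."""
    text = as_phrase(term) if " " in term else term
    return f"{text}[{field}]"


def anchored(broad: str, context: str) -> str:
    """Pair a broad term with a second concept using AND."""
    return f"({broad} AND {context})"


if __name__ == "__main__":
    print(with_field("mean"))
    print(with_field("normality test"))
    print(anchored(with_field("regression"), with_field("kidney")))
```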
🧠 Advanced Tip: Log and Score Terms
Create a tracking table for each term:
Include: result count, relevancy, noise ratio, and inclusion of known papers.
Example:
Term | Hits | Captures Known Papers | Noise | Keep? |
"normality test"[tw] | 45 | Yes | Low | ✅ |
mean[tw] | 50,000 | No | High | ❌ |
"descriptive statistics"[tw] | 500 | Yes | Medium | ✅ |
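If you prefer to keep the tracking table in code, a minimal Python sketch might look like the following. The keep rule (retain a term only if it captures known papers and is not high-noise) is an assumption inferred from the three example rows, and the class and function names are hypothetical; adjust the rule to your own protocol.

```python
from dataclasses import dataclass


@dataclass
class TermLog:
    term: str
    hits: int
    captures_known: bool
    noise: str  # "Low", "Medium", or "High"

    def keep(self) -> bool:
        # Assumed rule: the term must capture known papers and not be high-noise.
        return self.captures_known and self.noise != "High"


def render(rows: list[TermLog]) -> str:
    """Render the log in the same pipe-delimited layout as the example table."""
    lines = ["Term | Hits | Captures Known Papers | Noise | Keep? |"]
    for r in rows:
        captured = "Yes" if r.captures_known else "No"
        decision = "✅" if r.keep() else "❌"
        lines.append(f"{r.term} | {r.hits:,} | {captured} | {r.noise} | {decision} |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Rows copied from the example table in this post.
    log = [
        TermLog('"normality test"[tw]', 45, True, "Low"),
        TermLog("mean[tw]", 50000, False, "High"),
        TermLog('"descriptive statistics"[tw]', 500, True, "Medium"),
    ]
    print(render(log))
```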