How to Remove Text By Matching in Excel
Learn multiple Excel methods to remove text by matching with step-by-step examples, real-world scenarios, and expert tips.
How to Remove Text By Matching in Excel
Why This Task Matters in Excel
Every data set eventually contains words, codes, or phrases you do not want. Marketing exports add “Promo-” in front of every campaign name, finance systems append “_FYE23” to cost-center strings, or a web scrape clutters product descriptions with “Read more”. Removing these bits of text is not merely cosmetic; it is essential for analysis, reporting, and downstream automation.
-
Business cleanup: Sales teams frequently merge CRM data with ERP extracts. If SKU descriptions carry legacy prefixes such as “OLD_” or “ARCH_”, pivot tables inflate active inventory counts. Removing those prefixes by matching saves hours of manual filtering and guarantees integrity for inventory valuations.
-
Data integration: When you have to match “ACME-100-XL” from one system to “100-XL” in another, text removal becomes a pre-mapping step. Clean strings ensure lookups, joins, and relationships work first time in Power Query or Power BI.
-
Customer communications: Generating personalized emails from Excel often requires stripping internal tags like “[DRAFT]” or “zzz_” before sending. An automatic removal process prevents embarrassing public-facing errors.
Excel excels (pun intended) at this because:
- It offers multiple text functions—SUBSTITUTE, REPLACE, TEXTSPLIT, TEXTBEFORE, TEXTAFTER—that work in a single cell without code.
- Dynamic array formulas (LET, MAP, REDUCE, LAMBDA) let advanced users process entire arrays of unwanted words in one shot.
- Non-formula tools such as Flash Fill, Find & Replace, and Power Query provide click-based alternatives when formulas are not ideal.
Failing to master text removal may lead to mismatched reports, duplicate rows, or public documents showing internal codes. Conversely, learning these techniques connects seamlessly to lookup skills, dynamic reports, dashboard creation, and even data modeling in Power Query and Power BI.
Best Excel Approach
For the broadest compatibility and control, the most reliable formula method is a nested SUBSTITUTE pattern driven by a list of words you want to remove. In Microsoft 365 you can simplify dramatically with REDUCE + SUBSTITUTE or TEXTSPLIT + TEXTJOIN. For users on older versions, nested SUBSTITUTE inside LET keeps formulas readable.
Recommended modern pattern (Microsoft 365):
=LET(
txt, A2,
removeList, FILTER($D$2:$D$10, $D$2:$D$10<>""),
REDUCE(txt, removeList, LAMBDA(acc, word, SUBSTITUTE(acc, word, "")))
)
Why this is best
- Scales to any length list without manual nesting
- Works with variable lists (just add/remove words in the [D2:D10] range)
- LET names improve readability and performance
- REDUCE iteratively feeds the updated string (acc) back into SUBSTITUTE until all matches are stripped.
Alternative for Excel 2019 & earlier (manual nesting of up to five words):
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,$D$2,""),$D$3,""),$D$4,"")
When to choose alternatives
- TEXTSPLIT+TEXTJOIN helps when you want to remove complete delimited tokens (“remove any word from list that appears as its own word, not inside bigger words”).
- REPLACE with FIND/SEARCH is faster for fixed patterns at the beginning or end of a string (prefix/suffix).
- Power Query is ideal for thousands of rows and repeatable ETL pipelines.
- Flash Fill shines for one-off transformations where patterns are visually obvious.
Parameters and Inputs
Required:
- txt (A2) – the original text string. Any cell or text literal will work.
- removeList ([D2:D10]) – vertical list of substrings to wipe out. Text values; blanks ignored.
Optional considerations: - Case sensitivity – SUBSTITUTE is case sensitive; use UPPER or LOWER around both txt and word for case-insensitive removal.
- Delimiters – if removal words should only be wiped when surrounded by spaces, pad “ word ” with spaces or switch to TEXTSPLIT/TEXTJOIN.
- Trimming – after removal you may have double spaces; wrap the final formula in TRIM() or TEXTJOIN with ignore_empty.
Input validation: - Ensure removeList does not contain formulas that evaluate to errors.
- Avoid overlapping patterns (e.g., “Sales” and “Sales_2023”) unless order is deliberate; shorter patterns may leave residue.
Edge cases: - Empty txt returns empty.
- If removal list is blank, REDUCE returns original text.
- Very long strings (over 32,767 characters) truncate; split across columns or use Power Query.
Step-by-Step Examples
Example 1: Basic Scenario – Remove “Promo” and “Draft” from Subject Lines
Imagine column [A] contains email subject lines exported from a marketing tool:
[A1] Subject
[A2] Promo-Big Sale Starts Today
[A3] Draft-Loyalty Points Update
[A4] Promo-Draft-End of Season Clearance
Removal words in [D2:D3]:
[D2] Promo-
[D3] Draft-
Step-by-step:
- In [B2] enter the modern formula:
=LET(
txt, A2,
removeList, D$2:D$3,
REDUCE(txt, removeList, LAMBDA(acc, word, SUBSTITUTE(acc, word, "")))
)
- Copy down to [B4].
- Excel spills each intermediate REDUCE result behind the scenes, leaving:
- [B2] Big Sale Starts Today
- [B3] Loyalty Points Update
- [B4] End of Season Clearance
Why it works
SUBSTITUTE removes “Promo-” first, feeds new text to the accumulator, then removes “Draft-”. Because prefixes are identical at line starts, no trimming is needed.
Variations
- Add “Test-” to [D4] and the formula instantly updates.
- Make removal case-insensitive by wrapping both txt and word with LOWER().
Troubleshooting
If you see stray hyphens, your words may not include trailing characters (e.g., remove “Promo” instead of “Promo-”). Expand the word to match exactly.
Example 2: Real-World Application – Cleaning Product Catalog Codes
Scenario: Procurement exports a catalog where product names are suffixed with region codes:
[A2] Hammer-EUR
[A3] Hammer-US
[A4] Wrench-APAC
[A5] Screwdriver-US
Goal: Remove anything that matches a region code list stored in [F2:F4].
[F2] ‑EUR
[F3] ‑US
[F4] ‑APAC
Business reason: Consolidate SKUs irrespective of region for global inventory valuation.
Steps:
- In [B2] enter:
=LET(
txt, A2,
remove, FILTER($F$2:$F$4, $F$2:$F$4<>""),
cleaned, REDUCE(txt, remove, LAMBDA(acc, code, SUBSTITUTE(acc, code, ""))),
TRIM(cleaned)
)
- Drag down to the last row.
Explanation line-by-line:
- FILTER keeps the region list dynamic in case you add more codes.
- REDUCE iteratively strips each code.
- TRIM ensures no trailing spaces remain (important if codes aren’t prefixed with hyphens).
Business benefit: You can now perform a UNIQUE or pivot on the cleaned names to see aggregate demand by product family, not by region.
Integration: Feed the cleaned column into VLOOKUP toward a master item table, then use SUMIFS to compile worldwide sales.
Performance note: On large catalogs (50,000 rows) dynamic arrays calculate fast (vectorized), but consider switching to Power Query if you anticipate millions of rows.
Example 3: Advanced Technique – Removing a Variable Stop-Word List from Paragraphs
You receive social media comments that must be sanitized from a list of banned words stored in a named range StopWords. Each comment sits in [A2:A1000], and some banned words differ only by case (e.g., “Bad”, “bad”).
Task: Replace banned words with nothing, but preserve punctuation and reduce multiple spaces to single.
Formula in [B2]:
=LET(
original, A2,
lowerText, LOWER(original),
result, REDUCE(lowerText, StopWords,
LAMBDA(acc, wrd,
SUBSTITUTE(acc, LOWER(wrd), "")
)
),
collapse, TEXTJOIN(" ", TRUE, TEXTSPLIT(result, " ")),
collapse
)
Explanation:
- LOWER converts both the text and stop words to lower case, ensuring case-insensitive matching.
- REDUCE removes each banned word.
- TEXTSPLIT breaks the paragraph into single-word array tokens where empty entries appear for removed words.
- TEXTJOIN puts everything back together with a single space delimiter, ignoring empty tokens (the TRUE argument).
Edge cases handled:
- Multiple consecutive banned words produce no extra spaces.
- Words containing punctuation such as “bad!” still match because LOWER(bad!) retains the exclamation mark; if that is undesired, expand TEXTSPLIT to include punctuation delimiters.
Professional tip: Convert the above into a reusable LAMBDA named CleanComment for repeated use across worksheets:
= LAMBDA(txt,
LET(
lowerText, LOWER(txt),
result, REDUCE(lowerText, StopWords,
LAMBDA(acc, wrd, SUBSTITUTE(acc, LOWER(wrd), ""))
),
TEXTJOIN(" ", TRUE, TEXTSPLIT(result, " "))
)
)
Now simply call =CleanComment(A2).
Tips and Best Practices
- Maintain your removal list in a dedicated table and reference it by structured name (e.g.,
tblRemove[List]). This keeps formulas portable and self-documenting. - Wrap SUBSTITUTE chains in LET to store interim results; Excel calculates each intermediate step once, reducing recalculation overhead.
- Always finish with TRIM() or TEXTJOIN ignore-empty to eliminate double spaces; failure to do so breaks downstream TEXTSPLIT and could yield mismatched lookup keys.
- For prefixes/suffixes only, use TEXTBEFORE or TEXTAFTER—they are faster than SUBSTITUTE because they operate once, not per occurrence.
- When batch editing thousands of rows, turn on “Manual Calculation” while entering formulas, then recalc once; this avoids sluggishness during typing.
- Document your cleanup logic in a note or the Name Manager, especially if you embed removal lists inside formulas—future teammates will thank you.
Common Mistakes to Avoid
- Partial overlap confusion – Removing “_US” before removing “_USA” leaves “A” hanging. Sort your removal list by descending length or use more specific patterns first.
- Forgetting case sensitivity – SUBSTITUTE treats “Promo” differently from “promo”. Wrap both sides in LOWER or use REPLACE with SEARCH for case-insensitive matching.
- Ignoring delimiter context – Deleting “in” from “Print” yields “Prt”. If you need whole-word removal, split on spaces and rejoin.
- Excessive nested SUBSTITUTE – Hard-coding ten nested functions makes maintenance impossible. Switch to REDUCE or a Power Query “Replace Values” step.
- Not trimming output – Double spaces sabotage TRIM-sensitive formulas like XLOOKUP on cleaned keys; always clean whitespace post-removal.
Alternative Methods
| Method | Strengths | Weaknesses | Best for | Excel Version |
|---|---|---|---|---|
| Nested SUBSTITUTE | Works everywhere | Hard to maintain, manual updates | Quick one-off removal of few words | All |
| LET + REDUCE | Scalable, readable, dynamic lists | Microsoft 365 only | Ongoing lists that grow over time | 365 |
| TEXTSPLIT + TEXTJOIN | Removes only whole tokens, perfect whitespace handling | Delimiters required, cannot remove substrings inside words | Stop-word filtering | 365 |
| TEXTBEFORE / TEXTAFTER | Fast for prefix/suffix | Only removes first instance unless wrapped in SUBSTITUTE | Trimming codes at edges | 365 |
| Power Query Replace Values | Handles hundreds of thousands of rows, UI driven | Workbook becomes Power Query dependent, learning curve | ETL pipelines, refreshable reports | 2016+ with PQ |
| Flash Fill | No formulas, instant visual | Not dynamic, must redo when data changes | One-time cleanup in small tables | 2013+ |
Choose REDUCE-based formulas when you need workbook-native, refresh-on-calculate behavior and share within teams on Microsoft 365. Opt for Power Query when processing large CSV files nightly, or Flash Fill when the dataset is five rows and the boss is waiting.
FAQ
When should I use this approach?
Use formula-based removal any time the clean text must feed further formulas, pivots, or dashboards and you want instant recalculation when source data or removal list changes.
Can this work across multiple sheets?
Yes. Store your removal list on a Config sheet, give it a named range (e.g., RemoveList), and reference it from any other sheet:
=LET(txt, A2, REDUCE(txt, RemoveList, LAMBDA(a,w, SUBSTITUTE(a,w,""))))
What are the limitations?
SUBSTITUTE cannot use wildcard patterns; it matches literal text only. For pattern-based removal (e.g., delete anything within brackets), move to Power Query or use REGEX.REPLACE if you have the Office Insiders beta.
How do I handle errors?
Wrap the entire formula in IFERROR to catch unexpected spills:
=IFERROR(YourFormula, "")
For Power Query, enable “Keep Errors” preview to diagnose which rows fail.
Does this work in older Excel versions?
Nested SUBSTITUTE works back to Excel 2007. LET, REDUCE, TEXTSPLIT, TEXTJOIN require Microsoft 365 or Excel 2021. If collaborating with users on 2016, stick to nested SUBSTITUTE or deliver the workbook as values.
What about performance with large datasets?
Dynamic arrays calculate vectorized, so 50,000 rows of REDUCE may still be snappy. Still, memory grows with cell length and number of removal words. Measure with Formula.Evaluate time; when delays exceed two seconds, migrate to Power Query or perform cleanup upstream in SQL.
Conclusion
Mastering text removal by matching unlocks cleaner data, faster lookups, and more trustworthy reports. Whether you apply a simple nested SUBSTITUTE or a modern LET-REDUCE powerhouse, the ability to strip away cluttered prefixes, suffixes, and banned words empowers every downstream task—from pivots to dashboards to data modeling. Practise with your own messy data, experiment with both formula and Power Query solutions, and soon “dirty” strings will never slow you down again.
Related Articles
How to Show the 10 Most Common Text Values in Excel
Learn multiple Excel methods to list the 10 most frequent text values—complete with step-by-step examples, business use cases, and expert tips.
How to Abbreviate Names Or Words in Excel
Learn multiple Excel methods to abbreviate names or words with step-by-step examples and practical applications.
How to Abbreviate State Names in Excel
Learn multiple Excel methods to abbreviate state names with step-by-step examples, professional tips, and real-world applications.