Remove Accents From a JavaScript String
I hit a bug recently where accented characters were causing some pain, and I didn't know how to solve it.
Now that I have, you can too!
To remove accents (diacritical marks) from letters in JavaScript without removing the letters themselves, you can use the String.prototype.normalize() method combined with a regular expression.
Let's look at it in action and then I'll break it down:
const accentedString = "Éxàmplê òf áccéntéd téxt";
const cleanedString = accentedString.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
console.log(cleanedString); // Example of accented text
How does it work?
Normalize the string: Calling accentedString.normalize("NFD") breaks each accented character down into its base letter and a separate combining accent mark. For example, "é" is split into "e" followed by the accent. Keeping the base letter as its own character is what lets us remove the accent without losing the letter.
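To see the decomposition for yourself, here is a small sketch that compares a single "é" before and after NFD normalization. The variable names are just for illustration and are not part of the snippet above.

```javascript
// "é" as a single precomposed code point (U+00E9).
const composed = "\u00e9";

// NFD splits it into "e" (U+0065) plus a combining acute accent (U+0301).
const decomposed = composed.normalize("NFD");

console.log(composed.length);   // 1
console.log(decomposed.length); // 2

// List the code points of the decomposed form.
const codePoints = [...decomposed].map(
  (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
);
console.log(codePoints.join(" ")); // U+0065 U+0301
```

Both strings look identical on screen, but only the decomposed one has the accent as a separate character that a regular expression can target.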
Remove the accents: After normalization, the next step is to remove those accent marks from the string. This is done with .replace(/[\u0300-\u036f]/g, ""), which matches every character in the Unicode Combining Diacritical Marks block (U+0300 to U+036F) and removes it, leaving behind the base letters.
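Here is a quick sketch of the removal step on its own, so you can see what the regular expression actually matches. The second pattern, /\p{M}/gu, is an alternative I'm adding for comparison (it is not from the original snippet): it uses a Unicode property escape to match any character in the Mark category, which is broader than the U+0300 to U+036F range.

```javascript
// Decompose first, so each accent becomes its own combining character.
const decomposed = "Crème brûlée".normalize("NFD");

// The range from the post: the Combining Diacritical Marks block.
const combiningBlock = /[\u0300-\u036f]/g;

// Assumed alternative: any Unicode "Mark" character (needs the u flag).
const anyMark = /\p{M}/gu;

// Three accents in the input: è, û, é.
console.log(decomposed.match(combiningBlock).length); // 3

const strippedByRange = decomposed.replace(combiningBlock, "");
const strippedByMark = decomposed.replace(anyMark, "");

console.log(strippedByRange); // Creme brulee
console.log(strippedByMark);  // Creme brulee
```

For typical Latin text the two patterns give the same result. The broader one may be worth considering if your input could contain combining marks from outside that block.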
Now the accented characters have been converted to their base letters, removing the diacritical marks without affecting the rest of the text. In our example, "Éxàmplê òf áccéntéd téxt" becomes "Example of accented text".
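If you need this in more than one place, the two steps can be wrapped in a small helper. This is a minimal sketch, and removeAccents is a name I made up for illustration. One thing to be aware of: the approach only works for characters that decompose into a base letter plus a combining mark. Letters like "ø" or "Ł" have no such decomposition, so they pass through unchanged.

```javascript
// Hypothetical helper (name is an assumption, not from the original snippet).
// Decomposes the string, then strips the combining diacritical marks.
function removeAccents(str) {
  return str.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

console.log(removeAccents("Éxàmplê òf áccéntéd téxt")); // Example of accented text

// "ó" and "ź" decompose, but "Ł" does not, so it is left as-is.
console.log(removeAccents("Łódź")); // Łodz

// "ø" is a distinct letter with no decomposition, so nothing changes.
console.log(removeAccents("Søren")); // Søren
```

If you need those letters mapped too (for example "ø" to "o"), you would have to handle them separately, for instance with your own lookup table.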