How to remove HTML tags from data with SQL

The purpose of this article is to provide a way of cleaning up of HTML tags within the data. When we use various styles or tabular format data in UI using Rich Text Editor/ Rad Grid etc, it will save data in database with HTML tags.

When you save in database this kind of field you have:

An HTML element starts with a start tag (<p>) and ends with end tag (<p/>) and everything between Start tag and End tag is HTML element. e.g.

<b>Following are the popular databases: <br /> </b>1. SQL Server <br /> 2. Oracle <br /> 3.
Teradata <br /> 4. Sybase

We are using SQL Server CHARINDEX function to get the positions of ‘<’ and ‘>’ and once found replacing the string between <….> with blank using STUFF function. We are using WHILE Loop that will run till presence of ‘<’ & ‘>’ in the string. Below is the UDF script that performs HTML Tags clean up from data.

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE FUNCTION [dbo].[usp_ClearHTMLTags]  
    (@String NVARCHAR(MAX))  
    RETURNS NVARCHAR(MAX)  

AS  
    BEGIN  
        DECLARE @Start INT,  
                @End INT,  
                @Length INT  

        WHILE CHARINDEX('<', @String) > 0 AND CHARINDEX('>', @String, CHARINDEX('<', @String)) > 0  
        BEGIN  
            SELECT  @Start  = CHARINDEX('<', @String),   
                    @End    = CHARINDEX('>', @String, CHARINDEX('<', @String))  
            SELECT @Length = (@End - @Start) + 1  

            IF @Length > 0  
            BEGIN  
                SELECT @String = STUFF(@String, @Start, @Length, '')  
             END  
         END  
        RETURN @String  
    END  
GO

By using above UDF, We can clean the HTML tags from the data.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.