How to remove HTML tags from data with SQL

The purpose of this article is to provide a way of cleaning up of HTML tags within the data. When we use various styles or tabular format data in UI using Rich Text Editor/ Rad Grid etc, it will save data in database with HTML tags.

When you save in database this kind of field you have:

An HTML element starts with a start tag () and ends with end tag () and everything between Start tag and End tag is HTML element. e.g.

Following are the popular databases: 1. SQL Server 2. Oracle 3.
Teradata 4. Sybase

We are using SQL Server CHARINDEX function to get the positions of ‘<’ and ‘>’ and once found replacing the string between <….> with blank using STUFF function. We are using WHILE Loop that will run till presence of ‘<’ & ‘>’ in the string. Below is the UDF script that performs HTML Tags clean up from data.

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE FUNCTION [dbo].[usp_ClearHTMLTags]  
    (@String NVARCHAR(MAX))  
    RETURNS NVARCHAR(MAX)  

AS  
    BEGIN  
        DECLARE @Start INT,  
                @End INT,  
                @Length INT  

        WHILE CHARINDEX('<', @String) > 0 AND CHARINDEX('>', @String, CHARINDEX('<', @String)) > 0  
        BEGIN  
            SELECT  @Start  = CHARINDEX('<', @String),   
                    @End    = CHARINDEX('>', @String, CHARINDEX('<', @String))  
            SELECT @Length = (@End - @Start) + 1  

            IF @Length > 0  
            BEGIN  
                SELECT @String = STUFF(@String, @Start, @Length, '')  
             END  
         END  
        RETURN @String  
    END  
GO

By using above UDF, We can clean the HTML tags from the data.

How to remove HTML tags from data with SQL

Related

Related posts

Leave a ReplyCancel reply

Share on Mastodon

Related

Related posts

Release SSIS using Azure DevOps

Split SQL script tool

How do I extract text that lies between two strings?

How do I extract text that lies between two strings?

Leave a ReplyCancel reply