The purpose of this article is to provide a way of cleaning up of HTML tags within the data. When we use various styles or tabular format data in UI using Rich Text Editor/ Rad Grid etc, it will save data in database with HTML tags.
When you save in database this kind of field you have:
An HTML element starts with a start tag (<p>) and ends with end tag (<p/>) and everything between Start tag and End tag is HTML element. e.g.
<b>Following are the popular databases: <br /> </b>1. SQL Server <br /> 2. Oracle <br /> 3.
Teradata <br /> 4. Sybase
We are using SQL Server CHARINDEX function to get the positions of ‘<’ and ‘>’ and once found replacing the string between <….> with blank using STUFF function. We are using WHILE Loop that will run till presence of ‘<’ & ‘>’ in the string. Below is the UDF script that performs HTML Tags clean up from data.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[usp_ClearHTMLTags]
(@String NVARCHAR(MAX))
RETURNS NVARCHAR(MAX)
AS
BEGIN
DECLARE @Start INT,
@End INT,
@Length INT
WHILE CHARINDEX('<', @String) > 0 AND CHARINDEX('>', @String, CHARINDEX('<', @String)) > 0
BEGIN
SELECT @Start = CHARINDEX('<', @String),
@End = CHARINDEX('>', @String, CHARINDEX('<', @String))
SELECT @Length = (@End - @Start) + 1
IF @Length > 0
BEGIN
SELECT @String = STUFF(@String, @Start, @Length, '')
END
END
RETURN @String
END
GO
By using above UDF, We can clean the HTML tags from the data.