htmlentities -- Convert all applicable characters to HTML entities
Description
string htmlentities ( string string [, int quote_style [, string charset]] )
This function is identical to htmlspecialchars() in all ways, except with htmlentities(), all characters which have HTML character entity equivalents are translated into these entities.
Like htmlspecialchars(), the optional second quote_style parameter lets you define what will be done with 'single' and "double" quotes. It takes on one of three constants with the default being ENT_COMPAT:
Table 1. Available quote_style constants
Constant Name Description
ENT_COMPAT Will convert double-quotes and leave single-quotes alone.
ENT_QUOTES Will convert both double and single quotes.
ENT_NOQUOTES Will leave both double and single quotes unconverted.
Support for the optional quote parameter was added in PHP 4.0.3.
Like htmlspecialchars(), it takes an optional third argument charset which defines character set used in conversion. Support for this argument was added in PHP 4.1.0. Presently, the ISO-8859-1 character set is used as the default.
To decode instead (the reverse operation), use html_entity_decode().
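As an illustration of the charset argument, the following sketch round-trips a UTF-8 string through htmlentities() and html_entity_decode(). The variable names are illustrative, and it assumes PHP 7 or later (for the "\u{e9}" escape) and UTF-8 input; it is not part of the original manual text.

```php
<?php
// Assumption: the input is UTF-8, so the charset is passed explicitly
// instead of relying on the default.
$original = "caf\u{e9} <b>bold</b>"; // "café <b>bold</b>"

$encoded = htmlentities($original, ENT_QUOTES, 'UTF-8');
echo $encoded, "\n"; // caf&eacute; &lt;b&gt;bold&lt;/b&gt;

// Decoding with the same quote_style and charset restores the input.
$decoded = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');
var_dump($decoded === $original); // bool(true)
```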
Example 1. A htmlentities() example
<?php
$str = "A 'quote' is <b>bold</b>";
// Outputs: A 'quote' is &lt;b&gt;bold&lt;/b&gt;
echo htmlentities($str);
// Outputs: A &#039;quote&#039; is &lt;b&gt;bold&lt;/b&gt;
echo htmlentities($str, ENT_QUOTES);
?>
See also html_entity_decode(), get_html_translation_table(), htmlspecialchars(), nl2br(), and urlencode().
User Contributed Notes
onlima
23-Nov-2006 01:56
In addition to the last post: for Flash, don't forget the percent character too.
% = %25
& = %26
' = %27
realcj at g mail dt com
07-Nov-2006 02:41
If you are building a LoadVars page for Flash and have problems with special characters such as "&" and "'", you should escape them for Flash.
Try trace(escape("&")); in Flash's ActionScript to see the escape code for &:
& = %26
' = %27
<?php
function flashentities($string){
return str_replace(array("&","'"),array("%26","%27"),$string);
}
?>
Those are the two that concerned me. YMMV.
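A minimal sketch that combines this function with the percent character mentioned in the newer note above. The function name flashentities_with_percent is made up for illustration. The one design point is the ordering: str_replace() applies array pairs one after another, so "%" has to be replaced first, otherwise the "%" in "%26" and "%27" would itself be escaped again.

```php
<?php
// Hypothetical helper (not from the original notes): escapes %, & and '
// for a Flash LoadVars response.
function flashentities_with_percent($string)
{
    // "%" must come first, because str_replace() processes the pairs in order
    // and the later replacements introduce new "%" characters.
    return str_replace(
        array("%", "&", "'"),
        array("%25", "%26", "%27"),
        $string
    );
}

echo flashentities_with_percent("50% & 'more'"); // 50%25 %26 %27more%27
```

If every reserved character needs escaping, PHP's built-in rawurlencode() may be a simpler choice.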
chuck at broker[remove]bin dot com
01-Nov-2006 09:33
/*
replaces everything but
alphanumeric
tab
newline
carriage return
*/
function allhtmlentities($string,$decode_first=true) {
// this is to ensure that any entities already coded are not "messed up"
if($decode_first) $string = html_entity_decode($string);
// "encode"
return preg_replace(
'/([^\x09\x0A\x0D\x20-\x7F]|[\x21-\x2F]|[\x3A-\x40]|[\x5B-\x60])/e'
, '"&#".ord("$0").";"', $string);
}
kevin at metalaxe dot com
16-Oct-2006 05:39
For danster3k at hotmail dot com: this will replace the &amp; returned for your &pound; with a plain & and display the entity in the browser:
<?php
if (isset($_POST['action']) && $_POST['action'] == 'submitted') {
    $output = $_POST['myInput'];
    // htmlentities() turns "&pound;" into "&amp;pound;"; undo that so the entity renders
    print str_replace('&amp;', '&', htmlentities($output, ENT_QUOTES));
}
?>
The original question was: "Using this code I simply want the user to input, for example, "&pound;100" and then hit submit, and this will output "£100" on the screen. But for some reason it outputs &pound;100 in the browser, and in the view source it has &amp;pound;100. Any idea how I can get it to display £100 on the page?"
eric.wallet at yahoo.fr
26-Sep-2006 07:57
function htmlnumericentities($str){
return preg_replace('/[^!-%\x27-;=?-~ ]/e', '"&#".ord("$0").chr(59)', $str);
}
function numericentitieshtml($str){
return utf8_encode(preg_replace('/&#(\d+);/e', 'chr(str_replace(";","",str_replace("&#","","$0")))', $str));
}
echo (htmlnumericentities ("Ceci est un test : & é $ à ç <"));
echo ("<br/>\n");
echo (numericentitieshtml (htmlnumericentities ("Ceci est un test : & é $ à ç <")));
Output is:
Ceci est un test : &#38; &#233; $ &#224; &#231; &#60;<br/>
Ceci est un test : & é $ à ç <
The first function converts characters to decimal numeric entities.
The second one reverses the process.
lorenzo masetti at libero it
08-Aug-2006 11:44
I think I found a bug in the makeSafeEntities procedure. I don't know why, but if the string has a special character as the last one (e.g. 'liberté'), the result will be truncated ('libert').
I solved it by adding a blank at the end of the string and taking it away again afterwards. It is not the most elegant solution, but it works.
This is the part that I changed in the original code, which is at http://www.prolifique.com/entities.php.txt
<?php
function makeSafeEntities($str, $convertTags = 0, $encoding = "") {
    if (is_array($arrOutput = $str)) {
        foreach (array_keys($arrOutput) as $key)
            // pass $convertTags through so $encoding lands in the right parameter
            $arrOutput[$key] = makeSafeEntities($arrOutput[$key], $convertTags, $encoding);
        return $arrOutput;
    }
    else if (!empty($str)) {
        // workaround: append a blank so a trailing special character is not lost
        $str .= " ";
        $str = makeUTF8($str, $encoding);
        $str = mb_convert_encoding($str, "HTML-ENTITIES", "UTF-8");
        $str = makeAmpersandEntities($str);
        if ($convertTags)
            $str = makeTagEntities($str);
        $str = correctIllegalEntities($str);
        // remove the blank added above
        return substr($str, 0, strlen($str) - 1);
    }
}
?>
daviscabral[arroba]gmail[ponto]com
29-Jul-2006 03:52
unhtmlentities for all entities:
info at pirandot dot de
22-Jul-2006 10:14
Unfortunately, there are differences between what is shown in the preview window and what is shown on the web site; thus, the extreme number of backslashes in my former note.
The corrected note:
The data returned by a text input field is ready to be used in a database query when enclosed in single quotes, e.g.
<?php
mysql_query ("SELECT * FROM Article WHERE id = '$data'");
?>
But you will get problems when writing back this data into the input field's value,
<?php
echo "<input name='data' type='text' value='$data'>";
?>
because HTML codes would be interpreted and escape sequences would cause strange output.
The following function may help:
<?php
function deescape ($s, $charset='UTF-8')
{
// don't interpret html codes and don't convert quotes
$s = htmlentities ($s, ENT_NOQUOTES, $charset);
// delete the inserted backslashes except those for protecting single quotes
$s = preg_replace ("/\\\\([^'])/e", '"&#" . ord("$1") . ";"', $s);
// delete the backslashes inserted for protecting single quotes
$s = str_replace ("\\'", "&#" . ord ("'") . ";", $s);
return $s;
}
?>
Try some input like: a'b"c\d\'e\"f\\g&x#27;h to test ...
info at pirandot dot de
22-Jul-2006 09:00
The data returned by a text input field is ready to be used in a database query when enclosed in single quotes, e.g.
<?php
mysql_query ("SELECT * FROM Article WHERE id = '$data'");
?>
But you will get problems when writing back this data into the input field's value,
<?php
echo "<input type='text' value='$data'>";
?>
because HTML codes would be interpreted and escape sequences would cause strange output.
The following function may help:
<?php
function deescape ($s, $charset='UTF-8')
{
// don't interpret html codes and don't convert quotes
$s = htmlentities ($s, ENT_NOQUOTES, $charset);
// delete the inserted backslashes except those for protecting single quotes
$s = preg_replace ("/\\\\\\\\([^'])/e", '"&#" . ord("$1") . ";"', $s);
// delete the backslashes inserted for protecting single quotes
$s = str_replace ("\\\\'", "&#" . ord ("'") . ";", $s);
return $s;
}
?>
Try some input like: a'b"c\\d\\'e\\"f\\\\g&x#27;h to test ...
soapergem at gmail dot com
11-May-2006 02:14
A quick revision to my last comment. For some reason, leaving the control characters in the safe range seemed to screw things up. So instead, using this function will do what everybody else here is trying to do, but it will do so in a single line:
<?php
$text = preg_replace('/[^\x09\x0A\x0D\x20-\x7F]/e', '"&#".ord("$0").";"', $text);
?>
cameron at prolifique dot com
11-May-2006 02:01
I've been asked why I assembled such intricate functions to convert to entities when I could use a very simple solution (like the one offered by soapergem below). The biggest reason is that the PHP htmlentities function and most of the other solutions listed below go haywire on multi-byte strings.
In addition, the entire range of numbered entities from &#128; through &#159; is invalid and should not be used (as noted by mail at britlinks dot com below). Most htmlentity functions also do not convert ampersands or pointy brackets (<>) to entities. The ones that do often reconvert existing entities (&amp; becomes &amp;amp;).
cameron at prolifique dot com
06-May-2006 08:02
I've been dissatisfied with all the solutions I've yet seen for converting text into html entities, which all seem to have some drawback or another. So I wrote my own, borrowing heavily from other code posted on this site.
makeSafeEntities() should take any text, convert it from the specified charset into UTF-8, then replace all inappropriate characters with appropriate (and legal) character entities, returning generic ISO-8859 HTML text. Should NOT reconvert any entities already in the text.
makeAllEntities() does the same, but converts the entire string to entities. Useful for obscuring email addresses (in a lame but nonetheless somewhat effective way).
Suggestions for improvement welcome!
soapergem at gmail dot com
30-Apr-2006 02:53
Here's another version of that "allhtmlentities" function that an anonymous user posted in the last comment, only this one would be significantly more efficient. Again, this would convert anything that has an ASCII value higher than 127.
<?php
function allhtmlentities($string)
{
return preg_replace('/[^\x00-\x7F]/e', '"&#".ord("$0").";"', $string);
}
?>
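Note that the /e modifier used here was deprecated in PHP 5.5 and removed in PHP 7.0. A minimal sketch of the same idea with preg_replace_callback() follows. The name allhtmlentities_callback is made up for illustration, and like the original it works byte by byte, so it assumes single-byte (ISO-8859-1) input rather than UTF-8.

```php
<?php
// Hypothetical replacement for the /e version above.
// Assumption: $string is single-byte encoded (e.g. ISO-8859-1), because
// ord() only looks at one byte at a time.
function allhtmlentities_callback($string)
{
    return preg_replace_callback(
        '/[^\x00-\x7F]/',
        function ($match) {
            // $match[0] is the single non-ASCII byte that was matched
            return '&#' . ord($match[0]) . ';';
        },
        $string
    );
}

echo allhtmlentities_callback("libert\xE9"); // libert&#233;
```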
anonymous
27-Apr-2006 03:38
This function will encode anything that is not standard ASCII (that is, any character above #127 in the ASCII table).
// allhtmlentities : mainly based on "chars_encode()" by Tim Burgan <timburgan@gmail.com> [http://www.php.net/htmlentities]
function allhtmlentities($string) {
if ( strlen($string) == 0 )
return $string;
$result = '';
// HTML_ENTITIES is a get_html_translation_table() constant, not a quote_style, so the default is used
$string = htmlentities($string);
$string = preg_split("//", $string, -1, PREG_SPLIT_NO_EMPTY);
$ord = 0;
for ( $i = 0; $i < count($string); $i++ ) {
$ord = ord($string[$i]);
if ( $ord > 127 ) {
$string[$i] = '&#' . $ord . ';';
}
}
return implode('',$string);
}
eion at bigfoot dot com
21-Feb-2006 08:54
many people below talk about using
<?php
mb_convert_encoding($s,'HTML-ENTITIES','UTF-8');
?>
to convert non-ASCII code into HTML-readable stuff. Due to my webserver being out of my control, I was unable to set the database character set, and whenever PHP made a copy of my $s variable that it had pulled out of the database, it would convert it to nasty latin1 automatically and not leave it in its beautiful UTF-8 glory.
So [insert korean characters here] turned into ?????.
I found myself needing to pass by reference (call-time pass-by-reference is, of course, deprecated or nonexistent in recent versions of PHP). So instead of
<?php
mb_convert_encoding(&$s,'HTML-ENTITIES','UTF-8');
?>
which worked perfectly until I upgraded, I had to use
<?php
call_user_func_array('mb_convert_encoding', array(&$s,'HTML-ENTITIES','UTF-8'));
?>
Hope it helps someone else out
timburgan at gmail dot com
02-Feb-2006 10:39
chars_encode() will, by default, convert ALL non-alpha-numeric and non-space characters in a string to their ASCII HTML character codes (e.g. + to &#43;).
chars_decode() will decode a string encoded by chars_encode().
<?php
/**
* Encode each char in a string to its HTML ASCII code.
* Characters encoded by this function can be decoded
* by chars_decode().
*
* This function will work with PHP >= 3.0.9, 4, 5.
*
* @author Tim Burgan <timburgan@gmail.com>
* @version 1.0.0
* @link http://timburgan.com/
* @link http://php.net/manual/function.htmlentities.php
* @param string $string String to encode
* @param bool $encodeAll Optional. Default is false. If true all chars are encoded, if false, only non-alpha-numeric and non-space chars are encoded.
* @return string Returns a string of encoded chars
*/
function chars_encode($string, $encodeAll = false)
{
    // declare variables
    $ent = array();
    // split the string into single characters
    $chars = preg_split("//", $string, -1, PREG_SPLIT_NO_EMPTY);
    // encode each character
    for ( $i = 0; $i < count($chars); $i++ )
    {
        if ( preg_match("/^(\w| )$/",$chars[$i]) && $encodeAll == false )
            $ent[$i] = $chars[$i];
        else
            $ent[$i] = "&#" . ord($chars[$i]) . ";";
    }
    if ( sizeof($ent) < 1)
        return "";
    return implode("",$ent);
}
/**
* Decode each char in a string from its HTML ASCII code.
* Characters decoded by this function were encoded
* by chars_encode().
*
* This function will work with PHP >= 3.0.9, 4, 5.
*
* @author Tim Burgan <timburgan@gmail.com>
* @version 1.0.0
* @link http://timburgan.com/
* @link http://php.net/manual/function.html-entity-decode.php
* @param string $string String to decode
* @return string Returns a string of decoded chars
*/
function chars_decode($string)
{
// declare variables
$tok = 0;
$cur = 0;
$chars = null;
// move through the string until the end is reached
while ( $cur < strlen($string) )
{
// find the next token
$tok = strpos($string, "&#", $cur);
// if no more tokens exist, move pointer to end of string
if ( $tok === false )
$tok = strlen($string);
// if the current char is alpha-numeric or a space
if ( preg_match("/^(\w| )$/",substr($string, $cur, 1)) )
{
$chars .= substr($string, $cur, $tok - $cur);
}
// the current char must be the start of a token
else
{
$cur += 2;
$tok = strpos($string, ';', $cur);
$chars .= chr(substr($string, $cur, $tok - $cur));
$tok++;
}
// move the current pointer to the next token
$cur = $tok;
}
return $chars;
}
/* Example usage
***********************************************/
$string = '<a href="http://timburgan.com" rel="external" title="Go to timburgan.com">Tim Burgan</a>';
?>
Bartek
01-Feb-2006 06:06
I use this function to convert input from MS Word into HTML (ASCII) compatible output. I hope it will work for you too.
I have enabled magic_quotes on my server, so maybe you won't need stripslashes and addslashes.
I've also noticed that the Opera 8.51 browser behaves somewhat differently from IE 6 and Firefox 1.5. I haven't checked these functions with other browsers.
<?php
function convert_word_to_ascii($string)
{
$string = stripslashes($string);
$new_string = str_replace($search, $replace, $string);
return addslashes($new_string);
};
?>
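The $search and $replace arrays are not shown in the note above. Purely as an illustration of how such a mapping might look, here is a sketch that assumes the Word text arrives Windows-1252 encoded and maps a few common typographic characters to plain ASCII. The function name and the chosen characters are assumptions, not the author's original list.

```php
<?php
// Illustrative only: the original $search/$replace arrays are unknown.
// Assumption: input bytes are Windows-1252, as text pasted from MS Word often is.
function word_punctuation_to_ascii($string)
{
    // Windows-1252 bytes: curly single quotes, curly double quotes,
    // en dash, em dash, ellipsis
    $search  = array("\x91", "\x92", "\x93", "\x94", "\x96", "\x97", "\x85");
    $replace = array("'",    "'",    '"',    '"',    "-",    "--",   "...");
    return str_replace($search, $replace, $string);
}

echo word_punctuation_to_ascii("\x93Hello\x94 \x96 it\x92s fine\x85");
// "Hello" - it's fine...
```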
24-Jan-2006 09:20
Please, don't use htmlentities to avoid XSS! Htmlspecialchars is enough!
If you don't specify the encoding, Latin1 will be used, so there is a problem if someone wants to use your software in a non-English environment.
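A minimal sketch of that advice, assuming the page is served as UTF-8. The helper name h() is made up for illustration. Only the characters that are special in HTML are escaped, so non-English text passes through unchanged.

```php
<?php
// Hypothetical helper: escape output for HTML, with the encoding stated
// explicitly instead of relying on the default.
function h($text)
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$input = "<script>alert('x')</script> \u{e9}t\u{e9}"; // ends with "été"
echo h($input);
// &lt;script&gt;alert(&#039;x&#039;)&lt;/script&gt; été
```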
mailing at jcn50 dot com
21-Jan-2006 02:25
Convert any language (Japanese, French, Chinese, Russian, etc...) to unicode HTML entities like &#XXXX;
In one line!
where $s is your string (may be a FORM submitted one).
Enjoy~
edo at edwaa dot com
18-Nov-2005 12:48
A version of the xml entities function below. This one replaces the "prime" character (?) with which I had difficulties.
// XML Entity Mandatory Escape Characters
function xmlentities($string) {
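    // '?' below stands in for the "prime" character, which did not survive the page encoding
    return str_replace(
        array('&', '"', "'", '<', '>', '?'),
        array('&amp;', '&quot;', '&apos;', '&lt;', '&gt;', '&apos;'),
        $string
    );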
}
info at bleed dot ws
15-Oct-2005 01:42
Here is the centralized version of htmlentities() for multibyte.
Greetings ;-)
...
webmaster at swirldrop dot com
27-Jul-2005 02:45
To replace any characters in a string that could be 'dangerous' to put in an HTML/XML file with their numeric entities (e.g. &#233; for é [e acute]), you can use the following function:
function htmlnumericentities($str){
return preg_replace('/[^!-%\x27-;=?-~ ]/e', '"&#".ord("$0").chr(59)', $str);
};//EoFn htmlnumericentities