Recent Releases of https://github.com/rgrove/sanitize
https://github.com/rgrove/sanitize -
Sanitize has no breaking API changes in this release, but the major version number has been incremented because we've dropped support for end-of-life versions of Ruby. As long as you're using Ruby 3.1.0 or later, this should be a painless upgrade!
Added
- Added over 100 new CSS properties to the relaxed config, representing all properties that are listed with a status of "Working Draft" or better in the latest W3C "All Properties" list.
- Added the @container CSS at-rule to the relaxed config.
- Added the -webkit-text-fill-color CSS property to the relaxed config. @radar - #244
Changed
- Ruby 3.1.0 is now the oldest supported Ruby version.
- Sanitize now requires Nokogiri 1.16.8 or higher.
- Ruby
Published by rgrove about 1 year ago
https://github.com/rgrove/sanitize -
Bug Fixes
- The CSS URL protocol allowlist is now enforced on the nonstandard -webkit-image-set CSS function. @ltk - #242
- Ruby
Published by rgrove over 1 year ago
https://github.com/rgrove/sanitize -
Bug Fixes
- The CSS URL protocol allowlist is now properly enforced in CSS Images Module Level 4 image and image-set functions. @ltk - #240
- Ruby
Published by rgrove over 1 year ago
https://github.com/rgrove/sanitize - v6.1.1
Bug Fixes
- Proactively fixed a compatibility issue with libxml >= 2.13.0 (which will be used in an upcoming version of Nokogiri) that caused HTML doctype sanitization to fail. @flavorjones - #238
- Ruby
Published by rgrove over 1 year ago
https://github.com/rgrove/sanitize -
Features
- Added the text-decoration-skip-ink and text-decoration-thickness CSS properties to the relaxed config. @martineriksson - #228
- Ruby
Published by rgrove over 2 years ago
https://github.com/rgrove/sanitize - v6.0.2
Bug Fixes
- CVE-2023-36823: Fixed an HTML+CSS sanitization bypass that could allow XSS (cross-site scripting). This issue affects Sanitize versions 3.0.0 through 6.0.1.
When using Sanitize's relaxed config or a custom config that allows <style> elements and one or more CSS at-rules, carefully crafted input could be used to sneak arbitrary HTML through Sanitize.
See the following security advisory for additional details: GHSA-f5ww-cq3m-q3g7
Thanks to @cure53 for finding this issue.
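To check whether a given Sanitize version falls inside the affected range stated above (3.0.0 through 6.0.1), a comparison along these lines may help. This is a minimal sketch built on RubyGems' Gem::Version; the helper name affected_by_cve_2023_36823? is hypothetical and is not part of Sanitize.

```ruby
require "rubygems"

# Hypothetical helper (not part of Sanitize): true when the version string
# lies in the affected range 3.0.0 through 6.0.1, inclusive.
def affected_by_cve_2023_36823?(version_string)
  version = Gem::Version.new(version_string)
  version >= Gem::Version.new("3.0.0") && version <= Gem::Version.new("6.0.1")
end

puts affected_by_cve_2023_36823?("6.0.1") # last affected release
puts affected_by_cve_2023_36823?("6.0.2") # the release that carries the fix
```

With the gem loaded, Sanitize::VERSION could be passed in place of the literal strings.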
- Ruby
Published by rgrove over 2 years ago
https://github.com/rgrove/sanitize -
Bug Fixes
- Sanitize now always removes <noscript> elements and their contents, even when noscript is in the allowlist.
This fixes a sanitization bypass that could occur when noscript was allowed by a custom allowlist. In this scenario, carefully crafted input could sneak arbitrary HTML through Sanitize, potentially enabling an XSS (cross-site scripting) attack.
Sanitize's default configs don't allow <noscript> elements and are not vulnerable. This issue only affects users who are using a custom config that adds noscript to the element allowlist.
The root cause of this issue is that HTML parsing rules treat the contents of a <noscript> element differently depending on whether scripting is enabled in the user agent. Nokogiri doesn't support scripting so it follows the "scripting disabled" rules, but a web browser with scripting enabled will follow the "scripting enabled" rules. This means that Sanitize can't reliably make the contents of a <noscript> element safe for scripting enabled browsers, so the safest thing to do is to remove the element and its contents entirely.
See the following security advisory for additional details: GHSA-fw3g-2h3j-qmm7
Thanks to David Klein from TU Braunschweig (@leeN) for reporting this issue.
- Fixed an edge case in which the contents of an "unescaped text" element (such as <noembed> or <xmp>) were not properly escaped if that element was allowlisted and was also inside an allowlisted <math> or <svg> element.
The only way to encounter this situation was to ignore multiple warnings in the readme and create a custom config that allowlisted all the elements involved, including <math> or <svg>. If you're using a default config or if you heeded the warnings about MathML and SVG not being supported, you're not affected by this issue.
Please let this be a reminder that Sanitize cannot safely sanitize MathML or SVG content and does not support this use case. The default configs don't allow MathML or SVG elements, and allowlisting MathML or SVG elements in a custom config may create a security vulnerability in your application.
Documentation has been updated to add more warnings and to make the existing warnings about this more prominent.
Thanks to David Klein from TU Braunschweig (@leeN) for reporting this issue.
- Ruby
Published by rgrove almost 3 years ago
https://github.com/rgrove/sanitize -
Potentially Breaking Changes
- Ruby 2.5.0 is now the oldest officially supported Ruby version.
- Sanitize now requires Nokogiri 1.12.0 or higher, which includes Nokogumbo. The separate dependency on Nokogumbo has been removed. @lis2 - #211
- Ruby
Published by rgrove over 4 years ago
https://github.com/rgrove/sanitize -
Bug Fixes
- Ensure protocol sanitization is applied to data attributes. @ccutrer - #207
- Ruby
Published by rgrove almost 5 years ago
https://github.com/rgrove/sanitize -
Bug Fixes
- Fixed a deprecation warning in Ruby 2.7+ when using keyword arguments in a custom transformer. @mscrivo - #206
- Ruby
Published by rgrove about 5 years ago
https://github.com/rgrove/sanitize -
Bug Fixes
- Fixed an HTML sanitization bypass that could allow XSS. This issue affects Sanitize versions 3.0.0 through 5.2.0.
When HTML was sanitized using the "relaxed" config or a custom config that allows certain elements, some content in a <math> or <svg> element may not have been sanitized correctly even if math and svg were not in the allowlist. This could allow carefully crafted input to sneak arbitrary HTML through Sanitize, potentially enabling an XSS (cross-site scripting) attack.
You are likely to be vulnerable to this issue if you use Sanitize's relaxed config or a custom config that allows one or more of the following HTML elements: iframe, math, noembed, noframes, noscript, plaintext, script, style, svg, xmp
See the security advisory for more details, including a workaround if you're not able to upgrade: GHSA-p4x4-rw2p-8j8m
Many thanks to Michał Bentkowski of Securitum for reporting this issue and helping to verify the fix.
- Ruby
Published by rgrove over 5 years ago
https://github.com/rgrove/sanitize -
Changes
- The term "whitelist" has been replaced with "allowlist" throughout Sanitize's source and documentation.
While the etymology of "whitelist" may not be explicitly racist in origin or intent, there are inherent racial connotations in the implication that white is good and black (as in "blacklist") is not.
This is a change I should have made long ago, and I apologize for not making it sooner.
- In transformer input, the :is_whitelisted and :node_whitelist keys are now deprecated. New :is_allowlisted and :node_allowlist keys have been added. The old keys will continue to work in order to avoid breaking existing code, but they are no longer documented and may be removed in a future semver major release.
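A custom transformer that has to run on releases both before and after this rename could read the new key and fall back to the deprecated one. The following is a minimal sketch: the helper names are hypothetical, and the Hashes passed in are stand-ins for the input Sanitize gives a transformer (which also carries keys such as :node and :config).

```ruby
# Prefer the new :is_allowlisted key; fall back to the deprecated
# :is_whitelisted key when the new one is absent.
def allowlisted?(env)
  env.fetch(:is_allowlisted) { env[:is_whitelisted] }
end

# Hypothetical transformer factory: the returned lambda records the names
# of nodes that are not allowlisted and returns nil, leaving the node as-is.
def rejected_names_transformer(log)
  lambda do |env|
    log << env[:node_name] unless allowlisted?(env)
    nil
  end
end

log = []
transformer = rejected_names_transformer(log)
transformer.call({ node_name: "marquee", is_allowlisted: false })
transformer.call({ node_name: "b", is_whitelisted: true }) # older key only
p log
```

With the gem loaded, the lambda would be registered through the :transformers config array.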
- Ruby
Published by rgrove over 5 years ago
https://github.com/rgrove/sanitize -
Features
- Added a :parser_options config hash, which makes it possible to pass custom parsing options to Nokogumbo. @austin-wang - #194
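As a rough illustration, a custom config carrying :parser_options might look like the sketch below. The option names max_errors and max_tree_depth are assumptions about what the underlying HTML5 parser accepts, and the values are arbitrary; check the parser documentation for the version you have installed.

```ruby
# Hypothetical custom config. Only the :parser_options key comes from the
# release note above; the nested option names and values are assumptions.
def custom_parser_config
  {
    elements: %w[b em i strong],
    parser_options: {
      max_errors: 10,      # assumed option: cap on recorded parse errors
      max_tree_depth: 200  # assumed option: cap on element nesting depth
    }
  }
end

# With the sanitize gem loaded, the Hash could be passed as the config:
#   Sanitize.fragment(html, custom_parser_config)
p custom_parser_config[:parser_options]
```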
Bug Fixes
- Non-characters and non-whitespace control characters are now stripped from HTML input before parsing to comply with the HTML Standard's preprocessing guidelines. Prior to this, Sanitize had adhered to older W3C guidelines that have since been withdrawn. #179
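The kind of pre-parse cleanup described above can be sketched in plain Ruby. This is a simplified illustration, not Sanitize's actual implementation: the method name is hypothetical, and the pattern covers only control characters other than tab, line feed, form feed and carriage return, plus the noncharacters in the Basic Multilingual Plane.

```ruby
# Simplified sketch (not Sanitize's code): remove null bytes, non-whitespace
# control characters, and BMP noncharacters before handing HTML to a parser.
def strip_unsuitable_chars(html)
  html
    .delete("\u0000")
    .gsub(/[\u0001-\u0008\u000b\u000e-\u001f\u007f-\u009f\ufdd0-\ufdef\ufffe\uffff]/, "")
end

p strip_unsuitable_chars("a\u0001b\ufdd0c\tok\n")
```

Tabs and newlines pass through untouched because they count as whitespace rather than as control characters to be stripped.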
- Ruby
Published by rgrove over 6 years ago
https://github.com/rgrove/sanitize -
For most users, upgrading from 4.x shouldn't require any changes. However, the minimum required Ruby version has changed, and Sanitize 5.x's HTML output may differ in some small ways from 4.x's output. If this matters to you, please review the changes below carefully.
Potentially Breaking Changes
- Ruby 2.3.0 is now the oldest officially supported Ruby version. Sanitize may work in older 2.x Rubies, but they aren't actively tested. Sanitize definitely no longer works in Ruby 1.9.x.
- Upgraded to Nokogumbo 2.x, which fixes various bugs and adds standard-compliant HTML serialization. @stevecheckoway - #189
- Children of the following elements are now removed by default when these elements are removed, rather than being preserved and escaped: iframe, noembed, noframes, noscript, script, style
- Children of whitelisted iframe elements are now always removed. In modern HTML, iframe elements should never have children. In HTML 4 and earlier iframe elements were allowed to contain fallback content for legacy browsers, but it's been almost two decades since that was useful.
- Fixed a bug that caused :remove_contents to behave as if it were set to true when it was actually an Array.
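The difference between the two forms of :remove_contents mentioned in the last item can be illustrated with a small helper. This is a hypothetical sketch of the intended semantics, not Sanitize's internal code: true applies to every removed element, while an Array applies only to the elements it names.

```ruby
# Hypothetical helper (not Sanitize's code): should the contents of a removed
# element be dropped too, given the :remove_contents setting?
def remove_contents?(setting, element_name)
  return true if setting == true                                   # all removed elements
  return setting.include?(element_name) if setting.respond_to?(:include?) # listed only
  false                                                            # false or nil
end

p remove_contents?(true, "div")
p remove_contents?(%w[script style], "script")
p remove_contents?(%w[script style], "div")
```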
- Ruby
Published by rgrove about 7 years ago
https://github.com/rgrove/sanitize -
- CVE-2018-3740: Fixed an HTML injection vulnerability that could allow XSS (backported from Sanitize 4.6.3). @dometto - #188
When Sanitize <= 2.1.0 is used in combination with libxml2 >= 2.9.2, a specially crafted HTML fragment can cause libxml2 to generate improperly escaped output, allowing non-whitelisted attributes to be used on whitelisted elements.
Sanitize now performs additional escaping on affected attributes to prevent this.
Many thanks to the Shopify Application Security Team for responsibly reporting this issue.
- Ruby
Published by rgrove over 7 years ago
https://github.com/rgrove/sanitize -
- Improved performance and memory usage by optimizing Sanitize#transform_node! @stanhu - #183
- Ruby
Published by rgrove over 7 years ago
https://github.com/rgrove/sanitize -
- Improved performance slightly by tweaking the order of built-in transformers. @rafbm - #180
- Ruby
Published by rgrove over 7 years ago
https://github.com/rgrove/sanitize - 4.6.4 (2018-03-20)
- Fixed: A change introduced in 4.6.2 broke certain transformers that relied on being able to mutate the name of an HTML node. That change has been reverted and a test has been added to cover this case. @zetter - #177
- Ruby
Published by rgrove almost 8 years ago
https://github.com/rgrove/sanitize - 4.6.3 (2018-03-19)
- CVE-2018-3740: Fixed an HTML injection vulnerability that could allow XSS.
When Sanitize <= 4.6.2 is used in combination with libxml2 >= 2.9.2, a specially crafted HTML fragment can cause libxml2 to generate improperly escaped output, allowing non-whitelisted attributes to be used on whitelisted elements.
Sanitize now performs additional escaping on affected attributes to prevent this.
Many thanks to the Shopify Application Security Team for responsibly reporting this issue.
- Ruby
Published by rgrove almost 8 years ago
https://github.com/rgrove/sanitize - 4.6.2 (2018-03-19)
- Reduced string allocations to optimize memory usage. @janklimo - #175
- Ruby
Published by rgrove almost 8 years ago
https://github.com/rgrove/sanitize - 4.6.1 (2018-03-15)
- Added support for frozen string literals in Ruby 2.4+. @flavorjones - #174
- Ruby
Published by rgrove almost 8 years ago
https://github.com/rgrove/sanitize - 4.6.0 (2018-01-29)
- Loosened the Nokogumbo dependency to allow installing semver-compatible versions greater than or equal to v1.4. @rafbm - #171
- Ruby
Published by rgrove almost 8 years ago
https://github.com/rgrove/sanitize - 4.5.0 (2017-06-04)
- Added SVG-related CSS properties to the relaxed config. See the diff for the full list of added properties. @louim - #161
- Fixed: Sanitize now strips null bytes (\u0000) before passing input to Nokogumbo, since they can cause recent versions to crash with a failed assertion in the Gumbo parser.
- Ruby
Published by rgrove over 8 years ago
https://github.com/rgrove/sanitize - 4.4.0 (2016-09-29)
- Added srcset to the attribute whitelist for img elements in the relaxed config. @ejtttje - #156
- Ruby
Published by rgrove over 9 years ago
https://github.com/rgrove/sanitize - 4.3.0 (2016-09-20)
- Methods can now be used as transformers. @Skipants - #155
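In practice this means a Method object can be placed in the :transformers array where a lambda or Proc would otherwise go. The class and method names below are hypothetical, and the Hash passed in is a stand-in for real transformer input.

```ruby
require "set"

# Hypothetical collector: records the name of every element it is shown.
class ElementAudit
  attr_reader :seen

  def initialize
    @seen = Set.new
  end

  # Takes one Hash argument, like a lambda transformer would.
  def record(env)
    @seen << env[:node_name] if env[:is_element]
    nil # returning nil leaves the node unchanged
  end
end

audit = ElementAudit.new
transformer = audit.method(:record) # a Method object responds to #call

# With the sanitize gem loaded, it could be registered like a lambda:
#   Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
#     :transformers => [transformer]))
transformer.call({ node_name: "img", is_element: true })
p audit.seen.to_a
```

Using a bound method keeps any state the transformer needs (here, the set of names) on an ordinary object instead of in closure variables.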
- Ruby
Published by rgrove over 9 years ago
https://github.com/rgrove/sanitize - 4.2.0 (2016-08-22)
- Added -webkit-font-smoothing to the relaxed CSS config. @louim - #154
- Fixed: Nokogumbo >=1.4.9 changed its behavior in a way that allowed invalid doctypes (like <!DOCTYPE nonsense>) when the :allow_doctype config setting was true. Invalid doctypes are now coerced to valid ones as they were prior to this Nokogumbo change.
- Ruby
Published by rgrove over 9 years ago
https://github.com/rgrove/sanitize - 4.1.0 (2016-06-17)
- Added a new CSS config setting, :import_url_validator. This is a Proc or other callable object that will be called with each @import URL, and should return true to allow the URL or false to remove it. @nikz - #153
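A validator of this kind might look like the following sketch. The policy (HTTPS only, one trusted host) and the host name fonts.example.com are placeholders chosen for illustration, and the helper names are hypothetical.

```ruby
require "uri"

# Hypothetical policy: allow @import only over HTTPS from one trusted host.
def trusted_import?(url)
  uri = URI.parse(url)
  uri.scheme == "https" && uri.host == "fonts.example.com"
rescue URI::InvalidURIError
  false # unparseable URLs are rejected
end

# CSS config fragment; in a full Sanitize config this Hash would sit under
# the :css key.
def css_config_with_import_check
  {
    at_rules: %w[import],
    import_url_validator: ->(url) { trusted_import?(url) }
  }
end

validator = css_config_with_import_check[:import_url_validator]
p validator.call("https://fonts.example.com/site.css")
p validator.call("http://elsewhere.example/evil.css")
```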
- Ruby
Published by rgrove over 9 years ago
https://github.com/rgrove/sanitize - 4.0.1 (2015-12-09)
- Unpinned the Nokogumbo dependency. @rubys - #141
- Ruby
Published by rgrove about 10 years ago
https://github.com/rgrove/sanitize - 4.0.0 (2015-04-20)
Potentially breaking changes
- Added two new CSS config settings, :at_rules_with_properties and :at_rules_with_styles. These allow you to define which at-rules should be allowed to contain properties and which should be allowed to contain style rules. Previously this was hard-coded internally. #111
The previous :at_rules setting still exists, and defines at-rules that may not have associated blocks, such as @import. If you have a custom config that contains an :at_rules setting, you may need to move rules that can have blocks to either :at_rules_with_properties or :at_rules_with_styles.
See Sanitize's relaxed config for an example.
Other changes
- Added full support for CSS @page rules in the relaxed config, including support for all page-margin box rules (such as @top-left, @bottom-center, etc.)
- Added the following CSS at-rules to the relaxed config: @-moz-keyframes, @-o-keyframes, @-webkit-keyframes, @document
- Added a whole bunch of CSS properties to the relaxed config. View the complete list here.
- Small performance improvements.
- Fixed: Upgraded Crass to 1.0.2 to pick up a fix that affected the parsing of CSS @page rules.
- Ruby
Published by rgrove over 10 years ago
https://github.com/rgrove/sanitize - Version 3.1.2 (2015-02-22)
- Fixed: Deleting a node in a custom transformer could trigger a memory leak in Nokogiri if that node's children were later reparented, which the built-in CleanElement transformer did by default. The CleanElement transformer is now careful not to reparent the children of deleted nodes. #129
- Ruby
Published by rgrove almost 11 years ago
https://github.com/rgrove/sanitize - Version 3.1.1 (2015-02-04)
- Fixed: #document and #fragment failed on frozen strings, and could unintentionally modify unfrozen strings if they used an encoding other than UTF-8 or if they contained characters not allowed in HTML. @AnchorCat - #128
- Ruby
Published by rgrove almost 11 years ago
https://github.com/rgrove/sanitize - Version 3.1.0 (2014-12-22)
- Added the following CSS properties to the relaxed config: -moz-text-size-adjust, -ms-text-size-adjust, -webkit-text-size-adjust, text-size-adjust. @ehudc - #120
- Updated Nokogumbo to 1.2.0 to pick up a fix for a Gumbo bug where the entity &AElig; left its semicolon behind when it was converted to a character during parsing. #119
- Ruby
Published by rgrove about 11 years ago
https://github.com/rgrove/sanitize - Version 3.0.4 (2014-12-12)
- Fixed: Harmless whitespace preceding a URL protocol (such as " http://") caused the URL to be removed even when the protocol was whitelisted. @benubois - #126
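The behavior this fix restores can be illustrated with a small protocol check that ignores leading whitespace. Both helpers are hypothetical and do not reproduce Sanitize's implementation; the :relative marker for URLs without a protocol is likewise used here only for illustration.

```ruby
# Hypothetical helper: extract the protocol from a URL after discarding
# harmless leading whitespace. Returns nil when there is no protocol.
def url_protocol(url)
  match = url.lstrip.match(/\A([a-z][a-z0-9+.\-]*):/i)
  match && match[1].downcase
end

# Hypothetical check against a list of allowed protocols.
def protocol_allowed?(url, allowed)
  protocol = url_protocol(url)
  protocol.nil? ? allowed.include?(:relative) : allowed.include?(protocol)
end

p protocol_allowed?(" http://example.com/", %w[http https])
p protocol_allowed?(" javascript:alert(1)", %w[http https])
```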
- Ruby
Published by rgrove about 11 years ago
https://github.com/rgrove/sanitize - Version 3.0.3 (2014-10-29)
- Fixed: Some CSS selectors weren't parsed correctly inside the body of a @media block, causing them to be removed even when whitelist rules should have allowed them to remain. #121
- Ruby
Published by rgrove about 11 years ago
https://github.com/rgrove/sanitize - Version 3.0.2 (2014-09-02)
- Updated Nokogumbo to 1.1.12, because 1.1.11 silently reverted the change we were trying to pick up in the last release. Now issue #114 is actually fixed.
- Ruby
Published by rgrove over 11 years ago
https://github.com/rgrove/sanitize - Version 3.0.1 (2014-09-02)
- Updated Nokogumbo to 1.1.11 to pick up a fix for a Gumbo bug in which certain HTML character entities, such as &Ouml;, were parsed incorrectly, leaving the semicolon behind in the output. #114
- Ruby
Published by rgrove over 11 years ago
https://github.com/rgrove/sanitize - Version 3.0.0 (2014-06-21)
As of this version, Sanitize adheres strictly to the SemVer 2.0.0 versioning standard. This release contains API and output changes that are incompatible with previous releases, as indicated by the major version increment.
Backwards-incompatible changes
- HTML is now parsed using Google's Gumbo HTML5 parser, which adheres to the HTML5 parsing spec and behaves much more like modern browser parsers than the previous libxml2-based parser. As a result, HTML output may differ from that of previous versions of Sanitize.
- All transformers now traverse the document from the top down, starting with the first node, then its first child, and so on. The :transformers_breadth config has been removed, and old bottom-up transformers (the previous default) may need to be rewritten.
- Sanitize's built-in configs are now deeply frozen to prevent people from modifying them (either accidentally or maliciously). To customize a built-in config, create a new copy using Sanitize::Config.merge(), like so:
    Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
      :elements => Sanitize::Config::BASIC[:elements] + ['div', 'table'],
      :remove_contents => true
    ))
- The clean! and clean_document! methods were removed, since they weren't useful and tended to confuse people.
- The clean method was renamed to fragment to more clearly indicate that its intended use is to sanitize an HTML fragment.
- The clean_document method was renamed to document.
- The clean_node! method was renamed to node!.
- The document method now raises a Sanitize::Error if the <html> element isn't whitelisted, rather than a RuntimeError. This error is also now raised regardless of the :remove_contents config setting.
- The :output config has been removed. Output is now always HTML, not XHTML.
- The :output_encoding config has been removed. Output is now always UTF-8.
Other changes
- Added advanced CSS sanitization support using Crass, which is fully compliant with the CSS Syntax Module Level 3 parsing spec. The contents of whitelisted <style> elements and style attributes in HTML will be sanitized as CSS, or you can use the Sanitize::CSS class to manually sanitize CSS stylesheets or properties.
- Added an :allow_doctype setting. When true, well-formed doctype definitions will be allowed in documents. When false (the default), doctype definitions will be removed from documents. Doctype definitions are never allowed in fragments, regardless of this setting.
- Added the following elements to the relaxed config, in addition to various attributes: article, aside, body, data, div, footer, head, header, html, main, nav, section, span, style, title.
- The :whitespace_elements config is now a Hash, and allows you to specify the text that should be inserted before and after these elements when they're removed. The old-style Array-based config value is still supported for backwards compatibility. @alperkokmen - #94
- Unsuitable Unicode characters are now removed from HTML before it's parsed. #106
- Fixed: Non-tag brackets in input like "1 > 2 and 2 < 1" are now parsed and escaped correctly in accordance with the HTML5 spec, becoming "1 &gt; 2 and 2 &lt; 1". #83
- Fixed: Siblings added after the current node during traversal are now also traversed. In previous versions they were simply skipped. #91
- Fixed: Nokogiri has been smacked and instructed to stop adding newlines after certain elements, because if people wanted newlines there they'd have put them there, dammit. #103
- Fixed: Added a workaround for a libxml2 bug that caused an undesired content-type meta tag to be added to all documents with <head> elements. Nokogiri #1008
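The Hash form of :whitespace_elements described above might be written like the sketch below. The :before and :after key names and the chosen elements are assumptions made for illustration; the release note itself only states that the text inserted before and after each removed element can be specified.

```ruby
# Hypothetical config fragment: text to insert before and after each listed
# element when that element is removed. Key names are assumed.
def whitespace_config
  {
    whitespace_elements: {
      "br"  => { before: "\n", after: "" },
      "div" => { before: "\n", after: "\n" },
      "p"   => { before: "\n", after: "\n" }
    }
  }
end

# With the sanitize gem loaded, the Hash could be passed as the config:
#   Sanitize.fragment(html, whitespace_config)
whitespace_config[:whitespace_elements].each do |name, text|
  puts "#{name}: before=#{text[:before].inspect} after=#{text[:after].inspect}"
end
```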
- Ruby
Published by rgrove over 11 years ago