A summary of data about the Ruby ecosystem.

https://github.com/sparklemotion/nokogiri

Nokogiri (鋸) makes it easy and painless to work with XML and HTML from Ruby.

Keywords

libxml2 libxslt nokogiri ruby ruby-gem sax xerces xml xslt

Keywords from Contributors

activerecord activejob mvc rubygems rack json-parser sinatra rspec multithreading rubocop

Last synced: about 13 hours ago

Repository metadata

README.md

Nokogiri

Nokogiri (鋸) makes it easy and painless to work with XML and HTML from Ruby. It provides a sensible, easy-to-understand API for reading, writing, modifying, and querying documents. It is fast and standards-compliant by relying on native parsers like libxml2, libgumbo, and xerces.

Guiding Principles

Some guiding principles Nokogiri tries to follow:

  • be secure-by-default by treating all documents as untrusted by default
  • be a thin-as-reasonable layer on top of the underlying parsers, and don't attempt to fix behavioral differences between the parsers

Features Overview

  • DOM Parser for XML, HTML4, and HTML5
  • SAX Parser for XML and HTML4
  • Push Parser for XML and HTML4
  • Document search via XPath 1.0
  • Document search via CSS3 selectors, with some jQuery-like extensions
  • XSD Schema validation
  • XSLT transformation
  • "Builder" DSL for XML and HTML documents

Status

Badges: GitHub Actions CI, Appveyor CI, Gem Version, SemVer compatibility, CII Best Practices, Tidelift dependencies

Support, Getting Help, and Reporting Issues

All official documentation is posted at https://nokogiri.org (the source for which is at https://github.com/sparklemotion/nokogiri.org/, and we welcome contributions).

Reading

Your first stops for learning more about Nokogiri should be:

Ask For Help

There are a few ways to ask exploratory questions:

Please do not mail the maintainers at their personal addresses.

Report A Bug

The Nokogiri bug tracker is at https://github.com/sparklemotion/nokogiri/issues

Please use the "Bug Report" or "Installation Difficulties" templates.

Security and Vulnerability Reporting

Please report vulnerabilities at https://hackerone.com/nokogiri

See SECURITY.md for full information and description of our security policy.

Semantic Versioning Policy

Nokogiri follows Semantic Versioning (since 2017 or so).

We bump Major.Minor.Patch versions following this guidance:

Major: (we've never done this)

  • Significant backwards-incompatible changes to the public API that would require rewriting existing application code.

Minor:

  • Features and bugfixes.
  • Updating packaged libraries for non-security-related reasons.
  • Dropping support for EOLed Ruby versions. Some folks find this objectionable, but SemVer says this is OK if the public API hasn't changed.
  • Backwards-incompatible changes to internal or private methods and constants. These are detailed in the "Changes" section of each changelog entry.
  • Removal of deprecated methods or parameters, after a generous transition period; usually when those methods or parameters are rarely-used or dangerous to the user. Essentially, removals that do not justify a major version bump.

Patch:

  • Bugfixes.
  • Security updates.
  • Updating packaged libraries for security-related reasons.
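Because minor releases may drop EOLed Ruby versions or remove deprecated methods, some applications may prefer a constraint that only admits patch releases. The sketch below uses RubyGems' own Gem::Requirement to show how the two common pessimistic constraints behave. The version numbers are hypothetical and chosen only to illustrate the operators, and allowed? is a helper invented for this example.

```ruby
require "rubygems"

# Returns true when `version` satisfies the RubyGems `constraint`.
def allowed?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

# Hypothetical version numbers, used only to illustrate the operators.
puts allowed?("~> 1.18", "1.19.0")    # true: minor releases are admitted
puts allowed?("~> 1.18", "2.0.0")     # false: a major release is excluded
puts allowed?("~> 1.18.0", "1.18.9")  # true: patch releases are admitted
puts allowed?("~> 1.18.0", "1.19.0")  # false: minor releases are excluded
```

Which constraint is appropriate is a judgment call: the looser form picks up features sooner, while the tighter form limits automatic updates to the bugfix and security changes described under "Patch".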

Sponsorship

You can help sponsor the maintainers of this software through one of these organizations:

Installation

Requirements:

  • Ruby >= 3.1
  • JRuby >= 9.4.0.0

If you are compiling the native extension against a system version of libxml2:

  • libxml2 >= 2.9.2 (recommended >= 2.12.0)

Native Gems: Faster, more reliable installation

"Native gems" contain pre-compiled libraries for a specific machine architecture. On supported platforms, this removes the need to compile the C extension and the packaged libraries, or to have system dependencies installed. The result is a much faster and more reliable installation; slow and unreliable installs are, as you probably know, the biggest headaches for Nokogiri users.

Supported Platforms

Nokogiri ships pre-compiled, "native" gems for the following platforms:

  • Linux:
    • x86_64-linux-gnu, aarch64-linux-gnu, and arm-linux-gnu (req: glibc >= 2.29)
    • x86_64-linux-musl, aarch64-linux-musl, and arm-linux-musl
  • Darwin/macOS: x86_64-darwin and arm64-darwin
  • Windows: x64-mingw-ucrt
  • Java: any platform running JRuby 9.4 or higher

To determine whether your system supports one of these gems, look at the output of bundle platform or ruby -e 'puts Gem::Platform.local.to_s'.
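The same check can be scripted. The sketch below compares the local platform string against patterns transcribed by hand from the list above. It is only a rough heuristic: likely_native_gem? and NATIVE_PLATFORM_PATTERNS are names invented for this example, the matching is not how RubyGems actually selects a gem, and the glibc >= 2.29 requirement is not checked.

```ruby
require "rubygems"

# Patterns transcribed by hand from the supported-platform list.
# Assumption: a simple prefix or substring match is good enough for a
# quick check; RubyGems itself uses more detailed matching rules.
NATIVE_PLATFORM_PATTERNS = [
  /\A(x86_64|aarch64|arm)-linux/, # gnu and musl variants
  /\A(x86_64|arm64)-darwin/,
  /\Ax64-mingw-ucrt/,
  /java/,                         # JRuby reports a java platform
].freeze

# Returns true when the platform string looks like one of the listed platforms.
def likely_native_gem?(platform_string)
  NATIVE_PLATFORM_PATTERNS.any? { |pattern| pattern.match?(platform_string) }
end

local = Gem::Platform.local.to_s
puts "#{local}: native gem likely available? #{likely_native_gem?(local)}"
```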

If you're on a supported platform, either gem install or bundle install should install a native gem without any additional action on your part. This installation should only take a few seconds, and your output should look something like:

$ gem install nokogiri
Fetching nokogiri-1.11.0-x86_64-linux.gem
Successfully installed nokogiri-1.11.0-x86_64-linux
1 gem installed

Other Installation Options

Because Nokogiri is a C extension, it requires that you have a C compiler toolchain, Ruby development header files, and some system dependencies installed.

The following may work for you if you have an appropriately-configured system:

gem install nokogiri

If you have any issues, please visit Installing Nokogiri for more complete instructions and troubleshooting.

How To Use Nokogiri

Nokogiri is a large library, and so it's challenging to briefly summarize it. We've tried to provide long, real-world examples at Tutorials.

Parsing and Querying

Here is example usage for parsing and querying a document:

#! /usr/bin/env ruby

require 'nokogiri'
require 'open-uri'

# Fetch and parse HTML document
doc = Nokogiri::HTML(URI.open('https://nokogiri.org/tutorials/installing_nokogiri.html'))

# Search for nodes by css
doc.css('nav ul.menu li a', 'article h2').each do |link|
  puts link.content
end

# Search for nodes by xpath
doc.xpath('//nav//ul//li/a', '//article//h2').each do |link|
  puts link.content
end

# Or mix and match
doc.search('nav ul.menu li a', '//article//h2').each do |link|
  puts link.content
end

Encoding

Strings are always stored as UTF-8 internally. Methods that return
text values will always return UTF-8 encoded strings. Methods that
return a string containing markup (like to_xml, to_html and
inner_html) will return a string encoded like the source document.

WARNING

Some documents declare one encoding, but actually use a different
one. In these cases, which encoding should the parser choose?

Data is just a stream of bytes. Humans add meaning to that stream. Any
particular set of bytes could be valid characters in multiple
encodings, so detecting encoding with 100% accuracy is not
possible. libxml2 does its best, but it can't be right all the time.

If you want Nokogiri to handle the document encoding properly, your
best bet is to explicitly set the encoding. Here is an example of
explicitly setting the encoding to EUC-JP on the parser:

  doc = Nokogiri.XML('<foo><bar /></foo>', nil, 'EUC-JP')

Technical Overview

Guiding Principles

As noted above, two guiding principles of the software are:

  • be secure-by-default by treating all documents as untrusted by default
  • be a thin-as-reasonable layer on top of the underlying parsers, and don't attempt to fix behavioral differences between the parsers

Notably, despite all parsers being standards-compliant, there are behavioral inconsistencies between the parsers used in the CRuby and JRuby implementations, and Nokogiri does not and should not attempt to remove these inconsistencies. Instead, we surface these differences in the test suite when they are important/semantic; or we intentionally write tests to depend only on the important/semantic bits (omitting whitespace from regex matchers on results, for example).

CRuby

The Ruby (a.k.a., CRuby, MRI, YARV) implementation is a C extension that depends on libxml2 and libxslt (which in turn depend on zlib and possibly libiconv).

These dependencies are met by default by Nokogiri's packaged versions of the libxml2 and libxslt source code, but a configuration option --use-system-libraries is provided to allow specification of alternative library locations. See Installing Nokogiri for full documentation.

We provide native gems by pre-compiling libxml2 and libxslt (and potentially zlib and libiconv) and packaging them into the gem file. In this case, no compilation is necessary at installation time, which leads to faster and more reliable installation.

See LICENSE-DEPENDENCIES.md for more information on which dependencies are provided in which native and source gems.

JRuby

The Java (a.k.a. JRuby) implementation is a Java extension that depends primarily on Xerces and NekoHTML for parsing, with additional dependencies on isorelax, nekodtd, jing, serializer, xalan-j, and xml-apis.

These dependencies are provided by pre-compiled jar files packaged in the java platform gem.

See LICENSE-DEPENDENCIES.md for more information on which dependencies are provided in which native and source gems.

Contributing

See CONTRIBUTING.md for an intro guide to developing Nokogiri.

Code of Conduct

See CODE_OF_CONDUCT.md.

License

This project is licensed under the terms of the MIT license.

See LICENSE.md.

Dependencies

Some additional libraries may be distributed with your version of Nokogiri.
See LICENSE-DEPENDENCIES.md for a discussion of the variations as well as the licenses thereof.

Authors

  • Mike Dalessio
  • Aaron Patterson
  • Yoko Harada
  • Akinori MUSHA
  • John Shahid
  • Karol Bucek
  • Sam Ruby
  • Craig Barnes
  • Stephen Checkoway
  • Lars Kanis
  • Sergio Arbeo
  • Timothy Elliott
  • Nobuyoshi Nakada

Committers metadata

Last synced: 10 days ago

Total Commits: 6,634
Total Committers: 271
Avg Commits per committer: 24.48
Development Distribution Score (DDS): 0.516

Commits in past year: 209
Committers in past year: 17
Avg Commits per committer in past year: 12.294
Development Distribution Score (DDS) in past year: 0.34

Name Email Commits
Mike Dalessio m****o@g****m 3214
Aaron Patterson a****n@g****m 1143
Yoko Harada y****t@g****m 266
Akinori MUSHA k****u@i****g 210
dependabot[bot] 4****] 180
Stephen Checkoway s@p****g 164
John Shahid j****d@g****m 158
kares s****f@k****g 151
Craig Barnes Cr@i****s 139
Sam Ruby r****s@i****t 120
Lars Kanis l****s@g****e 99
Serabe s****e@g****m 83
Sergio Arbeo s****o@Y****) 56
Timothy Elliott t****e@h****m 39
Nobuyoshi Nakada n****u@r****g 25
Patrick Mahoney p****t@p****g 24
BurdetteLamar b****r@y****m 17
Lee Jarvis l****s@g****m 13
Charles Oliver Nutter h****s@h****m 12
Michael Klein m****n@g****m 12
Charles Nutter h****s@c****l 11
Thomas Walpole t****e@g****m 10
Ben Langfeld b****n@l****e 10
fuzzy-boiii23a f****a@g****m 9
Rafael Masson r****n@g****m 9
ujihisa u****a@g****m 8
Étienne Barrié e****e@g****m 8
Jeff Hodges j****f@s****m 8
John Barnette j****e@g****m 7
Toshi MARUYAMA m****2@y****p 7
and 241 more...
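The summary figures above can be reproduced from the table. The sketch below assumes that the Development Distribution Score is defined as 1 minus the top committer's share of all commits; that definition is an assumption, though it is consistent with the all-time numbers shown here. The method names are invented for this example.

```ruby
# Figures copied from the committer statistics above.
TOTAL_COMMITS = 6634
TOTAL_COMMITTERS = 271
TOP_COMMITTER_COMMITS = 3214 # first row of the table

# Mean number of commits per committer.
def average_commits(total_commits, committers)
  total_commits.to_f / committers
end

# Assumed definition: 1 minus the top committer's share of all commits.
def development_distribution_score(total_commits, top_committer_commits)
  1.0 - top_committer_commits.to_f / total_commits
end

puts average_commits(TOTAL_COMMITS, TOTAL_COMMITTERS).round(2)                      # 24.48
puts development_distribution_score(TOTAL_COMMITS, TOP_COMMITTER_COMMITS).round(3) # 0.516
```

Under this reading, a score near 0 means one person wrote almost every commit, while a score near 1 means work is spread across many committers.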

Issue and Pull Request metadata

Last synced: 5 days ago

Total issues: 286
Total pull requests: 806
Average time to close issues: almost 2 years
Average time to close pull requests: 26 days
Total issue authors: 164
Total pull request authors: 55
Average comments per issue: 3.49
Average comments per pull request: 0.97
Merged pull requests: 605
Bot issues: 1
Bot pull requests: 223

Past year issues: 59
Past year pull requests: 256
Past year average time to close issues: 11 days
Past year average time to close pull requests: 4 days
Past year issue authors: 38
Past year pull request authors: 16
Past year average comments per issue: 1.76
Past year average comments per pull request: 0.8
Past year merged pull requests: 165
Past year bot issues: 1
Past year bot pull requests: 69

More stats: https://issues.ecosyste.ms/repositories/lookup?url=https://github.com/sparklemotion/nokogiri

Top Issue Authors

  • flavorjones (90)
  • stevecheckoway (5)
  • Ruiizgaby4 (5)
  • forthrin (5)
  • utilitpy (4)
  • johnnyshields (4)
  • BurdetteLamar (4)
  • Mange (2)
  • stanhu (2)
  • postmodern (2)
  • rcrews (2)
  • doriantaylor (2)
  • searls (2)
  • x-yuri (2)
  • JJLeo (2)

Top Pull Request Authors

  • flavorjones (422)
  • dependabot[bot] (223)
  • BurdetteLamar (37)
  • stevecheckoway (15)
  • infews (9)
  • openbl (8)
  • MattJones (6)
  • etiennebarrie (6)
  • maths22 (4)
  • yokolet (3)
  • fuzzy-boiii23a (3)
  • headius (3)
  • step-security-bot (3)
  • peterzhu2118 (3)
  • mononoken (2)

Top Issue Labels

  • state/needs-triage (44)
  • platform/jruby (32)
  • topic/installation (26)
  • help wanted (20)
  • topic/memory (19)
  • meta/user-help (19)
  • meta/spam (12)
  • topic/namespaces (11)
  • needs/research (10)
  • upstream/libxml2 (10)
  • topic/performance (9)
  • topic/error-handling (9)
  • state/will-close (8)
  • topic/gumbo (8)
  • topic/ci (7)
  • meta/feature-request (7)
  • topic/encoding (7)
  • topic/HTML5 (7)
  • needs/more-info (6)
  • topic/css (6)
  • topic/entities (5)
  • topic/xsd (5)
  • vendored/libxml2 (4)
  • topic/documentation (4)
  • meta/discussion (3)
  • vendored/nekohtml (2)
  • topic/builder (2)
  • packaging/native-gem (2)
  • topic/fragment (2)
  • topic/security (2)

Top Pull Request Labels

  • dependencies (226)
  • ruby (167)
  • event/hackday2024 (26)
  • backport (25)
  • platform/jruby (25)
  • vendored/libxml2 (24)
  • github_actions (21)
  • topic/ci (20)
  • topic/memory (12)
  • topic/performance (11)
  • topic/namespaces (9)
  • upstream/libxml2 (9)
  • topic/security (9)
  • topic/entities (6)
  • topic/HTML5 (4)
  • meta/spam (3)
  • vendored/zlib (3)
  • state/pr-under-review (3)
  • hackday (2)
  • vendored/iconv (2)
  • topic/builder (2)
  • topic/xsd (2)
  • needs/research (2)
  • meta/feature-request (2)
  • topic/sax (2)
  • topic/installation (2)
  • vendored/libxslt (2)
  • topic/css (2)
  • topic/documentation (2)
  • topic/gumbo (2)

Package metadata

gem.coop: nokogiri

Nokogiri (鋸) makes it easy and painless to work with XML and HTML from Ruby. It provides a sensible, easy-to-understand API for reading, writing, modifying, and querying documents. It is fast and standards-compliant by relying on native parsers like libxml2, libgumbo, or xerces.

rubygems.org: nokogiri

Nokogiri (鋸) makes it easy and painless to work with XML and HTML from Ruby. It provides a sensible, easy-to-understand API for reading, writing, modifying, and querying documents. It is fast and standards-compliant by relying on native parsers like libxml2, libgumbo, or xerces.

conda-forge.org: rb-nokogiri

  • Homepage: https://rubygems.org/gems/nokogiri
  • Licenses: MIT
  • Latest release: 1.10.10 (published over 4 years ago)
  • Last Synced: 2025-12-05T23:08:29.402Z (5 days ago)
  • Versions: 3
  • Dependent Packages: 3
  • Dependent Repositories: 3
  • Rankings:
    • Stargazers count: 4.432%
    • Forks count: 5.046%
    • Average: 10.747%
    • Dependent packages count: 15.618%
    • Dependent repos count: 17.894%

gem.coop: Nokogiri_precompiled_aarch64_dedshit

Nokogiri (鋸) makes it easy and painless to work with XML and HTML from Ruby. It provides a sensible, easy-to-understand API for reading, writing, modifying, and querying documents. It is fast and standards-compliant by relying on native parsers like libxml2, libgumbo, or xerces.

  • Homepage: https://nokogiri.org
  • Documentation: http://www.rubydoc.info/gems/Nokogiri_precompiled_aarch64_dedshit/
  • Licenses: MIT
  • Latest release: 1.14.5 (published about 2 years ago)
  • Last Synced: 2025-12-06T15:03:39.010Z (4 days ago)
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 961 Total
  • Rankings:
    • Dependent repos count: 0.0%
    • Dependent packages count: 0.0%
    • Average: 32.38%
    • Downloads: 97.139%
  • Maintainers (1)

gem.coop: nokogiri-backport

Nokogiri (鋸) makes it easy and painless to work with XML and HTML from Ruby. It provides a sensible, easy-to-understand API for reading, writing, modifying, and querying documents. It is fast and standards-compliant by relying on native parsers like libxml2 (C) and xerces (Java).

  • Homepage: https://nokogiri.org
  • Documentation: http://www.rubydoc.info/gems/nokogiri-backport/
  • Licenses: MIT
  • Latest release: 1.11.0 (published almost 2 years ago)
  • Last Synced: 2025-12-06T15:04:10.670Z (4 days ago)
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 773 Total
  • Rankings:
    • Dependent repos count: 0.0%
    • Dependent packages count: 0.0%
    • Average: 32.717%
    • Downloads: 98.151%
  • Maintainers (1)

rubygems.org: nokogiri-backport

Nokogiri (鋸) makes it easy and painless to work with XML and HTML from Ruby. It provides a sensible, easy-to-understand API for reading, writing, modifying, and querying documents. It is fast and standards-compliant by relying on native parsers like libxml2 (C) and xerces (Java).

  • Homepage: https://nokogiri.org
  • Documentation: http://www.rubydoc.info/gems/nokogiri-backport/
  • Licenses: MIT
  • Latest release: 1.11.0 (published almost 2 years ago)
  • Last Synced: 2025-12-06T15:04:33.471Z (4 days ago)
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 773 Total
  • Rankings:
    • Dependent packages count: 15.712%
    • Dependent repos count: 48.743%
    • Average: 54.591%
    • Downloads: 99.32%
  • Maintainers (1)

rubygems.org: Nokogiri_precompiled_aarch64_dedshit

Nokogiri (鋸) makes it easy and painless to work with XML and HTML from Ruby. It provides a sensible, easy-to-understand API for reading, writing, modifying, and querying documents. It is fast and standards-compliant by relying on native parsers like libxml2, libgumbo, or xerces.

  • Homepage: https://nokogiri.org
  • Documentation: http://www.rubydoc.info/gems/Nokogiri_precompiled_aarch64_dedshit/
  • Licenses: MIT
  • Latest release: 1.14.5 (published about 2 years ago)
  • Last Synced: 2025-12-06T15:03:38.978Z (4 days ago)
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 961 Total
  • Rankings:
    • Dependent packages count: 15.777%
    • Dependent repos count: 48.976%
    • Average: 54.71%
    • Downloads: 99.378%
  • Maintainers (1)

Dependencies

.github/workflows/ci.yml actions
  • actions/cache v4 composite
  • actions/checkout v5 composite
  • actions/download-artifact v5 composite
  • actions/upload-artifact v4 composite
  • cachix/install-nix-action v31 composite
  • ruby/setup-ruby v1 composite
  • ruby/setup-ruby-pkgs v1 composite
  • vmactions/freebsd-vm v1 composite
.github/workflows/downstream.yml actions
  • actions/cache v4 composite
  • actions/checkout v5 composite
.github/workflows/generate-ci-images.yml actions
  • actions/checkout v5 composite
  • docker/build-push-action v6 composite
  • docker/login-action v3 composite
  • docker/setup-buildx-action v3 composite
  • ruby/setup-ruby v1 composite
.github/workflows/upstream.yml actions
  • actions/cache v4 composite
  • actions/checkout v5 composite
  • actions/setup-java v5 composite
  • ruby/setup-ruby v1 composite
  • ruby/setup-ruby-pkgs v1 composite
Gemfile rubygems
  • bundler ~> 2.3 development
  • minitest = 5.25.5 development
  • minitest-parallel_fork = 2.1.0 development
  • rake = 13.3.0 development
  • rake-compiler = 1.3.0 development
  • rake-compiler-dock = 1.9.1 development
  • rexical = 1.0.8 development
  • rubocop-minitest = 0.38.2 development
  • rubocop-packaging = 0.6.0 development
  • rubocop-rake = 0.7.1 development
  • ruby_memcheck = 3.0.1 development
  • rubyzip ~> 3.1.0 development
  • simplecov = 0.22.0 development
  • standard = 1.50.0 development
nokogiri.gemspec rubygems
  • mini_portile2 ~> 2.8.2
  • racc ~> 1.4

Score: 36.66893557650037