A summary of data about the Ruby ecosystem.

https://github.com/brianmario/charlock_holmes

Character encoding detection, brought to you by ICU

Keywords from Contributors

rubygems activerecord mvc activejob rack minitest crash-reporting ruby-gem background-jobs sinatra

Last synced: about 21 hours ago

Repository metadata

README.md

CharlockHolmes

Character encoding detecting library for Ruby using ICU

Usage

First you'll need to require it

require 'charlock_holmes'

Encoding detection

contents = File.read('test.xml')
detection = CharlockHolmes::EncodingDetector.detect(contents)
# => {:encoding => 'UTF-8', :confidence => 100, :type => :text}

# optionally there will be a :language key as well, but
# that's mostly only returned for legacy encodings like ISO-8859-1

NOTE: CharlockHolmes::EncodingDetector.detect will return nil if it was unable to find an encoding.

For binary content, :type will be set to :binary
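The three outcomes described above (no result, binary content, text with a confidence score) can be handled in one place. This is a minimal sketch that assumes only the hash shape shown above; `describe_detection` is a hypothetical helper and not part of the gem.

```ruby
# Hypothetical helper (not part of charlock_holmes): turns a detection
# result into a short human-readable summary.
def describe_detection(detection)
  # detect returns nil when no encoding could be found
  return 'unknown encoding' if detection.nil?

  # binary content is flagged via :type
  return 'binary content' if detection[:type] == :binary

  summary = "#{detection[:encoding]} (confidence #{detection[:confidence]})"
  # :language is optional, so only mention it when present
  summary += ", language #{detection[:language]}" if detection[:language]
  summary
end
```

With the gem loaded, the result of CharlockHolmes::EncodingDetector.detect can be passed straight to this helper.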

It's more efficient, though, to reuse one detector instance:

detector = CharlockHolmes::EncodingDetector.new

detection1 = detector.detect(File.read('test.xml'))
detection2 = detector.detect(File.read('test2.json'))

# and so on...
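When many files need checking, the reuse pattern above can be wrapped in a small loop. A sketch, assuming only that the detector responds to `detect` as shown; `detect_all` is a hypothetical helper name.

```ruby
# Hypothetical helper: runs one detector over many files and returns a
# hash of path => detection result. `detector` is anything that responds
# to #detect, e.g. CharlockHolmes::EncodingDetector.new.
def detect_all(paths, detector)
  paths.each_with_object({}) do |path, results|
    # binread returns the raw bytes, tagged as ASCII-8BIT
    results[path] = detector.detect(File.binread(path))
  end
end
```

Passing the detector in as an argument keeps a single instance alive for the whole batch, which is the point of the reuse advice.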

String monkey patch

Alternatively, you can just use the detect_encoding method on the String class

require 'charlock_holmes/string'

contents = File.read('test.xml')

detection = contents.detect_encoding

Ruby 1.9 specific

NOTE: This method only exists on Ruby 1.9+

If you want to use this library to detect and set the encoding flag on strings, you can use the detect_encoding! method on the String class

require 'charlock_holmes/string'

contents = File.read('test.xml')

# this will detect and set the encoding of `contents`, then return self
contents.detect_encoding!
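Setting the encoding flag relabels the string without changing any of its bytes. The sketch below illustrates that idea using only the `:encoding` key from the detection hash shown earlier. `apply_detected_encoding!` is a hypothetical name, and this is not the gem's actual implementation.

```ruby
# Hypothetical helper: tags `string` in place with the detected encoding
# and returns the same string object either way.
def apply_detected_encoding!(string, detection)
  name = detection && detection[:encoding]
  return string unless name

  begin
    # force_encoding changes the label only; the bytes stay untouched
    string.force_encoding(Encoding.find(name))
  rescue ArgumentError
    # the detector may report a name Ruby has no Encoding for;
    # in that case leave the existing flag alone
    string
  end
end
```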

Transcoding

Being able to detect the encoding of some arbitrary content is nice, but what you probably want is to be able to transcode that content into an encoding your application is using.

content = File.read('test2.txt')
detection = CharlockHolmes::EncodingDetector.detect(content)
utf8_encoded_content = CharlockHolmes::Converter.convert content, detection[:encoding], 'UTF-8'

The first parameter is the content to transcode, the second is the source encoding (the encoding the content is assumed to be in), and the third parameter is the destination encoding.
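Because detection can return nil or flag the content as binary, it may help to guard the conversion. A minimal sketch under those assumptions; `to_utf8` is a hypothetical helper, and the `detect` and `convert` keyword arguments are illustrative additions that default to the two calls shown above.

```ruby
# Hypothetical helper (not part of the gem). The detect/convert callables
# default to the CharlockHolmes calls shown above; the constants are only
# resolved when the lambdas run, so both can be swapped out in tests.
def to_utf8(content,
            detect: ->(s) { CharlockHolmes::EncodingDetector.detect(s) },
            convert: ->(s, from, to) { CharlockHolmes::Converter.convert(s, from, to) })
  detection = detect.call(content)

  # nil means no encoding was found; binary content is not transcoded
  return nil if detection.nil? || detection[:type] == :binary

  convert.call(content, detection[:encoding], 'UTF-8')
end
```

Returning nil for undetectable or binary input is a design choice made for this sketch; raising an error would be an equally reasonable alternative.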

Installing

If the traditional gem install charlock_holmes doesn't work, you may need to specify the path to
your installation of ICU using the --with-icu-dir option during the gem install or by configuring Bundler to
pass those arguments to Gem:

Configure Bundler to always use the correct arguments when installing:

bundle config build.charlock_holmes --with-icu-dir=/path/to/installed/icu4c

Using Gem to install directly without Bundler:

gem install charlock_holmes -- --with-icu-dir=/path/to/installed/icu4c

If you get a compile-time error that looks like error: delegating constructors are permitted only in C++11, or something else related to C++11, you need to set the --with-cxxflags=-std=c++11 option:

Bundler:

bundle config build.charlock_holmes --with-icu-dir=/path/to/installed/icu4c --with-cxxflags=-std=c++11

Installing directly:

gem install charlock_holmes -- --with-icu-dir=/path/to/installed/icu4c --with-cxxflags=-std=c++11

Homebrew

If you're installing on Mac OS X, Homebrew is the easiest way to install ICU.

However, be warned: it is a Keg-Only install (see homedir issue #167
for more info), meaning RubyGems won't find it unless you specify --with-icu-dir.

To install ICU with Homebrew:

brew install icu4c

Configure Bundler to always use the correct arguments when installing:

bundle config build.charlock_holmes --with-icu-dir=/usr/local/opt/icu4c

Using Gem to install directly without Bundler:

gem install charlock_holmes -- --with-icu-dir=/usr/local/opt/icu4c

Committers metadata

Last synced: 7 days ago

Total Commits: 212
Total Committers: 25
Avg Commits per committer: 8.48
Development Distribution Score (DDS): 0.259

Commits in past year: 0
Committers in past year: 0
Avg Commits per committer in past year: 0.0
Development Distribution Score (DDS) in past year: 0.0

Name Email Commits
Brian Lopez s****z@g****m 157
Aaron Patterson t****e@r****g 9
Aman Gupta a****n@t****t 6
Stan Hu s****u@g****m 6
David Graham d****m@g****m 4
Misty De Meo m****o@g****m 4
Max Veytsman m****n@g****m 3
Ken Dreyer k****r@k****m 3
Joshua Peek j****h@j****m 2
Stephan van Eijkelenburg s****e@g****m 2
grosser m****l@g****t 2
Alexey Lapitsky a****y@s****m 1
Andrew Daugherity a****y@g****m 1
Benoit Bénézech b****h@g****m 1
Christian Höltje d****t@g****g 1
Grey Baker g****l@g****m 1
Igor Victor g****a@y****u 1
Mike Połtyn m****e@p****m 1
Nicolas Leger n****r 1
Olle Jonsson o****n@g****m 1
Raphael Nestler r****r@r****h 1
Vicent Marti t****u@g****m 1
Scott J. Goldman s****g@g****m 1
Nicolas Leger n****r@n****m 1
mhasbini m****i@g****m 1
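The summary figures can be reproduced from the commit counts in the table above. This sketch assumes the Development Distribution Score is defined as 1 minus the top committer's share of all commits. That definition is not stated on this page, but it matches the reported value.

```ruby
# Commit counts per committer, copied from the table above.
commits = [157, 9, 6, 6, 4, 4, 3, 3, 2, 2, 2] + [1] * 14

total      = commits.sum   # total commits
committers = commits.size  # total committers
average    = (total.to_f / committers).round(2)

# Assumed DDS definition: 1 - (top committer's commits / total commits)
dds = (1 - commits.max.to_f / total).round(3)

puts "total=#{total} committers=#{committers} average=#{average} dds=#{dds}"
# prints "total=212 committers=25 average=8.48 dds=0.259"
```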

Issue and Pull Request metadata

Last synced: 7 days ago

Total issues: 74
Total pull requests: 67
Average time to close issues: 10 months
Average time to close pull requests: about 1 year
Total issue authors: 67
Total pull request authors: 32
Average comments per issue: 3.85
Average comments per pull request: 1.54
Merged pull requests: 29
Bot issues: 0
Bot pull requests: 0

Past year issues: 4
Past year pull requests: 6
Past year average time to close issues: N/A
Past year average time to close pull requests: N/A
Past year issue authors: 4
Past year pull request authors: 4
Past year average comments per issue: 0.0
Past year average comments per pull request: 1.17
Past year merged pull requests: 0
Past year bot issues: 0
Past year bot pull requests: 0

More stats: https://issues.ecosyste.ms/repositories/lookup?url=https://github.com/brianmario/charlock_holmes

Top Issue Authors

  • IsraelBuitronD (2)
  • brauliobo (2)
  • auguszou (2)
  • jeremiahsherrill (2)
  • edinoteK (2)
  • OneDivZero (2)
  • mcandre (2)
  • ikappas (1)
  • art-solopov (1)
  • dustinsgoodman (1)
  • LukeShu (1)
  • nathany (1)
  • Startouf (1)
  • badbye (1)
  • wpostma (1)

Top Pull Request Authors

  • stanhu (9)
  • tenderlove (7)
  • mistydemeo (4)
  • waghanza (4)
  • brianmario (3)
  • josh (3)
  • jhawthorn (3)
  • stephenbinns (2)
  • T19sk (2)
  • rnestler (2)
  • gogainda (2)
  • mhasbini (2)
  • dgraham (2)
  • nijikon (2)
  • nicolasleger (2)

Package metadata

gem.coop: charlock_holmes

charlock_holmes provides binary and text detection as well as text transcoding using libicu

  • Homepage: https://github.com/brianmario/charlock_holmes
  • Documentation: http://www.rubydoc.info/gems/charlock_holmes/
  • Licenses: MIT
  • Latest release: 0.7.9 (published over 1 year ago)
  • Last Synced: 2025-12-10T05:31:33.602Z (3 days ago)
  • Versions: 28
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 62,623,530 Total
  • Docker Downloads: 1,431,014,912
  • Rankings:
    • Dependent repos count: 0.0%
    • Dependent packages count: 0.0%
    • Average: 0.146%
    • Downloads: 0.437%
  • Maintainers (2)
rubygems.org: charlock_holmes

charlock_holmes provides binary and text detection as well as text transcoding using libicu

  • Homepage: https://github.com/brianmario/charlock_holmes
  • Documentation: http://www.rubydoc.info/gems/charlock_holmes/
  • Licenses: MIT
  • Latest release: 0.7.9 (published over 1 year ago)
  • Last Synced: 2025-12-10T20:00:33.092Z (2 days ago)
  • Versions: 28
  • Dependent Packages: 54
  • Dependent Repositories: 3,708
  • Downloads: 62,637,255 Total
  • Docker Downloads: 1,431,014,912
  • Rankings:
    • Docker downloads count: 0.241%
    • Downloads: 0.43%
    • Dependent packages count: 0.495%
    • Dependent repos count: 0.516%
    • Average: 0.959%
    • Stargazers count: 1.893%
    • Forks count: 2.179%
  • Maintainers (2)
alpine-v3.18: ruby-charlock_holmes

Character encoding detection, brought to you by ICU

  • Homepage: https://github.com/brianmario/charlock_holmes
  • Licenses: MIT
  • Latest release: 0.7.7-r13 (published over 2 years ago)
  • Last Synced: 2025-12-10T20:00:58.093Z (2 days ago)
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Rankings:
    • Dependent repos count: 0.0%
    • Dependent packages count: 0.0%
    • Average: 4.374%
    • Stargazers count: 7.98%
    • Forks count: 9.516%
  • Maintainers (1)
gem.coop: charlock_holmes_bundle_icu

Character encoding detection, brought to you by ICU

  • Homepage: http://github.com/brianmario/charlock_holmes
  • Documentation: http://www.rubydoc.info/gems/charlock_holmes_bundle_icu/
  • Licenses: mit
  • Latest release: 0.6.9.2 (published almost 13 years ago)
  • Last Synced: 2025-12-10T20:00:36.415Z (2 days ago)
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 39,365 Total
  • Rankings:
    • Dependent repos count: 0.0%
    • Dependent packages count: 0.0%
    • Average: 5.801%
    • Downloads: 17.404%
  • Maintainers (1)
rubygems.org: charlock_holmes_bundle_icu

Character encoding detection, brought to you by ICU

  • Homepage: http://github.com/brianmario/charlock_holmes
  • Documentation: http://www.rubydoc.info/gems/charlock_holmes_bundle_icu/
  • Licenses: mit
  • Latest release: 0.6.9.2 (published almost 13 years ago)
  • Last Synced: 2025-12-10T20:00:34.380Z (2 days ago)
  • Versions: 1
  • Dependent Packages: 6
  • Dependent Repositories: 11
  • Downloads: 39,365 Total
  • Rankings:
    • Stargazers count: 1.874%
    • Forks count: 2.219%
    • Dependent packages count: 2.499%
    • Average: 5.867%
    • Dependent repos count: 6.908%
    • Downloads: 15.838%
  • Maintainers (1)
alpine-v3.13: ruby-charlock_holmes

Character encoding detection, brought to you by ICU

  • Homepage: https://github.com/brianmario/charlock_holmes
  • Licenses: MIT
  • Latest release: 0.7.7-r5 (published over 5 years ago)
  • Last Synced: 2025-12-10T20:00:54.174Z (2 days ago)
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Rankings:
    • Dependent repos count: 0.0%
    • Stargazers count: 5.102%
    • Forks count: 6.613%
    • Average: 7.809%
    • Dependent packages count: 19.522%
  • Maintainers (1)
alpine-v3.12: ruby-charlock_holmes

Character encoding detection, brought to you by ICU

  • Homepage: https://github.com/brianmario/charlock_holmes
  • Licenses: MIT
  • Latest release: 0.7.7-r5 (published over 5 years ago)
  • Last Synced: 2025-12-10T20:00:51.869Z (2 days ago)
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Rankings:
    • Dependent repos count: 0.0%
    • Stargazers count: 4.49%
    • Forks count: 5.641%
    • Average: 7.9%
    • Dependent packages count: 21.468%
  • Maintainers (1)
alpine-v3.9: ruby-charlock_holmes

Character encoding detection, brought to you by ICU

  • Homepage: https://github.com/brianmario/charlock_holmes
  • Licenses: MIT
  • Latest release: 0.7.6-r1 (published almost 7 years ago)
  • Last Synced: 2025-12-11T07:14:20.878Z (1 day ago)
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Rankings:
    • Dependent repos count: 0.0%
    • Stargazers count: 3.816%
    • Forks count: 4.934%
    • Average: 7.975%
    • Dependent packages count: 23.151%
  • Maintainers (1)
alpine-v3.11: ruby-charlock_holmes

Character encoding detection, brought to you by ICU

  • Homepage: https://github.com/brianmario/charlock_holmes
  • Licenses: MIT
  • Latest release: 0.7.7-r0 (published about 6 years ago)
  • Last Synced: 2025-12-10T20:00:48.013Z (2 days ago)
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Rankings:
    • Dependent repos count: 0.0%
    • Stargazers count: 4.506%
    • Forks count: 5.632%
    • Average: 8.185%
    • Dependent packages count: 22.601%
  • Maintainers (1)
alpine-v3.8: ruby-charlock_holmes

Character encoding detection, brought to you by ICU

  • Homepage: https://github.com/brianmario/charlock_holmes
  • Licenses: MIT
  • Latest release: 0.7.6-r0 (published over 7 years ago)
  • Last Synced: 2025-12-10T20:00:31.590Z (2 days ago)
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Rankings:
    • Dependent repos count: 0.0%
    • Stargazers count: 3.544%
    • Forks count: 4.655%
    • Average: 8.356%
    • Dependent packages count: 25.225%
  • Maintainers (1)
alpine-v3.14: ruby-charlock_holmes

Character encoding detection, brought to you by ICU

  • Homepage: https://github.com/brianmario/charlock_holmes
  • Licenses: MIT
  • Latest release: 0.7.7-r5 (published over 5 years ago)
  • Last Synced: 2025-12-10T20:00:55.835Z (2 days ago)
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Rankings:
    • Dependent repos count: 0.0%
    • Stargazers count: 5.201%
    • Forks count: 6.659%
    • Average: 8.385%
    • Dependent packages count: 21.681%
  • Maintainers (1)
alpine-edge: ruby-charlock_holmes

Character encoding detection, brought to you by ICU

  • Homepage: https://github.com/brianmario/charlock_holmes
  • Licenses: MIT
  • Latest release: 0.7.9-r2 (published 8 months ago)
  • Last Synced: 2025-12-10T20:01:15.715Z (2 days ago)
  • Versions: 7
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Rankings:
    • Dependent repos count: 0.0%
    • Average: 8.554%
    • Stargazers count: 9.15%
    • Forks count: 10.426%
    • Dependent packages count: 14.641%
  • Maintainers (1)
alpine-v3.15: ruby-charlock_holmes

Character encoding detection, brought to you by ICU

  • Homepage: https://github.com/brianmario/charlock_holmes
  • Licenses: MIT
  • Latest release: 0.7.7-r7 (published about 4 years ago)
  • Last Synced: 2025-11-12T09:13:36.041Z (about 1 month ago)
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Rankings:
    • Dependent repos count: 0.0%
    • Stargazers count: 5.556%
    • Forks count: 6.962%
    • Average: 9.526%
    • Dependent packages count: 25.585%
  • Maintainers (1)
alpine-v3.10: ruby-charlock_holmes

Character encoding detection, brought to you by ICU

  • Homepage: https://github.com/brianmario/charlock_holmes
  • Licenses: MIT
  • Latest release: 0.7.6-r3 (published over 6 years ago)
  • Last Synced: 2025-12-10T20:00:48.876Z (2 days ago)
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Rankings:
    • Dependent repos count: 0.0%
    • Stargazers count: 4.004%
    • Forks count: 5.222%
    • Average: 9.695%
    • Dependent packages count: 29.555%
  • Maintainers (1)
alpine-v3.16: ruby-charlock_holmes

Character encoding detection, brought to you by ICU

  • Homepage: https://github.com/brianmario/charlock_holmes
  • Licenses: MIT
  • Latest release: 0.7.7-r10 (published over 3 years ago)
  • Last Synced: 2025-12-10T20:00:52.318Z (2 days ago)
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Rankings:
    • Dependent repos count: 0.0%
    • Stargazers count: 5.993%
    • Forks count: 7.267%
    • Average: 10.143%
    • Dependent packages count: 27.311%
  • Maintainers (1)
alpine-v3.17: ruby-charlock_holmes

Character encoding detection, brought to you by ICU

  • Homepage: https://github.com/brianmario/charlock_holmes
  • Licenses: MIT
  • Latest release: 0.7.7-r11 (published about 3 years ago)
  • Last Synced: 2025-12-10T20:00:52.658Z (2 days ago)
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Rankings:
    • Dependent repos count: 0.0%
    • Stargazers count: 7.533%
    • Forks count: 8.97%
    • Average: 10.939%
    • Dependent packages count: 27.254%
  • Maintainers (1)
spack.io: ruby-charlock-holmes

Character encoding detection, brought to you by ICU.

  • Homepage: https://github.com/brianmario/charlock_holmes
  • Licenses: []
  • Latest release: 0.7.9 (published about 2 months ago)
  • Last Synced: 2025-12-10T20:01:10.160Z (2 days ago)
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Rankings:
    • Dependent repos count: 0.0%
    • Stargazers count: 7.951%
    • Forks count: 8.99%
    • Average: 17.588%
    • Dependent packages count: 53.411%
  • Maintainers (1)
gem.coop: charlock_holmes_heroku

Character encoding detection, brought to you by ICU

  • Homepage: http://github.com/brianmario/charlock_holmes
  • Documentation: http://www.rubydoc.info/gems/charlock_holmes_heroku/
  • Licenses: mit
  • Latest release: 0.6.13 (published about 12 years ago)
  • Last Synced: 2025-12-10T20:00:36.748Z (2 days ago)
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 4,567 Total
  • Rankings:
    • Dependent repos count: 0.0%
    • Dependent packages count: 0.0%
    • Average: 23.99%
    • Downloads: 71.97%
  • Maintainers (1)
rubygems.org: charlock_holmes_heroku

Character encoding detection, brought to you by ICU

  • Homepage: http://github.com/brianmario/charlock_holmes
  • Documentation: http://www.rubydoc.info/gems/charlock_holmes_heroku/
  • Licenses: mit
  • Latest release: 0.6.13 (published about 12 years ago)
  • Last Synced: 2025-12-10T20:00:37.405Z (2 days ago)
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 4,567 Total
  • Rankings:
    • Stargazers count: 1.729%
    • Forks count: 2.028%
    • Dependent packages count: 15.706%
    • Average: 27.691%
    • Dependent repos count: 46.782%
    • Downloads: 72.211%
  • Maintainers (1)
alpine-v3.21: ruby-charlock_holmes

Character encoding detection, brought to you by ICU

  • Homepage: https://github.com/brianmario/charlock_holmes
  • Licenses: MIT
  • Latest release: 0.7.9-r0 (published about 1 year ago)
  • Last Synced: 2025-12-10T20:00:57.821Z (2 days ago)
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Rankings:
    • Dependent repos count: 0.0%
    • Dependent packages count: 0.0%
    • Average: 100%
  • Maintainers (1)
alpine-v3.22: ruby-charlock_holmes

Character encoding detection, brought to you by ICU

  • Homepage: https://github.com/brianmario/charlock_holmes
  • Licenses: MIT
  • Latest release: 0.7.9-r2 (published 8 months ago)
  • Last Synced: 2025-12-10T20:01:05.778Z (2 days ago)
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Rankings:
    • Dependent repos count: 0.0%
    • Dependent packages count: 0.0%
    • Average: 100%
  • Maintainers (1)
alpine-v3.19: ruby-charlock_holmes

Character encoding detection, brought to you by ICU

  • Homepage: https://github.com/brianmario/charlock_holmes
  • Licenses: MIT
  • Latest release: 0.7.7-r14 (published about 2 years ago)
  • Last Synced: 2025-11-16T12:21:34.924Z (26 days ago)
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Rankings:
    • Dependent repos count: 0.0%
    • Dependent packages count: 0.0%
    • Average: 100%
  • Maintainers (1)
alpine-v3.20: ruby-charlock_holmes

Character encoding detection, brought to you by ICU

  • Homepage: https://github.com/brianmario/charlock_holmes
  • Licenses: MIT
  • Latest release: 0.7.7-r15 (published almost 2 years ago)
  • Last Synced: 2025-11-15T11:52:20.152Z (27 days ago)
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Rankings:
    • Dependent repos count: 0.0%
    • Dependent packages count: 0.0%
    • Average: 100%
  • Maintainers (1)

Dependencies

charlock_holmes.gemspec rubygems
  • chardet ~> 0.9 development
  • minitest ~> 5.11 development
  • rake-compiler ~> 1.0 development
Gemfile rubygems
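The `~>` constraints listed for the gemspec allow updates up to, but not including, the next major version (for example, `~> 5.11` means at least 5.11 and below 6.0). A small sketch using RubyGems' own requirement classes to check candidate versions; the `constraints` hash and `allowed` lambda are illustrative names, and the version numbers tested are arbitrary examples.

```ruby
require 'rubygems'

# Development dependency constraints as listed above.
constraints = {
  'chardet'       => '~> 0.9',
  'minitest'      => '~> 5.11',
  'rake-compiler' => '~> 1.0'
}

# Hypothetical helper: does `version` satisfy the constraint for `name`?
allowed = lambda do |name, version|
  requirement = Gem::Requirement.new(constraints.fetch(name))
  requirement.satisfied_by?(Gem::Version.new(version))
end

puts allowed.call('minitest', '5.25.0')  # within 5.x and at least 5.11
puts allowed.call('minitest', '6.0.0')   # next major version
```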

Score: 32.06916364401828