https://github.com/brianmario/charlock_holmes
Character encoding detection, brought to you by ICU
Keywords from Contributors
rubygems activerecord mvc activejob rack minitest crash-reporting ruby-gem background-jobs sinatra
Last synced: about 21 hours ago
Repository metadata
- Host: GitHub
- URL: https://github.com/brianmario/charlock_holmes
- Owner: brianmario
- License: mit
- Created: 2011-08-23T22:53:03.000Z (over 14 years ago)
- Default Branch: master
- Last Pushed: 2024-07-11T04:26:42.000Z (over 1 year ago)
- Last Synced: 2025-12-11T16:21:59.623Z (1 day ago)
- Language: Ruby
- Homepage:
- Size: 81.3 MB
- Stars: 1,065
- Watchers: 21
- Forks: 150
- Open Issues: 68
- Releases: 1
- Metadata Files:
- Readme: README.md
- License: LICENSE
README.md
CharlockHolmes
Character encoding detecting library for Ruby using ICU
Usage
First you'll need to require it
require 'charlock_holmes'
Encoding detection
contents = File.read('test.xml')
detection = CharlockHolmes::EncodingDetector.detect(contents)
# => {:encoding => 'UTF-8', :confidence => 100, :type => :text}
# optionally there will be a :language key as well, but
# that's mostly only returned for legacy encodings like ISO-8859-1
NOTE: CharlockHolmes::EncodingDetector.detect will return nil if it was unable to find an encoding.
For binary content, :type will be set to :binary
It's more efficient, though, to reuse one detector instance:
detector = CharlockHolmes::EncodingDetector.new
detection1 = detector.detect(File.read('test.xml'))
detection2 = detector.detect(File.read('test2.json'))
# and so on...
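The nil and :binary cases noted above are easy to overlook when scanning several files with one detector. The following is a minimal sketch, not part of the library: `summarize_detections` is a hypothetical helper that accepts any object responding to `detect` (for example a `CharlockHolmes::EncodingDetector` instance) and reuses it for every input. It assumes the result is either nil or a hash with the keys shown in the example output above.

```ruby
# Hypothetical helper (not part of charlock_holmes): run ONE detector over
# several named inputs and turn each result into a short summary string.
# `detector` is anything that responds to #detect and returns nil or a hash
# shaped like {:encoding => 'UTF-8', :confidence => 100, :type => :text}.
def summarize_detections(detector, inputs)
  inputs.each_with_object({}) do |(name, contents), summary|
    detection = detector.detect(contents)

    summary[name] =
      if detection.nil?
        'unknown'   # detect returned nil: no encoding was found
      elsif detection[:type] == :binary
        'binary'    # binary content has no text encoding to report
      else
        "#{detection[:encoding]} (#{detection[:confidence]}%)"
      end
  end
end

# With the gem installed, usage might look like:
#   require 'charlock_holmes'
#   detector = CharlockHolmes::EncodingDetector.new
#   summarize_detections(detector, 'test.xml' => File.read('test.xml'))
```

Passing the detector in as an argument keeps the helper independent of how the instance is created, so the same instance can be shared across calls.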
String monkey patch
Alternatively, you can just use the detect_encoding method on the String class
require 'charlock_holmes/string'
contents = File.read('test.xml')
detection = contents.detect_encoding
Ruby 1.9 specific
NOTE: This method only exists on Ruby 1.9+
If you want to use this library to detect and set the encoding flag on strings, you can use the detect_encoding! method on the String class
require 'charlock_holmes/string'
contents = File.read('test.xml')
# this will detect and set the encoding of `contents`, then return self
contents.detect_encoding!
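What "detect and set the encoding flag" amounts to can be sketched without the gem. This is a minimal illustration, not the library's implementation: `apply_detected_encoding!` is a hypothetical helper, and it assumes the name under `:encoding` is one that Ruby's `Encoding.find` recognizes. Names Ruby does not know are skipped.

```ruby
# Hypothetical helper: apply a detection result to a string in place and
# return the same string. `detection` is nil or a hash with an :encoding
# name, as in the examples above.
def apply_detected_encoding!(string, detection)
  name = detection && detection[:encoding]
  return string unless name

  begin
    # force_encoding only relabels the bytes; it does not transcode them.
    string.force_encoding(Encoding.find(name))
  rescue ArgumentError
    # Ruby has no Encoding under this name; leave the string untouched.
  end
  string
end
```

Because only the flag changes, the bytes of the string stay exactly as they were read from disk.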
Transcoding
Being able to detect the encoding of some arbitrary content is nice, but what you probably want is to be able to transcode that content into an encoding your application is using.
content = File.read('test2.txt')
detection = CharlockHolmes::EncodingDetector.detect(content)
utf8_encoded_content = CharlockHolmes::Converter.convert content, detection[:encoding], 'UTF-8'
The first parameter is the content to transcode, the second is the source encoding (the encoding the content is assumed to be in), and the third parameter is the destination encoding.
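Since detection can return nil or report binary content, a transcoding step usually needs a guard in front of it. The sketch below is an assumption-laden illustration rather than library code: `to_utf8` and `stdlib_convert` are hypothetical names, and the default converter is a stand-in built on Ruby's own `String#encode` so that the example runs without ICU. With the gem installed, `CharlockHolmes::Converter.method(:convert)` could be passed instead, since it takes the same three arguments described above.

```ruby
# Stand-in converter so the sketch runs without the gem. It uses Ruby's
# built-in String#encode, NOT ICU, and works on a copy of the input.
def stdlib_convert(content, source, target)
  content.dup.force_encoding(source).encode(target)
end

# Hypothetical helper: transcode `content` to UTF-8 based on a detection
# hash. Returns nil when there is nothing sensible to convert.
# `convert` is any callable taking (content, source_encoding, target_encoding).
def to_utf8(content, detection, convert = method(:stdlib_convert))
  return nil if detection.nil?                # no encoding was found
  return nil if detection[:type] == :binary   # binary data is not text
  return nil unless detection[:encoding]      # no source encoding to use

  convert.call(content, detection[:encoding], 'UTF-8')
end
```

Returning nil for the skipped cases leaves the decision about binary or undetectable input to the caller.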
Installing
If the traditional gem install charlock_holmes doesn't work, you may need to specify the path to
your installation of ICU using the --with-icu-dir option during the gem install or by configuring Bundler to
pass those arguments to Gem:
Configure Bundler to always use the correct arguments when installing:
bundle config build.charlock_holmes --with-icu-dir=/path/to/installed/icu4c
Using Gem to install directly without Bundler:
gem install charlock_holmes -- --with-icu-dir=/path/to/installed/icu4c
If you get a compile-time error that looks like error: delegating constructors are permitted only in C++11, or something else related to C++11, you need to set the --with-cxxflags=-std=c++11 option.
Bundler:
bundle config build.charlock_holmes --with-icu-dir=/path/to/installed/icu4c --with-cxxflags=-std=c++11
Installing directly:
gem install charlock_holmes -- --with-icu-dir=/path/to/installed/icu4c --with-cxxflags=-std=c++11
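For scripted setups, the direct install command can be assembled from the ICU path. This is only a sketch: `charlock_install_command` is a hypothetical helper, and it uses nothing beyond the two flags named above. The ICU directory is whatever path your installation lives under.

```ruby
# Hypothetical helper: build the `gem install` command as an argument list.
# icu_dir is the path to the installed ICU; cxx11 adds the C++11 flag.
def charlock_install_command(icu_dir, cxx11: false)
  flags = ["--with-icu-dir=#{icu_dir}"]
  flags << '--with-cxxflags=-std=c++11' if cxx11
  ['gem', 'install', 'charlock_holmes', '--', *flags]
end

# Possible usage (this would actually run the install):
#   system(*charlock_install_command('/path/to/installed/icu4c', cxx11: true))
```

Returning an array rather than a single string lets `system` pass each argument through without shell quoting concerns.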
Homebrew
If you're installing on Mac OS X, Homebrew is
the easiest way to install ICU.
However, be warned: it is a keg-only install (see homedir issue #167
for more info), meaning RubyGems won't find it unless you specify --with-icu-dir.
To install ICU with Homebrew:
brew install icu4c
Configure Bundler to always use the correct arguments when installing:
bundle config build.charlock_holmes --with-icu-dir=/usr/local/opt/icu4c
Using Gem to install directly without Bundler:
gem install charlock_holmes -- --with-icu-dir=/usr/local/opt/icu4c
The /usr/local/opt/icu4c path matches Homebrew's default prefix on Intel Macs. On Apple Silicon the default prefix is /opt/homebrew; running brew --prefix icu4c prints the correct path in either case.
Owner metadata
- Name: Brian Lopez
- Login: brianmario
- Email:
- Kind: user
- Description:
- Website: https://twitter.com/brianmario
- Location: Paso Robles, CA
- Twitter: brianmario
- Company: Lopai Cellars
- Icon url: https://avatars.githubusercontent.com/u/11571?u=a1ff7b554784caa8e18118531eabd8650ddf050a&v=4
- Repositories: 38
- Last synced at: 2025-10-11T20:47:34.713Z
- Profile URL: https://github.com/brianmario
GitHub Events
Total
- Issues event: 3
- Watch event: 28
- Issue comment event: 28
- Pull request review comment event: 2
- Pull request review event: 4
- Pull request event: 4
- Fork event: 11
Last Year
- Issues event: 2
- Watch event: 22
- Issue comment event: 19
- Pull request review comment event: 2
- Pull request review event: 4
- Pull request event: 3
- Fork event: 9
Committers metadata
Last synced: 7 days ago
Total Commits: 212
Total Committers: 25
Avg Commits per committer: 8.48
Development Distribution Score (DDS): 0.259
Commits in past year: 0
Committers in past year: 0
Avg Commits per committer in past year: 0.0
Development Distribution Score (DDS) in past year: 0.0
| Name | Email | Commits |
|---|---|---|
| Brian Lopez | s****z@g****m | 157 |
| Aaron Patterson | t****e@r****g | 9 |
| Aman Gupta | a****n@t****t | 6 |
| Stan Hu | s****u@g****m | 6 |
| David Graham | d****m@g****m | 4 |
| Misty De Meo | m****o@g****m | 4 |
| Max Veytsman | m****n@g****m | 3 |
| Ken Dreyer | k****r@k****m | 3 |
| Joshua Peek | j****h@j****m | 2 |
| Stephan van Eijkelenburg | s****e@g****m | 2 |
| grosser | m****l@g****t | 2 |
| Alexey Lapitsky | a****y@s****m | 1 |
| Andrew Daugherity | a****y@g****m | 1 |
| Benoit Bénézech | b****h@g****m | 1 |
| Christian Höltje | d****t@g****g | 1 |
| Grey Baker | g****l@g****m | 1 |
| Igor Victor | g****a@y****u | 1 |
| Mike Połtyn | m****e@p****m | 1 |
| Nicolas Leger | n****r | 1 |
| Olle Jonsson | o****n@g****m | 1 |
| Raphael Nestler | r****r@r****h | 1 |
| Vicent Marti | t****u@g****m | 1 |
| Scott J. Goldman | s****g@g****m | 1 |
| Nicolas Leger | n****r@n****m | 1 |
| mhasbini | m****i@g****m | 1 |
Committer domains:
- github.com: 3
- nleger.com: 1
- renuo.ch: 1
- poltyn.com: 1
- yandex.ru: 1
- gerf.org: 1
- spotify.com: 1
- grosser.it: 1
- joshpeek.com: 1
- ktdreyer.com: 1
- tmm1.net: 1
- ruby-lang.org: 1
Issue and Pull Request metadata
Last synced: 7 days ago
Total issues: 74
Total pull requests: 67
Average time to close issues: 10 months
Average time to close pull requests: about 1 year
Total issue authors: 67
Total pull request authors: 32
Average comments per issue: 3.85
Average comments per pull request: 1.54
Merged pull request: 29
Bot issues: 0
Bot pull requests: 0
Past year issues: 4
Past year pull requests: 6
Past year average time to close issues: N/A
Past year average time to close pull requests: N/A
Past year issue authors: 4
Past year pull request authors: 4
Past year average comments per issue: 0.0
Past year average comments per pull request: 1.17
Past year merged pull request: 0
Past year bot issues: 0
Past year bot pull requests: 0
Top Issue Authors
- IsraelBuitronD (2)
- brauliobo (2)
- auguszou (2)
- jeremiahsherrill (2)
- edinoteK (2)
- OneDivZero (2)
- mcandre (2)
- ikappas (1)
- art-solopov (1)
- dustinsgoodman (1)
- LukeShu (1)
- nathany (1)
- Startouf (1)
- badbye (1)
- wpostma (1)
Top Pull Request Authors
- stanhu (9)
- tenderlove (7)
- mistydemeo (4)
- waghanza (4)
- brianmario (3)
- josh (3)
- jhawthorn (3)
- stephenbinns (2)
- T19sk (2)
- rnestler (2)
- gogainda (2)
- mhasbini (2)
- dgraham (2)
- nijikon (2)
- nicolasleger (2)
Top Issue Labels
Top Pull Request Labels
Package metadata
- Total packages: 23
- Total downloads:
- rubygems: 125,348,649 total
- Total docker downloads: 2,862,029,824
- Total dependent packages: 60 (may contain duplicates)
- Total dependent repositories: 3,719 (may contain duplicates)
- Total versions: 83
- Total maintainers: 6
gem.coop: charlock_holmes
charlock_holmes provides binary and text detection as well as text transcoding using libicu
- Homepage: https://github.com/brianmario/charlock_holmes
- Documentation: http://www.rubydoc.info/gems/charlock_holmes/
- Licenses: MIT
- Latest release: 0.7.9 (published over 1 year ago)
- Last Synced: 2025-12-10T05:31:33.602Z (3 days ago)
- Versions: 28
- Dependent Packages: 0
- Dependent Repositories: 0
- Downloads: 62,623,530 Total
- Docker Downloads: 1,431,014,912
- Rankings:
- Dependent repos count: 0.0%
- Dependent packages count: 0.0%
- Average: 0.146%
- Downloads: 0.437%
- Maintainers (2)
rubygems.org: charlock_holmes
charlock_holmes provides binary and text detection as well as text transcoding using libicu
- Homepage: https://github.com/brianmario/charlock_holmes
- Documentation: http://www.rubydoc.info/gems/charlock_holmes/
- Licenses: MIT
- Latest release: 0.7.9 (published over 1 year ago)
- Last Synced: 2025-12-10T20:00:33.092Z (2 days ago)
- Versions: 28
- Dependent Packages: 54
- Dependent Repositories: 3,708
- Downloads: 62,637,255 Total
- Docker Downloads: 1,431,014,912
- Rankings:
- Docker downloads count: 0.241%
- Downloads: 0.43%
- Dependent packages count: 0.495%
- Dependent repos count: 0.516%
- Average: 0.959%
- Stargazers count: 1.893%
- Forks count: 2.179%
- Maintainers (2)
alpine-v3.18: ruby-charlock_holmes
Character encoding detection, brought to you by ICU
- Homepage: https://github.com/brianmario/charlock_holmes
- Licenses: MIT
- Latest release: 0.7.7-r13 (published over 2 years ago)
- Last Synced: 2025-12-10T20:00:58.093Z (2 days ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
- Rankings:
- Dependent repos count: 0.0%
- Dependent packages count: 0.0%
- Average: 4.374%
- Stargazers count: 7.98%
- Forks count: 9.516%
- Maintainers (1)
gem.coop: charlock_holmes_bundle_icu
Character encoding detection, brought to you by ICU
- Homepage: http://github.com/brianmario/charlock_holmes
- Documentation: http://www.rubydoc.info/gems/charlock_holmes_bundle_icu/
- Licenses: mit
- Latest release: 0.6.9.2 (published almost 13 years ago)
- Last Synced: 2025-12-10T20:00:36.415Z (2 days ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
- Downloads: 39,365 Total
- Rankings:
- Dependent repos count: 0.0%
- Dependent packages count: 0.0%
- Average: 5.801%
- Downloads: 17.404%
- Maintainers (1)
rubygems.org: charlock_holmes_bundle_icu
Character encoding detection, brought to you by ICU
- Homepage: http://github.com/brianmario/charlock_holmes
- Documentation: http://www.rubydoc.info/gems/charlock_holmes_bundle_icu/
- Licenses: mit
- Latest release: 0.6.9.2 (published almost 13 years ago)
- Last Synced: 2025-12-10T20:00:34.380Z (2 days ago)
- Versions: 1
- Dependent Packages: 6
- Dependent Repositories: 11
- Downloads: 39,365 Total
- Rankings:
- Stargazers count: 1.874%
- Forks count: 2.219%
- Dependent packages count: 2.499%
- Average: 5.867%
- Dependent repos count: 6.908%
- Downloads: 15.838%
- Maintainers (1)
alpine-v3.13: ruby-charlock_holmes
Character encoding detection, brought to you by ICU
- Homepage: https://github.com/brianmario/charlock_holmes
- Licenses: MIT
- Latest release: 0.7.7-r5 (published over 5 years ago)
- Last Synced: 2025-12-10T20:00:54.174Z (2 days ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
- Rankings:
- Dependent repos count: 0.0%
- Stargazers count: 5.102%
- Forks count: 6.613%
- Average: 7.809%
- Dependent packages count: 19.522%
- Maintainers (1)
alpine-v3.12: ruby-charlock_holmes
Character encoding detection, brought to you by ICU
- Homepage: https://github.com/brianmario/charlock_holmes
- Licenses: MIT
- Latest release: 0.7.7-r5 (published over 5 years ago)
- Last Synced: 2025-12-10T20:00:51.869Z (2 days ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
-
Rankings:
- Dependent repos count: 0.0%
- Stargazers count: 4.49%
- Forks count: 5.641%
- Average: 7.9%
- Dependent packages count: 21.468%
- Maintainers (1)
alpine-v3.9: ruby-charlock_holmes
Character encoding detection, brought to you by ICU
- Homepage: https://github.com/brianmario/charlock_holmes
- Licenses: MIT
- Latest release: 0.7.6-r1 (published almost 7 years ago)
- Last Synced: 2025-12-11T07:14:20.878Z (1 day ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
- Rankings:
- Dependent repos count: 0.0%
- Stargazers count: 3.816%
- Forks count: 4.934%
- Average: 7.975%
- Dependent packages count: 23.151%
- Maintainers (1)
alpine-v3.11: ruby-charlock_holmes
Character encoding detection, brought to you by ICU
- Homepage: https://github.com/brianmario/charlock_holmes
- Licenses: MIT
- Latest release: 0.7.7-r0 (published about 6 years ago)
- Last Synced: 2025-12-10T20:00:48.013Z (2 days ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
-
Rankings:
- Dependent repos count: 0.0%
- Stargazers count: 4.506%
- Forks count: 5.632%
- Average: 8.185%
- Dependent packages count: 22.601%
- Maintainers (1)
alpine-v3.8: ruby-charlock_holmes
Character encoding detection, brought to you by ICU
- Homepage: https://github.com/brianmario/charlock_holmes
- Licenses: MIT
- Latest release: 0.7.6-r0 (published over 7 years ago)
- Last Synced: 2025-12-10T20:00:31.590Z (2 days ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
-
Rankings:
- Dependent repos count: 0.0%
- Stargazers count: 3.544%
- Forks count: 4.655%
- Average: 8.356%
- Dependent packages count: 25.225%
- Maintainers (1)
alpine-v3.14: ruby-charlock_holmes
Character encoding detection, brought to you by ICU
- Homepage: https://github.com/brianmario/charlock_holmes
- Licenses: MIT
- Latest release: 0.7.7-r5 (published over 5 years ago)
- Last Synced: 2025-12-10T20:00:55.835Z (2 days ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
-
Rankings:
- Dependent repos count: 0.0%
- Stargazers count: 5.201%
- Forks count: 6.659%
- Average: 8.385%
- Dependent packages count: 21.681%
- Maintainers (1)
alpine-edge: ruby-charlock_holmes
Character encoding detection, brought to you by ICU
- Homepage: https://github.com/brianmario/charlock_holmes
- Licenses: MIT
- Latest release: 0.7.9-r2 (published 8 months ago)
- Last Synced: 2025-12-10T20:01:15.715Z (2 days ago)
- Versions: 7
- Dependent Packages: 0
- Dependent Repositories: 0
- Rankings:
- Dependent repos count: 0.0%
- Average: 8.554%
- Stargazers count: 9.15%
- Forks count: 10.426%
- Dependent packages count: 14.641%
- Maintainers (1)
alpine-v3.15: ruby-charlock_holmes
Character encoding detection, brought to you by ICU
- Homepage: https://github.com/brianmario/charlock_holmes
- Licenses: MIT
- Latest release: 0.7.7-r7 (published about 4 years ago)
- Last Synced: 2025-11-12T09:13:36.041Z (about 1 month ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
- Rankings:
- Dependent repos count: 0.0%
- Stargazers count: 5.556%
- Forks count: 6.962%
- Average: 9.526%
- Dependent packages count: 25.585%
- Maintainers (1)
alpine-v3.10: ruby-charlock_holmes
Character encoding detection, brought to you by ICU
- Homepage: https://github.com/brianmario/charlock_holmes
- Licenses: MIT
- Latest release: 0.7.6-r3 (published over 6 years ago)
- Last Synced: 2025-12-10T20:00:48.876Z (2 days ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
-
Rankings:
- Dependent repos count: 0.0%
- Stargazers count: 4.004%
- Forks count: 5.222%
- Average: 9.695%
- Dependent packages count: 29.555%
- Maintainers (1)
alpine-v3.16: ruby-charlock_holmes
Character encoding detection, brought to you by ICU
- Homepage: https://github.com/brianmario/charlock_holmes
- Licenses: MIT
- Latest release: 0.7.7-r10 (published over 3 years ago)
- Last Synced: 2025-12-10T20:00:52.318Z (2 days ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
-
Rankings:
- Dependent repos count: 0.0%
- Stargazers count: 5.993%
- Forks count: 7.267%
- Average: 10.143%
- Dependent packages count: 27.311%
- Maintainers (1)
alpine-v3.17: ruby-charlock_holmes
Character encoding detection, brought to you by ICU
- Homepage: https://github.com/brianmario/charlock_holmes
- Licenses: MIT
- Latest release: 0.7.7-r11 (published about 3 years ago)
- Last Synced: 2025-12-10T20:00:52.658Z (2 days ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
- Rankings:
- Dependent repos count: 0.0%
- Stargazers count: 7.533%
- Forks count: 8.97%
- Average: 10.939%
- Dependent packages count: 27.254%
- Maintainers (1)
spack.io: ruby-charlock-holmes
Character encoding detection, brought to you by ICU.
- Homepage: https://github.com/brianmario/charlock_holmes
- Licenses: []
- Latest release: 0.7.9 (published about 2 months ago)
- Last Synced: 2025-12-10T20:01:10.160Z (2 days ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
- Rankings:
- Dependent repos count: 0.0%
- Stargazers count: 7.951%
- Forks count: 8.99%
- Average: 17.588%
- Dependent packages count: 53.411%
- Maintainers (1)
gem.coop: charlock_holmes_heroku
Character encoding detection, brought to you by ICU
- Homepage: http://github.com/brianmario/charlock_holmes
- Documentation: http://www.rubydoc.info/gems/charlock_holmes_heroku/
- Licenses: mit
- Latest release: 0.6.13 (published about 12 years ago)
- Last Synced: 2025-12-10T20:00:36.748Z (2 days ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
- Downloads: 4,567 Total
- Rankings:
- Dependent repos count: 0.0%
- Dependent packages count: 0.0%
- Average: 23.99%
- Downloads: 71.97%
- Maintainers (1)
rubygems.org: charlock_holmes_heroku
Character encoding detection, brought to you by ICU
- Homepage: http://github.com/brianmario/charlock_holmes
- Documentation: http://www.rubydoc.info/gems/charlock_holmes_heroku/
- Licenses: mit
- Latest release: 0.6.13 (published about 12 years ago)
- Last Synced: 2025-12-10T20:00:37.405Z (2 days ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
- Downloads: 4,567 Total
- Rankings:
- Stargazers count: 1.729%
- Forks count: 2.028%
- Dependent packages count: 15.706%
- Average: 27.691%
- Dependent repos count: 46.782%
- Downloads: 72.211%
- Maintainers (1)
alpine-v3.21: ruby-charlock_holmes
Character encoding detection, brought to you by ICU
- Homepage: https://github.com/brianmario/charlock_holmes
- Licenses: MIT
- Latest release: 0.7.9-r0 (published about 1 year ago)
- Last Synced: 2025-12-10T20:00:57.821Z (2 days ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
- Rankings:
- Dependent repos count: 0.0%
- Dependent packages count: 0.0%
- Average: 100%
- Maintainers (1)
alpine-v3.22: ruby-charlock_holmes
Character encoding detection, brought to you by ICU
- Homepage: https://github.com/brianmario/charlock_holmes
- Licenses: MIT
- Latest release: 0.7.9-r2 (published 8 months ago)
- Last Synced: 2025-12-10T20:01:05.778Z (2 days ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
- Rankings:
- Dependent repos count: 0.0%
- Dependent packages count: 0.0%
- Average: 100%
- Maintainers (1)
alpine-v3.19: ruby-charlock_holmes
Character encoding detection, brought to you by ICU
- Homepage: https://github.com/brianmario/charlock_holmes
- Licenses: MIT
- Latest release: 0.7.7-r14 (published about 2 years ago)
- Last Synced: 2025-11-16T12:21:34.924Z (26 days ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
- Rankings:
- Dependent repos count: 0.0%
- Dependent packages count: 0.0%
- Average: 100%
- Maintainers (1)
alpine-v3.20: ruby-charlock_holmes
Character encoding detection, brought to you by ICU
- Homepage: https://github.com/brianmario/charlock_holmes
- Licenses: MIT
- Latest release: 0.7.7-r15 (published almost 2 years ago)
- Last Synced: 2025-11-15T11:52:20.152Z (27 days ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
- Rankings:
- Dependent repos count: 0.0%
- Dependent packages count: 0.0%
- Average: 100%
- Maintainers (1)
Dependencies
- chardet ~> 0.9 development
- minitest ~> 5.11 development
- rake-compiler ~> 1.0 development
Score: 32.06916364401828