https://github.com/tonytonyjan/jaro_winkler
Ruby & C implementation of the Jaro-Winkler distance algorithm, with support for UTF-8 strings.
Keywords
algorithm jaro-winkler jaro-winkler-distance ruby
Keywords from Contributors
activerecord rubygems activejob mvc rubocop sinatra code-formatter static-code-analysis rspec crash-reporting
Last synced: about 4 hours ago
Repository metadata
- Host: GitHub
- URL: https://github.com/tonytonyjan/jaro_winkler
- Owner: tonytonyjan
- License: mit
- Created: 2014-09-06T17:40:22.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2025-05-11T13:58:16.000Z (8 months ago)
- Last Synced: 2025-12-07T19:56:52.837Z (about 1 month ago)
- Topics: algorithm, jaro-winkler, jaro-winkler-distance, ruby
- Language: Ruby
- Homepage:
- Size: 202 KB
- Stars: 202
- Watchers: 7
- Forks: 33
- Open Issues: 10
- Releases: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE.txt
README.md
jaro_winkler is an implementation of the Jaro-Winkler similarity algorithm. It is written as a C extension and falls back to a pure Ruby version on platforms other than MRI/KRI, such as JRuby or Rubinius. Both the C and Ruby implementations support any kind of string encoding, such as UTF-8, EUC-JP, Big5, etc.
Installation
gem install jaro_winkler
Usage
require 'jaro_winkler'
# Jaro Winkler Similarity
JaroWinkler.similarity "MARTHA", "MARHTA"
# => 0.9611
JaroWinkler.similarity "MARTHA", "marhta", ignore_case: true
# => 0.9611
JaroWinkler.similarity "MARTHA", "MARHTA", weight: 0.2
# => 0.9778
# Jaro Similarity
JaroWinkler.jaro_similarity "MARTHA", "MARHTA"
# => 0.9444444444444445
There is no JaroWinkler.jaro_winkler_similarity; that name would be tediously long.
Options
| Name | Type | Default | Note |
|---|---|---|---|
| ignore_case | boolean | false | All lower case characters are converted to upper case prior to the comparison. |
| weight | number | 0.1 | A constant scaling factor for how much the score is adjusted upwards for having common prefixes. |
| threshold | number | 0.7 | The prefix bonus is only added when the compared strings have a Jaro similarity above the threshold. |
| adj_table | boolean | false | The option is used to give partial credit for characters that may be errors due to known phonetic or character recognition errors. A typical example is to match the letter "O" with the number "0". |
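To make the roles of weight and threshold concrete, here is a minimal pure-Ruby sketch of the Winkler prefix adjustment applied to an already computed Jaro score. The method name `winkler_adjust` is hypothetical, the cap of 4 prefix characters is the conventional Winkler limit and is assumed here, and the exact comparison against the threshold is also an assumption; this is not the gem's source code.

```ruby
# Hypothetical helper: applies the Winkler prefix bonus to a Jaro score.
# Assumes the common prefix is capped at 4 characters (conventional limit).
def winkler_adjust(jaro, s1, s2, weight: 0.1, threshold: 0.7)
  # The bonus only applies when the Jaro score is above the threshold.
  return jaro unless jaro > threshold

  # Count the length of the common prefix, up to 4 characters.
  prefix = 0
  s1.chars.zip(s2.chars).first(4).each do |x, y|
    break if x != y
    prefix += 1
  end

  # Move the score towards 1.0 in proportion to the prefix length and weight.
  jaro + prefix * weight * (1 - jaro)
end

jaro = 0.9444444444444445 # Jaro similarity of "MARTHA" and "MARHTA"
puts winkler_adjust(jaro, "MARTHA", "MARHTA").round(4)
puts winkler_adjust(jaro, "MARTHA", "MARHTA", weight: 0.2).round(4)
```

With the default weight the sketch reproduces the 0.9611 shown in Usage, and with weight: 0.2 it gives 0.9778, because the shared prefix "MAR" has length 3.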
Adjusting Table
Default Table
['A', 'E'], ['A', 'I'], ['A', 'O'], ['A', 'U'], ['B', 'V'], ['E', 'I'], ['E', 'O'], ['E', 'U'], ['I', 'O'], ['I', 'U'],
['O', 'U'], ['I', 'Y'], ['E', 'Y'], ['C', 'G'], ['E', 'F'], ['W', 'U'], ['W', 'V'], ['X', 'K'], ['S', 'Z'], ['X', 'S'],
['Q', 'C'], ['U', 'V'], ['M', 'N'], ['L', 'I'], ['Q', 'O'], ['P', 'R'], ['I', 'J'], ['2', 'Z'], ['5', 'S'], ['8', 'B'],
['1', 'I'], ['1', 'L'], ['0', 'O'], ['0', 'Q'], ['C', 'K'], ['G', 'J'], ['E', ' '], ['Y', ' '], ['S', ' ']
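As an illustration of how a table like this could be consulted, the sketch below stores a few of the pairs above in a set and checks whether two characters count as similar. The names `SAMPLE_PAIRS`, `SIMILAR` and `similar?` are hypothetical, only a handful of pairs are included, and treating the lookup as symmetric is an assumption rather than something the README states.

```ruby
require 'set'

# A small, hand-picked subset of the default table above (illustrative only).
SAMPLE_PAIRS = [['0', 'O'], ['1', 'I'], ['1', 'L'], ['5', 'S'], ['8', 'B']].freeze

# Assumption: similarity is symmetric, so store each pair in both orders.
SIMILAR = Set.new(SAMPLE_PAIRS.flat_map { |a, b| [[a, b], [b, a]] }).freeze

# Returns true when two (upper-case) characters are listed as similar.
def similar?(x, y)
  SIMILAR.include?([x, y])
end

puts similar?('0', 'O') # listed pair
puts similar?('O', '0') # same pair, reversed
puts similar?('A', 'B') # not in the sample
```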
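How does it work?
Original Formula:

$$
d_j = \begin{cases} 0 & \text{if } m = 0 \\ \dfrac{1}{3}\left(\dfrac{m}{|s_1|} + \dfrac{m}{|s_2|} + \dfrac{m - t}{m}\right) & \text{otherwise} \end{cases}
$$

where
- $m$ is the number of matching characters.
- $t$ is half the number of transpositions.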
With Adjusting Table:
where
- $s$ is the number of nonmatching but similar characters.
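The original (unadjusted) Jaro similarity can be sketched in pure Ruby as follows. This is a textbook-style illustration using the standard definition, with m matching characters and t half-transpositions, and not the gem's actual source; the method name `jaro_similarity_sketch` is hypothetical and the adjusting table is not modeled.

```ruby
# Illustrative pure-Ruby Jaro similarity (standard definition, no adjusting table).
def jaro_similarity_sketch(s1, s2)
  a = s1.chars
  b = s2.chars
  return 0.0 if a.empty? || b.empty?

  # Two equal characters match only if their positions differ by at most `window`.
  window = [[a.length, b.length].max / 2 - 1, 0].max
  a_matched = Array.new(a.length, false)
  b_matched = Array.new(b.length, false)

  m = 0
  a.each_with_index do |ch, i|
    lo = [i - window, 0].max
    hi = [i + window, b.length - 1].min
    (lo..hi).each do |j|
      next if b_matched[j] || b[j] != ch
      a_matched[i] = b_matched[j] = true
      m += 1
      break
    end
  end
  return 0.0 if m.zero?

  # Walk the matched characters in order; t is half the number of positions that differ.
  a_seq = a.each_index.select { |i| a_matched[i] }.map { |i| a[i] }
  b_seq = b.each_index.select { |j| b_matched[j] }.map { |j| b[j] }
  t = a_seq.zip(b_seq).count { |x, y| x != y } / 2.0

  (m.to_f / a.length + m.to_f / b.length + (m - t) / m) / 3.0
end

puts jaro_similarity_sketch("MARTHA", "MARHTA").round(4)
```

For "MARTHA" and "MARHTA" all six characters match and one transposition occurs (T and H), so the sketch yields (1 + 1 + 5/6) / 3, which agrees with the 0.9444 reported by JaroWinkler.jaro_similarity in Usage.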
Why This?
There is another similar gem named fuzzy-string-match, which also provides both C and Ruby versions.
I reinvented this wheel because the naming in fuzzy-string-match, such as getDistance, breaks convention, and it contains some odd code like a1 = s1.split( // ) (s1.chars would be better). Furthermore, it's bugged (see tables below).
Compare with other gems
| | jaro_winkler | fuzzystringmatch | hotwater | amatch |
|---|---|---|---|---|
| Encoding Support | Yes | Pure Ruby only | No | No |
| Windows Support | Yes | ? | No | Yes |
| Adjusting Table | Yes | No | No | No |
| Native | Yes | Yes | Yes | Yes |
| Pure Ruby | Yes | Yes | No | No |
| Speed | 1st | 3rd | 2nd | 4th |
The table below compares the accuracy of each gem:
| str_1 | str_2 | origin | jaro_winkler | fuzzystringmatch | hotwater | amatch |
|---|---|---|---|---|---|---|
| "henka" | "henkan" | 0.9667 | 0.9667 | 0.9722 | 0.9667 | 0.9444 |
| "al" | "al" | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| "martha" | "marhta" | 0.9611 | 0.9611 | 0.9611 | 0.9611 | 0.9444 |
| "jones" | "johnson" | 0.8324 | 0.8324 | 0.8324 | 0.8324 | 0.7905 |
| "abcvwxyz" | "cabvwxyz" | 0.9583 | 0.9583 | 0.9583 | 0.9583 | 0.9583 |
| "dwayne" | "duane" | 0.84 | 0.84 | 0.84 | 0.84 | 0.8222 |
| "dixon" | "dicksonx" | 0.8133 | 0.8133 | 0.8133 | 0.8133 | 0.7667 |
| "fvie" | "ten" | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
- The "origin" result is from the original C implementation by the author of the algorithm.
- Test data are borrowed from fuzzy-string-match's rspec file.
Benchmark
$ bundle exec rake benchmark
ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-darwin16]
# C Extension
Rehearsal --------------------------------------------------------------
jaro_winkler (8c16e09) 0.240000 0.000000 0.240000 ( 0.241347)
fuzzy-string-match (1.0.1) 0.400000 0.010000 0.410000 ( 0.403673)
hotwater (0.1.2) 0.250000 0.000000 0.250000 ( 0.254503)
amatch (0.4.0) 0.870000 0.000000 0.870000 ( 0.875930)
----------------------------------------------------- total: 1.770000sec
user system total real
jaro_winkler (8c16e09) 0.230000 0.000000 0.230000 ( 0.236921)
fuzzy-string-match (1.0.1) 0.380000 0.000000 0.380000 ( 0.381942)
hotwater (0.1.2) 0.250000 0.000000 0.250000 ( 0.254977)
amatch (0.4.0) 0.860000 0.000000 0.860000 ( 0.861207)
# Pure Ruby
Rehearsal --------------------------------------------------------------
jaro_winkler (8c16e09) 0.440000 0.000000 0.440000 ( 0.438470)
fuzzy-string-match (1.0.1) 0.860000 0.000000 0.860000 ( 0.862850)
----------------------------------------------------- total: 1.300000sec
user system total real
jaro_winkler (8c16e09) 0.440000 0.000000 0.440000 ( 0.439237)
fuzzy-string-match (1.0.1) 0.910000 0.010000 0.920000 ( 0.920259)
Todo
- Custom adjusting word table.
Owner metadata
- Name: 簡煒航 (Weihang Jian)
- Login: tonytonyjan
- Email:
- Kind: user
- Description: Rubyist, Rustacean, Web Developer, Software Architect, Conference Speaker, Book Writer, Amateur Piano Player/Composer, Video Gamer, Whiskey Lover.
- Website: https://tonytonyjan.net
- Location: Taiwan
- Twitter: tonytonyjan
- Company:
- Icon url: https://avatars.githubusercontent.com/u/809410?u=d89171dbae587727d4f61de8d128c02f16269174&v=4
- Repositories: 187
- Last synced at: 2024-04-09T19:11:58.495Z
- Profile URL: https://github.com/tonytonyjan
GitHub Events
Total
- Issues event: 4
- Watch event: 8
- Delete event: 1
- Issue comment event: 5
- Push event: 5
- Pull request event: 2
- Fork event: 5
- Create event: 1
Last Year
- Issues event: 2
- Watch event: 5
- Delete event: 1
- Issue comment event: 3
- Push event: 5
- Pull request event: 2
- Fork event: 5
- Create event: 1
Committers metadata
Last synced: 1 day ago
Total Commits: 243
Total Committers: 12
Avg Commits per committer: 20.25
Development Distribution Score (DDS): 0.07
Commits in past year: 3
Committers in past year: 1
Avg Commits per committer in past year: 3.0
Development Distribution Score (DDS) in past year: 0.0
| Name | Email | Commits |
|---|---|---|
| Tony Jian | t****n@g****m | 226 |
| Masafumi Koba | 4****s | 5 |
| Seiei Miyagi | h****n@g****m | 3 |
| pocari | c****r@g****m | 1 |
| Tom Epperly | t****y@g****m | 1 |
| Orien Madgwick | _@o****o | 1 |
| Manu Wallner | w****n@a****m | 1 |
| MSP-Greg | M****g | 1 |
| Frederick Zhang | f****8@t****e | 1 |
| Benjamin Quorning | b****n@q****t | 1 |
| Salvatore Testa | s****l@s****m | 1 |
| Eddie Barraco | c****t@e****r | 1 |
Committer domains:
- eddiebarraco.fr: 1
- squareup.com: 1
- quorning.net: 1
- tsundere.moe: 1
- amazon.com: 1
- orien.io: 1
Issue and Pull Request metadata
Last synced: 9 days ago
Total issues: 41
Total pull requests: 21
Average time to close issues: 4 months
Average time to close pull requests: 6 months
Total issue authors: 34
Total pull request authors: 16
Average comments per issue: 3.05
Average comments per pull request: 1.67
Merged pull request: 16
Bot issues: 0
Bot pull requests: 0
Past year issues: 3
Past year pull requests: 1
Past year average time to close issues: 1 day
Past year average time to close pull requests: 2 minutes
Past year issue authors: 3
Past year pull request authors: 1
Past year average comments per issue: 2.33
Past year average comments per pull request: 0.0
Past year merged pull request: 1
Past year bot issues: 0
Past year bot pull requests: 0
Top Issue Authors
- tonytonyjan (4)
- tepperly (2)
- yuki24 (2)
- Arcovion (2)
- sandstrom (2)
- MITSUBOSHI (1)
- nekomaho (1)
- sfgeorge (1)
- mockdeep (1)
- stacyharper (1)
- pintergreg (1)
- rubin55 (1)
- pocari (1)
- Freaky (1)
- IlyaOsotov (1)
Top Pull Request Authors
- tepperly (3)
- tonytonyjan (3)
- ybiquitous (2)
- hanachin (2)
- milch (1)
- jmarrec (1)
- bquorning (1)
- pocari (1)
- casperisfine (1)
- orien (1)
- lwille (1)
- MSP-Greg (1)
- aried3r (1)
- SalvatoreT (1)
- Frederick888 (1)
Top Issue Labels
- Hacktoberfest (1)
- duplicate (1)
Top Pull Request Labels
Package metadata
- Total packages: 3
- Total downloads:
  - rubygems: 257,541,641 total
- Total docker downloads: 1,260,854,258
- Total dependent packages: 11 (may contain duplicates)
- Total dependent repositories: 31,539 (may contain duplicates)
- Total versions: 120
- Total maintainers: 1
gem.coop: jaro_winkler
jaro_winkler is an implementation of the Jaro-Winkler distance algorithm. It is written as a C extension and falls back to a pure Ruby version on platforms other than MRI/KRI, such as JRuby or Rubinius. Both the C and Ruby implementations support any kind of string encoding, such as UTF-8, EUC-JP, Big5, etc.
- Homepage: https://github.com/tonytonyjan/jaro_winkler
- Documentation: http://www.rubydoc.info/gems/jaro_winkler/
- Licenses: MIT
- Latest release: 1.6.1 (published 8 months ago)
- Last Synced: 2026-01-10T13:54:03.188Z (about 13 hours ago)
- Versions: 45
- Dependent Packages: 0
- Dependent Repositories: 0
- Downloads: 128,771,088 Total
- Docker Downloads: 630,427,129
- Rankings:
  - Dependent repos count: 0.0%
  - Dependent packages count: 0.0%
  - Average: 0.081%
  - Docker downloads count: 0.143%
  - Downloads: 0.181%
- Maintainers (1)
rubygems.org: jaro_winkler
jaro_winkler is an implementation of the Jaro-Winkler distance algorithm. It is written as a C extension and falls back to a pure Ruby version on platforms other than MRI/KRI, such as JRuby or Rubinius. Both the C and Ruby implementations support any kind of string encoding, such as UTF-8, EUC-JP, Big5, etc.
- Homepage: https://github.com/tonytonyjan/jaro_winkler
- Documentation: http://www.rubydoc.info/gems/jaro_winkler/
- Licenses: MIT
- Latest release: 1.6.1 (published 8 months ago)
- Last Synced: 2026-01-10T11:46:02.751Z (about 15 hours ago)
- Versions: 45
- Dependent Packages: 11
- Dependent Repositories: 31,539
- Downloads: 128,770,553 Total
- Docker Downloads: 630,427,129
- Rankings:
  - Downloads: 0.162%
  - Dependent repos count: 0.194%
  - Docker downloads count: 0.254%
  - Dependent packages count: 1.637%
  - Average: 1.924%
  - Stargazers count: 4.125%
  - Forks count: 5.172%
- Maintainers (1)
proxy.golang.org: github.com/tonytonyjan/jaro_winkler
- Homepage:
- Documentation: https://pkg.go.dev/github.com/tonytonyjan/jaro_winkler#section-documentation
- Licenses: mit
- Latest release: v1.6.0 (published over 1 year ago)
- Last Synced: 2026-01-09T11:27:02.573Z (1 day ago)
- Versions: 30
- Dependent Packages: 0
- Dependent Repositories: 0
- Rankings:
  - Dependent packages count: 6.497%
  - Average: 6.716%
  - Dependent repos count: 6.936%
Dependencies
- amatch >= 0 development
- fuzzy-string-match >= 0 development
- hotwater >= 0 development
- bundler ~> 1.7 development
- minitest >= 0 development
- rake ~> 12.0 development
- rake-compiler >= 0 development
- bundler ~> 1.7 development
- minitest >= 0 development
- rake ~> 12.0 development
- rake-compiler >= 0 development
- actions/checkout v3 composite
- ruby/setup-ruby v1 composite
Score: 28.982435085543944