Trying ABR-Geocoder
This article was written with the assistance of AI.
Purpose
- Verify what ABR-geocoder can do
- Experience ABR-geocoder's accuracy firsthand
- Understand setup and operational considerations
ABR Geocoder Overview
Official site: https://lp.geocoder.address-br.digital.go.jp/
ABR Geocoder is a geocoding tool provided by Japan's Digital Agency that normalizes Japanese address strings and returns latitude/longitude coordinates. Its data source is the Address Base Registry (a nationwide address master database), which the government updates monthly, so the underlying data is highly reliable. It is released under the MIT License and, once the data has been downloaded, runs offline without calling any external API.
Key Features
- Address Normalization: "千代田区霞が関1−3−1" → "東京都千代田区霞が関一丁目3番1号"
- Geocoding: Get latitude/longitude from addresses
- Town ID Assignment: Returns unique identifiers for addresses
- Notation Variation Handling: Unifies kanji/Arabic numerals, old/new character forms, etc.
Technical Specifications
- Language: Node.js / TypeScript
- DB: SQLite (local DB, approximately 50GB for nationwide data)
Characteristics
- Free: Open source (MIT License)
- Offline Operation: No external API required
- High Accuracy: Uses official government data
- Regular Updates: Digital Agency updates data monthly
Local Setup
```bash
# Requires Node.js 20 or higher
npm install -g @digital-go-jp/abr-geocoder
# Download data (nationwide)
abrg download
# Download a specific region only (e.g., Tokyo)
abrg download -c 130001
# Start as a REST server
abrg serve start
# Start on a specific port (default is 3000)
abrg serve start -p 8080
```
Download Execution Time
Running locally, the nationwide download took about one hour (this depends on the network and the machine):
```
❯ abrg download --debug
download: 1:00:53.900 (h:mm:ss.mmm)
```
Testing
Example curl request:
```bash
curl 'http://localhost:3000/geocode?address=東京都千代田区霞が関1-3-1' | jq
```
Response:
```json
{
  "query": {
    "input": "東京都千代田区霞が関1-3-1"
  },
  "result": {
    "output": "東京都千代田区霞が関一丁目3-1",
    "score": 0.82,
    "match_level": "residential_block",
    "lat": 35.671555,
    "lon": 139.751467,
    "pref": "東京都",
    "city": "千代田区",
    "oaza_cho": "霞が関",
    "chome": "一丁目",
    "blk_num": "3"
  }
}
```
Test Results Summary
Testing with various address patterns revealed the following characteristics:
| Category | Input Example | Processing Result |
|---|---|---|
| Mixed kanji/Arabic numerals | 六本木6-10-1 | Normalized to 6丁目10-1 |
| Full-width/half-width chars | 1-7-1 | Normalized to half-width 1-7-1 |
| Hokkaido addresses | 北3条西6丁目 | Correctly recognizes jō-chōme format |
| Kyoto street names | 寺町通御池上る | Stored in koaza field |
| Old character forms | 澁谷区澁谷 | Normalized to 渋谷区渋谷 |
| With building name | 丸の内1-9-1 東京駅 | Stored in others field |
| Prefecture omitted | 千代田区 | Auto-completes to 東京都 |
Notes
Addresses may not be registered down to the residence number
For the building-name test in the table above (丸の内1-9-1 東京駅), the others field contains -1 東京駅: only the block number (1-9) was matched, and the trailing -1 was left together with the building name. Large buildings in particular may be registered only down to the block number.
→ Building names and room numbers are not always extracted cleanly
Score Patterns
| Input Pattern | Score | Characteristic |
|---|---|---|
| Prefecture included | 0.82-0.88 | High score |
| Prefecture omitted | 0.57-0.74 | Lower score |
| Old character forms | 0.5 | Lowest |
| With building name | 0.7 | Medium |
Practical judgment criteria (guidelines):
- 0.8 or higher: Trust as-is
- 0.6 to below 0.8: Verify before use
- Below 0.6: Ask the user to review the input
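The guideline above can be encoded as a small shell helper so that scripts act on the score consistently. This is a minimal sketch: the function name `classify_score` and the bucket labels are hypothetical, and the thresholds are simply the ones listed above. awk handles the comparison because bash arithmetic only supports integers.

```bash
#!/usr/bin/env bash
# classify_score (hypothetical helper): map an ABR Geocoder score to the
# guideline buckets above.
#   >= 0.8           -> trust
#   >= 0.6 and < 0.8 -> verify
#   < 0.6            -> review
# awk does the comparison because bash arithmetic is integer-only.
classify_score() {
  awk -v s="$1" 'BEGIN {
    s += 0                      # force numeric; non-numeric input becomes 0
    if (s >= 0.8)      print "trust"
    else if (s >= 0.6) print "verify"
    else               print "review"
  }'
}

# Example, assuming the response shape shown earlier (.result.score):
#   score=$(curl -s 'http://localhost:3000/geocode?address=東京都千代田区霞が関1-3-1' | jq -r '.result.score')
#   classify_score "$score"    # 0.82 in the run above -> trust
```

The thresholds are only the rough guidelines from this test run; tune them against your own address data before relying on them.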
Using ABR Geocoder with ECR
By default, ABR Geocoder stores its files in ~/.abr-geocoder. After downloading nationwide data, this directory is about 57,400 MiB (roughly 56 GiB):
```
❯ du -sk .abr-geocoder | awk '{print $1/1024 " MiB"}'
57419.9 MiB
```
SQLite and Mounted Filesystem Performance Impact
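Building a Docker image that contains the nationwide data is impractical: at roughly 56 GiB, the data runs into ECR's per-layer size limit, so baking it into the image is essentially not an option. The data therefore has to live on a mounted volume, and because ABR Geocoder uses SQLite internally, the choice of storage type directly affects performance.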
Storage Selection Impact
| Storage Type | Response Time (estimate) | Recommended Use |
|---|---|---|
| EBS gp3 | 20-100ms | Production |
| EFS | 50-500ms | Multi-container sharing |
| FSx for Lustre | 15-80ms | High-performance requirements |
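SQLite Optimization Settings
```bash
# SQLite performance tuning
export SQLITE_TMPDIR=/dev/shm        # Temp files in memory (SQLite reads this on Unix)
# SQLite itself does not read the two variables below; they only take effect
# if the application reads them and issues the matching PRAGMA statements.
export PRAGMA_CACHE_SIZE=10000       # PRAGMA cache_size (positive value = pages)
export PRAGMA_MMAP_SIZE=268435456    # PRAGMA mmap_size: 256 MiB memory map
```
Deployment Strategy (Data Updates)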
An EBS volume can be attached to only one Fargate task, which suits SQLite anyway, since SQLite is built around a single host reading a local database file.
The following setup keeps costs down, shortens deployments, and maintains availability:
- Update EBS snapshots via daily batch processing
- Blue/Green deploy Tasks with volumes mounted from snapshots
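The daily batch and redeploy described above might look roughly like the script below. It is a sketch under several assumptions, not a tested pipeline: the batch host has the EBS volume mounted at the abrg data directory, re-running `abrg download` refreshes existing data, the ECS task definition declares the data volume with `configuredAtLaunch` so the service supplies the EBS settings at deployment time, and every argument (volume ID, cluster, service, volume name, infrastructure role ARN) is a hypothetical placeholder. Check the AWS CLI reference for `ecs update-service --volume-configurations` before relying on the exact JSON shape. Setting `DRY_RUN=1` prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Sketch: daily data refresh -> EBS snapshot -> redeploy the ECS service.
# All argument values are hypothetical placeholders supplied by the caller.
set -euo pipefail

# run: execute a command, or only print it (to stderr) when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '+ %s\n' "$*" >&2
  else
    "$@"
  fi
}

# Usage: refresh_and_deploy VOLUME_ID CLUSTER SERVICE VOLUME_NAME ROLE_ARN
refresh_and_deploy() {
  local volume_id="$1" cluster="$2" service="$3" volume_name="$4" role_arn="$5"
  local snap vol_cfg

  # 1. Refresh the data. Assumption: the volume is mounted at the abrg data
  #    directory (~/.abr-geocoder by default) and re-running the download
  #    brings existing data up to date.
  run abrg download
  run sync   # flush pending writes before snapshotting

  # 2. Snapshot the volume and wait until the snapshot is usable.
  snap=$(run aws ec2 create-snapshot \
    --volume-id "$volume_id" \
    --description "abr-geocoder data $(date -u +%F)" \
    --query SnapshotId --output text)
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    snap="snap-dryrun"   # placeholder ID so the remaining steps can be printed
  fi
  run aws ec2 wait snapshot-completed --snapshot-ids "$snap"

  # 3. Point the service's launch-time EBS volume at the new snapshot and
  #    start a new deployment so fresh tasks get volumes created from it.
  printf -v vol_cfg \
    '[{"name":"%s","managedEBSVolume":{"snapshotId":"%s","roleArn":"%s","volumeType":"gp3"}}]' \
    "$volume_name" "$snap" "$role_arn"
  run aws ecs update-service \
    --cluster "$cluster" --service "$service" \
    --volume-configurations "$vol_cfg" \
    --force-new-deployment
}
```

`--force-new-deployment` replaces the running tasks; whether that happens as a rolling update or as Blue/Green depends on how the service's deployment is configured. EBS snapshots after the first are incremental, but rewritten database files can still change many blocks, so the `snapshot-completed` waiter may give up before a large snapshot finishes and need to be re-run.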
abrg update-check Execution
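The criteria update-check uses to decide that updated data is available are unclear, but in practice it reported update targets every time. The check took about 3.5 minutes. The prompt (in Japanese) reports that 3,793 updates are available and asks whether to continue downloading them:
```
❯ abrg update-check
利用可能な更新データ(3793)があります。
続けてデータをダウンロードしますか? [Y/N]
```
API Load Testing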
Serial requests against a local instance returned in 0.015 to 0.060 s. Treat these numbers only as a rough reference, since they come from a local run.
The GitHub repository mentions parallel processing, so although the server runs on Node.js, it may spread work across multiple processes or threads (for example, via clustering); if so, allocating more vCPUs may help. Memory usage should also be monitored and sized accordingly. Run a proper load test to settle on CPU and memory values.
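A serial timing check like the one above can be scripted with curl's `%{time_total}` write-out variable. This is a minimal sketch with hypothetical helper names (`measure_serial`, `summarize`); it sends one request at a time, so it says nothing about behavior under concurrent load, which still needs a proper load test.

```bash
#!/usr/bin/env bash
# measure_serial (hypothetical helper): send N requests one after another and
# print each request's total time in seconds, using curl's %{time_total}.
measure_serial() {
  local url="$1" n="${2:-100}" i
  for (( i = 0; i < n; i++ )); do
    curl -s -o /dev/null -w '%{time_total}\n' "$url"
  done
}

# summarize (hypothetical helper): read one latency per line and print the
# count, minimum, maximum, and mean. Exits with status 1 on empty input.
summarize() {
  awk '
    { v = $1 + 0; sum += v
      if (NR == 1 || v < min) min = v
      if (NR == 1 || v > max) max = v }
    END {
      if (NR == 0) exit 1
      printf "n=%d min=%.3f max=%.3f avg=%.3f\n", NR, min, max, sum / NR
    }'
}

# Example against the local server from the setup section:
#   measure_serial 'http://localhost:3000/geocode?address=東京都千代田区霞が関1-3-1' 200 | summarize
```

Keeping measurement and aggregation separate lets the raw per-request times be saved to a file and summarized again later.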
Summary
The following configuration seems optimal:
- Use Fargate + EBS
- Create batch processing to update EBS snapshots with abr-geocoder data
- Blue/Green deployment using EBS volumes created from the latest snapshots
- Data updates and abr-geocoder updates are applied during deployment
