Chibiham

Trying ABR Geocoder

This article was written with the assistance of AI.

Purpose

  • Verify what ABR Geocoder can do
  • Experience ABR Geocoder's accuracy firsthand
  • Understand setup and operational considerations

ABR Geocoder Overview

Official site: https://lp.geocoder.address-br.digital.go.jp/

ABR Geocoder is a geocoding tool provided by Japan's Digital Agency that normalizes Japanese address strings and returns latitude/longitude coordinates. It uses the Address Base Registry (a nationwide address master database) as its data source, which is updated monthly by the government, ensuring high reliability. It's provided under the MIT License and can operate offline (without external API calls).

Key Features

  • Address Normalization: "千代田区霞が関1−3−1" → "東京都千代田区霞が関一丁目3番1号"
  • Geocoding: Get latitude/longitude from addresses
  • Town ID Assignment: Returns unique identifiers for addresses
  • Notation Variation Handling: Unifies kanji/Arabic numerals, old/new character forms, etc.

Technical Specifications

  • Language: Node.js / TypeScript
  • DB: SQLite (local DB, approximately 50GB for nationwide data)

Characteristics

  • Free: Open source (MIT License)
  • Offline Operation: No external API required
  • High Accuracy: Uses official government data
  • Regular Updates: Digital Agency updates data monthly

Local Setup

bash
# Requires Node.js 20 or higher
npm install -g @digital-go-jp/abr-geocoder

# Download data (nationwide)
abrg download

# Download specific region only (e.g., Tokyo)
abrg download -c 130001

# Start as REST server
abrg serve start

# Start with specific port (default is 3000)
abrg serve start -p 8080

Download Execution Time

In a local run, the nationwide download took about 1 hour (this varies with network and environment):

bash
❯ abrg download --debug
download: 1:00:53.900 (h:mm:ss.mmm)

Testing

Example curl request:

bash
# -G appends the URL-encoded address as a query parameter
curl -sG 'http://localhost:3000/geocode' --data-urlencode 'address=東京都千代田区霞が関1-3-1' | jq

Response:

json
{
  "query": {
    "input": "東京都千代田区霞が関1-3-1"
  },
  "result": {
    "output": "東京都千代田区霞が関一丁目3-1",
    "score": 0.82,
    "match_level": "residential_block",
    "lat": 35.671555,
    "lon": 139.751467,
    "pref": "東京都",
    "city": "千代田区",
    "oaza_cho": "霞が関",
    "chome": "一丁目",
    "blk_num": "3"
  }
}

Test Results Summary

Testing with various address patterns revealed the following characteristics:

| Category | Input Example | Processing Result |
| --- | --- | --- |
| Mixed kanji/Arabic numerals | 六本木6-10-1 | Normalized to 6丁目10-1 |
| Full-width/half-width chars | １－７－１ | Normalized to 1-7-1 |
| Hokkaido addresses | 北3条西6丁目 | Correctly recognizes jō-chōme format |
| Kyoto street names | 寺町通御池上る | Stored in koaza field |
| Old character forms | 澁谷区澁谷 | Normalized to 渋谷区渋谷 |
| With building name | 丸の内1-9-1 東京駅 | Stored in others field |
| Prefecture omitted | 千代田区 | Auto-completes to 東京都 |

Notes

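Addresses may not be registered down to the residence number

For the building-name test (丸の内1-9-1 東京駅 in 千代田区), the others field of the output contains -1 東京駅. The address matched only as far as the block (1-9), and the trailing residence number was left unparsed together with the building name. For large buildings in particular, the registry may hold only the block number.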

→ Building names and room numbers may not always be extracted perfectly

Score Patterns

| Input Pattern | Score | Characteristic |
| --- | --- | --- |
| Prefecture included | 0.82-0.88 | High score |
| Prefecture omitted | 0.57-0.74 | Lower score |
| Old character forms | 0.5 | Lowest |
| With building name | 0.7 | Medium |

Practical judgment criteria (guidelines):

  • 0.8 or higher: Trust as-is
  • 0.6-0.8: Recommend verification
  • Below 0.6: Prompt for input review
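The guideline thresholds above can be sketched as a small shell helper. This is only an illustration: the function name classify_score and the labels it prints are hypothetical, and the cut-offs are the rough values from this article, not anything defined by ABR Geocoder.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a geocoding score to one of three actions,
# using this article's guideline thresholds (0.8 and 0.6).
classify_score() {
  # awk does the floating-point comparison; "+ 0" forces a numeric value.
  awk -v s="$1" 'BEGIN {
    s = s + 0
    if (s >= 0.8)      print "trust"
    else if (s >= 0.6) print "verify"
    else               print "review_input"
  }'
}

classify_score 0.82   # prints "trust"
```

In a batch job, the score field of each response could be passed to this helper to decide which records need manual review.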

Using ABR Geocoder with ECR

By default, related files are stored in ~/.abr-geocoder. After downloading the nationwide data, this folder is approximately 56 GiB (about 60 GB):

bash
❯ du -sk .abr-geocoder | awk '{print $1/1024 " MiB"}'
57419.9 MiB
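Given that size, it may be worth checking free disk space before starting a nationwide download. The following is a minimal sketch: has_free_space is a hypothetical helper, and the 60 GiB threshold is an assumption (the measured size plus some headroom), not an official requirement.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check before `abrg download`.
# has_free_space DIR REQUIRED_GIB: succeeds if the filesystem holding DIR
# has at least REQUIRED_GIB GiB available.
has_free_space() {
  # df -P prints one POSIX-format line per filesystem; -k reports KiB.
  avail_kib=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
  # 1 GiB = 1048576 KiB; awk handles the floating-point comparison.
  awk -v avail="$avail_kib" -v need="$2" 'BEGIN { exit !(avail / 1048576 >= need) }'
}

# 60 GiB is an assumed threshold: the measured data size plus headroom.
if has_free_space "${HOME:-/}" 60; then
  echo "enough space for the nationwide data"
else
  echo "not enough space; consider downloading a single region with -c"
fi
```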

SQLite and Mounted Filesystem Performance Impact

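Building a Docker image that contains the nationwide data is impractical: at this size the data does not fit within ECR's image layer size limit (10GB at the time of writing), so baking it into an image is essentially impossible. The data therefore has to live on mounted storage, and because ABR Geocoder uses SQLite internally, the choice of storage type directly affects performance.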

Storage Selection Impact

| Storage Type | Response Time (estimate) | Recommended Use |
| --- | --- | --- |
| EBS gp3 | 20-100 ms | Production |
| EFS | 50-500 ms | Multi-container sharing |
| FSx for Lustre | 15-80 ms | High-performance requirements |

SQLite Optimization Settings

bash
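# SQLite performance tuning
export SQLITE_TMPDIR=/dev/shm  # Temp files in memory (SQLite honors this variable on Unix)

# cache_size and mmap_size are PRAGMA statements, not environment variables:
# SQLite does not read PRAGMA_* variables, so these only take effect when
# executed as SQL on the connection that runs the queries.
#   PRAGMA cache_size = 10000;      -- page cache size, in pages
#   PRAGMA mmap_size = 268435456;   -- memory map size (256 MiB)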

Deployment Strategy (Data Updates)

An EBS volume can be attached to only one Fargate task at a time (which is reasonable for a SQLite-backed service).

The following configuration balances cost, deployment time, and availability:

  • Update EBS snapshots via daily batch processing
  • Blue/Green deploy Tasks with volumes mounted from snapshots
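The snapshot side of this configuration could look roughly like the sketch below. It is an assumption-heavy outline, not a tested pipeline: VOLUME_ID, the vol-EXAMPLE value, and the helper names are hypothetical, and the script defaults to a dry run that only prints the AWS CLI commands.

```bash
#!/usr/bin/env bash
# Sketch of the snapshot side of the daily batch. All IDs are placeholders.
VOLUME_ID="${VOLUME_ID:-vol-EXAMPLE}"  # hypothetical: volume holding ~/.abr-geocoder
DRY_RUN="${DRY_RUN:-1}"                # 1 = print commands instead of running them

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Step 1: snapshot the volume after the batch job has refreshed the data.
create_data_snapshot() {
  run aws ec2 create-snapshot \
    --volume-id "$VOLUME_ID" \
    --description "abr-geocoder data $(date +%Y-%m-%d)"
}

# Step 2: look up the newest snapshot of that volume for the next deployment.
latest_snapshot_id() {
  run aws ec2 describe-snapshots \
    --owner-ids self \
    --filters "Name=volume-id,Values=$VOLUME_ID" \
    --query 'sort_by(Snapshots, &StartTime)[-1].SnapshotId' \
    --output text
}

create_data_snapshot
latest_snapshot_id
```

In a real batch, the job would wait for the snapshot to complete (for example with aws ec2 wait snapshot-completed) before the Blue/Green deployment creates volumes from it.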

abrg update-check Execution

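It is unclear how abrg update-check decides which data counts as an available update, but whenever there are files to update, they are reported on every run. The check took about 3.5 minutes. The prompt in the output below reads: "There are available data updates (3793). Continue and download the data? [Y/N]"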

bash
❯ abrg update-check
利用可能な更新データ(3793)があります。
続けてデータをダウンロードしますか? [Y/N]

API Load Testing

A serial test (one request at a time) against a local instance returned responses in 0.015-0.060 s. Because this was a local run only, treat these figures as a rough reference.
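A serial measurement of this kind can be sketched with curl's built-in timing. The script below is an illustration rather than the exact procedure used here: measure and summarize are hypothetical helper names, BASE_URL assumes the default port, and the input is assumed to be a text file with one address per line.

```bash
#!/usr/bin/env bash
# Sketch of a serial latency test against a local `abrg serve start` instance.
BASE_URL="${BASE_URL:-http://localhost:3000}"  # assumed default port

# measure FILE: send one request per line of FILE, print each total time in seconds.
measure() {
  while IFS= read -r address; do
    curl -s -o /dev/null -G \
      --data-urlencode "address=${address}" \
      -w '%{time_total}\n' \
      "${BASE_URL}/geocode"
  done < "$1"
}

# summarize: read one time per line on stdin, print count, min, max, and average.
summarize() {
  awk '
    NR == 1 { min = $1 + 0; max = $1 + 0 }
    {
      t = $1 + 0
      sum += t
      if (t < min) min = t
      if (t > max) max = t
    }
    END {
      if (NR > 0)
        printf "n=%d min=%.3f max=%.3f avg=%.3f\n", NR, min, max, sum / NR
    }
  '
}

# Usage (addresses.txt is a hypothetical input file):
#   measure addresses.txt | summarize
```

Because the requests run one after another, this measures per-request latency only and says nothing about throughput under concurrent load.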

The GitHub repository mentions parallel processing, so although the runtime appears to be Node.js, it may use clustering. If so, adding CPUs may raise throughput. Memory usage should also be monitored and sized accordingly. Proper load testing is needed to determine these values.

Summary

The following configuration seems optimal:

  • Use Fargate + EBS
  • Create batch processing to update EBS snapshots with abr-geocoder data
  • Blue/Green deployment using EBS volumes created from the latest snapshots
    • Data updates and abr-geocoder updates are applied during deployment