Roman Numeral Parser

I wondered the other night how straightforward it is to parse Roman numerals (up to 4999). About 30 seconds of Googling revealed the answer: pretty straightforward. Here it is in Python:

import re

ROMAN_RE = re.compile(r"""
    ^                   # beginning of string
    (M{0,4})            # thousands - 0 to 4 M's
    (CM|CD|D?C{0,3})    # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 C's),
                        #            or 500-800 (D, followed by 0 to 3 C's)
    (XC|XL|L?X{0,3})    # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 X's),
                        #        or 50-80 (L, followed by 0 to 3 X's)
    (IX|IV|V?I{0,3})    # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 I's),
                        #        or 5-8 (V, followed by 0 to 3 I's)
    $                   # end of string
""", re.VERBOSE)

ROMAN_DIGITS = {
    'M'    : 1000, 'MM'   : 2000, 'MMM'  : 3000, 'MMMM' : 4000,
    'CM'   : 900,  'CD'   : 400,  'D'    : 500,  'DC'   : 600,
    'DCC'  : 700,  'DCCC' : 800,  'C'    : 100,  'CC'   : 200,
    'CCC'  : 300,  'XC'   : 90,   'XL'   : 40,   'L'    : 50,
    'LX'   : 60,   'LXX'  : 70,   'LXXX' : 80,   'X'    : 10,
    'XX'   : 20,   'XXX'  : 30,   'IX'   : 9,    'IV'   : 4,
    'V'    : 5,    'VI'   : 6,    'VII'  : 7,    'VIII' : 8,
    'I'    : 1,    'II'   : 2,    'III'  : 3
}

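def rtoi(roman):
    """Return the value of a Roman numeral (1-4999), or None if it is invalid."""
    match = ROMAN_RE.match(roman.upper())
    # Groups that matched nothing come back as '', so drop them before lookup.
    digits = [x for x in match.groups() if x] if match else []
    # The pattern also matches the empty string; an empty list rejects it.
    if digits:
        return sum(ROMAN_DIGITS[x] for x in digits)
    return None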
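
One way to sanity-check a parser like this is to write the inverse and round-trip every value. The sketch below is not from the original post: the name itor and the PAIRS table are made up for illustration, and it assumes the same conventions as the pattern above (subtractive forms for 4 and 9, up to four M's, range 1 to 4999).

```python
# Hypothetical inverse of rtoi: integer -> Roman numeral string.
# PAIRS and itor are illustrative names, not part of the original code.
PAIRS = [
    (1000, 'M'), (900, 'CM'), (500, 'D'), (400, 'CD'),
    (100, 'C'),  (90, 'XC'),  (50, 'L'),  (40, 'XL'),
    (10, 'X'),   (9, 'IX'),   (5, 'V'),   (4, 'IV'),
    (1, 'I'),
]

def itor(n):
    """Return the Roman numeral for n (1-4999), or None if out of range."""
    if not 0 < n < 5000:
        return None
    parts = []
    # Greedily take the largest value first; divmod gives the repeat count
    # and the remainder still left to encode.
    for value, symbol in PAIRS:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return ''.join(parts)

print(itor(1994))
print(itor(4999))
```

With rtoi in scope, checking that rtoi(itor(n)) == n for every n in range(1, 5000) should exercise every entry in ROMAN_DIGITS.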


2 responses to “Roman Numeral Parser”

  1. Christopher Smith
    September 20, 2007 at 10:35 am |

    Hey, how about handling numbers >=4000 eh?
