I wondered the other night how straightforward a task parsing Roman numerals (up to 4999) is. As revealed by about 30 seconds of Googling: pretty straightforward. Here it is in Python:

```
import re
ROMAN_RE = re.compile("""
    ^                   # beginning of string
    (M{0,4})            # thousands - 0 to 4 M's
    (CM|CD|D?C{0,3})    # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 C's),
                        #            or 500-800 (D, followed by 0 to 3 C's)
    (XC|XL|L?X{0,3})    # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 X's),
                        #        or 50-80 (L, followed by 0 to 3 X's)
    (IX|IV|V?I{0,3})    # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 I's),
                        #        or 5-8 (V, followed by 0 to 3 I's)
    $                   # end of string
    """, re.VERBOSE)
ROMAN_DIGITS = {
    'M' : 1000, 'MM' : 2000, 'MMM' : 3000, 'MMMM' : 4000,
    'CM' : 900, 'CD' : 400, 'D' : 500, 'DC' : 600,
    'DCC' : 700, 'DCCC' : 800, 'C' : 100, 'CC' : 200,
    'CCC' : 300, 'XC' : 90, 'XL' : 40, 'L' : 50,
    'LX' : 60, 'LXX' : 70, 'LXXX' : 80, 'X' : 10,
    'XX' : 20, 'XXX' : 30, 'IX' : 9, 'IV' : 4,
    'V' : 5, 'VI' : 6, 'VII' : 7, 'VIII' : 8,
    'I' : 1, 'II' : 2, 'III' : 3
}
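def rtoi(roman):
    """Return the integer value of a Roman numeral string, or None if invalid."""
    match = ROMAN_RE.match(roman.upper())
    # The pattern also matches the empty string, so require a non-empty group.
    if match and any(match.groups()):
        return sum(ROMAN_DIGITS[x] for x in match.groups() if x)
    return None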
```
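
One way to sanity-check a parser like this is to generate numerals and feed them back in. The sketch below goes the other way, from integer to numeral. It is not part of what I found while Googling: `itor` and `NUMERAL_VALUES` are names made up for illustration, and it assumes the same 1 to 4999 range, with up to four M's for the thousands.

```python
# Hypothetical helper (not from the original listing): integer -> Roman numeral.
# Value/symbol pairs in descending order, including the subtractive forms.
NUMERAL_VALUES = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def itor(number):
    """Return the Roman numeral for an integer in the range 1..4999."""
    if not 1 <= number <= 4999:
        raise ValueError("number must be between 1 and 4999")
    parts = []
    for value, symbol in NUMERAL_VALUES:
        # Take as many copies of this symbol as fit, then carry on with the rest.
        count, number = divmod(number, value)
        parts.append(symbol * count)
    return "".join(parts)


print(itor(1994))  # MCMXCIV
print(itor(4999))  # MMMMCMXCIX
```

With `rtoi` from the listing above in scope, `all(rtoi(itor(n)) == n for n in range(1, 5000))` should come out `True`, which makes for a cheap round-trip test of both directions.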

## 2 responses to “Roman Numeral Parser”

Hey, how about handling numbers >=4000 eh?

I can’t parse them. How about that? :-P I suppose I could have done something for the >5k symbols (with the bars over them and what have you), but for my purposes this was enough.