Help me write idiomatically correct Python

Why can’t the features I like from every programming language be available all the time? It really bugs me that, for instance, I can’t get lazy lists and patter matching in Python and I can’t get, well, there’s got to be something that Python has that’s sort of a pain in Haskell. I/O, for example.

Anyway, I want code that does this:

import Prelude hiding (Left, Right)

data EitherBoth a = Left a | Right a | Both a a deriving Show

cmpseq :: (Ord a) => (a -> a -> Ordering) -> [a] -> [a] -> [EitherBoth a]
cmpseq _ [] rs = map Right rs
cmpseq _ ls [] = map Left  ls
cmpseq cmp (l:ls) (r:rs) = case (cmp l r) of
                             LT -> Left   l : cmpseq cmp ls  (r:rs)
                             GT -> Right  r : cmpseq cmp (l:ls) rs
                             EQ -> Both l r : cmpseq cmp ls     rs

So given two ordered lists and a comparison function, I want to know (efficiently, and preserving order) which elements are just in one list and which are in both. The Haskell version is pretty clear and pretty concise. (Bonus points: if you have a better way to do this in Haskell — and I’m sure there is one — I’d love to hear about it.)

So far, the best I can do in Python is this rather ugly mess:

def cmpseq(f, xs, ys):
    nx, ny = len(xs), len(ys)
    ix, iy = 0, 0
    while ix < nx and iy < ny:
        r = f(xs[ix], ys[iy])
        if -1 == r:
            yield (xs[ix], None)
            ix = 1 + ix
        elif 1 == r:
            yield (None, ys[iy])
            iy = 1 + iy
        else:
            yield (xs[ix], ys[iy])
            ix, iy = 1 + ix, 1 + iy
    while ix < nx:
        yield (xs[ix], None)
        ix = 1 + ix
    while iy < ny:
        yield (None, ys[iy])
        iy = 1 + iy

Yuck. I’m going to go out on a limb and say that five yields is too many. What’s the idiomatically correct way to do this? Perhaps something to do with feeding values back in to generators? Python language lawyers: please show me the error of my ways!

Posted in | | 9 Responses

9 responses to “Help me write idiomatically correct Python”

  1. Lionel Barret
    September 30, 2008 at 1:10 am |

    mmm…. you should use the iteration protocol: the index stuff :

    
    nx, ny = len(xs), len(ys)
    ix, iy = 0, 0
    while ix < nx and iy < ny:
    
    

    seems strange to me.

    beside, i dont’ see how it is different than:

    
    while x, y in zip(xs, ys)
    
    

    more, using relevant datastructure would be more natural:

    
    xs = set(xs)
    ys = set(ys)
    in_both = xs.intersection(ys)
    
    

    you should also look into itertools, so you can process xs and ys:

    
    max_len = max(len(xs), len(ys))
    x_repeat_none = max_len - len(xs)
    y_repeat_none = max_len - len(ys)
    new_xs = chaine(xs, repeat(None, x_repeat_none)
    new_ys = chaine(ys, repeat(None, y_repeat_none)
    while x, y in new_xs, new_ys:
        yield x, y
    
    
  2. i0
    i0
    September 30, 2008 at 2:34 am |

    I’m sorry, but I find both Haskell and Python examples confusing. Can you perhaps add a test case?

  3. i0
    i0
    September 30, 2008 at 3:21 am |

    Still fugly, but thats what you get for doing nested code as flat: http://paste.org/index.php?id=3918 ;-)

  4. Corey Porter
    September 30, 2008 at 9:52 am |

    @i0

    Here’s a test case:

    import unittest
    
    class TestCmpseq(unittest.TestCase):
        def testOddsAndEvens(self):
            expected = [ (1, None),
                         (2, 2),
                         (3, None),
                         (4, 4),
                         (5, None),
                         (None, 6),
                         (None, 8),
                         (None, 10) ]
    
            resultant = list(cmpseq(cmp, [1,2,3,4,5], [2,4,6,8,10]))
            self.assertEquals(expected, resultant)
    

    @Lionel

    First off, yes, the indexing stuff is crazy ugly. I don’t like it one bit. Also, doing the same with sets would express the problem much more cleanly, and it’s probably the way I’ll end up going with this, as I wouldn’t dare ask somebody (even myself!) to maintain the above code. However, it takes the problem’s time complexity to log from linear and it requires linear additional space (as opposed to constant).

    I’d love to use something like zip, except I need to not advance one of the iterators under certain circumstances. I could write a custom version that did something with the argument supplied to next(), but that would make constructs like “for x,y in zip(…)” impossible.

    I gave a go of doing the same thing with iterators rather than indexing. The problem I run in do is knowing after StopIteration is thrown which iterator ran out. I ended up having to play all sorts of games with setting values to None before I get the next value, etc. Here’s where I’m at now:

    def cmpseq(f, xs, ys):
        xs, ys = iter(xs), iter(ys)
        try:
            x, y = None, None
            x = xs.next()
            y = ys.next()
            while True:
                c = f(x, y)
                if -1 == c:
                    yield (x, None)
                    x = None
                    x = xs.next()
                elif 1 == c:
                    yield (None, y)
                    y = None
                    y = ys.next()
                else:
                    yield (x, y)
                    x, y = None, None
                    x = xs.next()
                    y = ys.next()
        except StopIteration, s:
            pass
    
        if x is not None: yield (x, None)
        if y is not None: yield (None, y)
    
        for x in xs: yield (x, None)
        for y in ys: yield (None, y)
    

    Which seems a little cleaner in that I’m not manually keeping track of array indexes, but it’s still a far cry from the Haskell version. Perhaps the moral of the story is to not try to cram language constructs and algorithms that make sense in on context in to other systems where they don’t. Oh well? (Mercifully, the log time/linear space solution will be just fine for the data I’m currently working with.)

  5. David Goodger
    October 1, 2008 at 10:55 am |

    You can do it with a single list comprehension. This requires Python 2.5 for the conditional expressions:

    def cmpseq(s1, s2): return [((item if item in s1 else None), (item if item in s2 else None)) for item in sorted(set(s1 + s2))]

    import pprint

    seq1 = [1,2,3,4,5] seq2 = [2,4,6,8,10]

    pprint.pprint(cmpseq(seq1, seq2))

    If seq1 & seq2 are large, the above will slow down (O(N^2)). Then use this instead:

    def cmpseq(s1, s2): set1 = set(s1) set2 = set(s2) return [((item if item in set1 else None), (item if item in set2 else None)) for item in sorted(set(s1 + s2))]

  6. David Goodger
    October 1, 2008 at 10:57 am |

    Your comment system doesn’t preserve newlines or whitespace, and it doesn’t say what to use. (HTML tags? Which ones are OK? Other markup?)

  7. David Goodger
    October 1, 2008 at 11:03 am |

    properly formatted code:

    http://paste.org/index.php?id=3925

  8. leonardo
    leonardo
    October 5, 2008 at 5:04 am |

    In Python it’s faster to process (perform the union, here) the sets first and sort them later:

    def cmpseq(seq1, seq2): s1 = set(seq1) s2 = set(seq2) union = sorted(s1.union(s2)) return [((x if x in s1 else None), (x if x in s2 else None)) for x in union]

Leave a Reply