I’m Embarrassed

I’m embarrassed! Every now and then (okay, quite frequently) I’m reminded that when I started my I.T. career I had no interest in being a developer. It wasn’t until I got sick of getting certified in one technology or another, and of being married to a pager, that I decided I wanted to learn how to be a developer. I had no formal training and am pretty much self-taught, and at times I feel that it shows.

For example, I have a large client with a huge email subscriber list that they send monthly emails to. (Yes, they’re opted in, so don’t go all *spammer* on me.) Occasionally they like to do targeted blasts based on zipcode (for those members we actually have a zipcode for). This past month’s blast had over 28,000 zipcodes in the target list. Due to an oversight, I wound up with two lists that overlapped. Basically it came down to two files, one of 132,277 lines and another of 113,035 lines. Across these two files were roughly 30,000 overlapping email addresses that would have received both the targeted and the non-targeted blasts had I not caught it. *OUCH*, that would NOT have been good.

I decided to parse the two files with Python since its syntax was fresh in my mind; had I done it in bash I’d have wound up googling too much, and this was time-sensitive stuff. So I busted out vi and coded up the following (don’t laugh):

INFILE1 = 'all-email.lst'
INFILE2 = 'nofp-list.txt'

destinations = []
dupes = []

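# Naive first pass: re-reads INFILE2 for every single line of INFILE1
for line in open(INFILE1, 'r').readlines():
    for line2 in open(INFILE2, 'r').readlines():
        if line != line2:
            print(line2, end='')  # line2 already ends with '\n'
            destinations.append(line2)
        else:
            dupes.append(line2)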

This code promptly hung my machine as it struggled to loop over so much. I knew as I was writing it that this wasn’t going to be the final version (I had to get my creative juices flowing first), but I really didn’t expect it to hang my machine. I had to power off and revise the code once my system came back up. After some initial tweaks I had the following, thinking the problem was reading all of that into memory at once and failing to see the real culprit for what it was: that nested for loop.
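To put a number on that nested loop: its body runs once for every pair of lines, and since nearly every pair differs, nearly every pass appends to `destinations` (and prints a line). The memory figure below assumes a 64-bit CPython build, where each list slot is an 8-byte pointer; it is a lower bound that ignores the strings themselves, which are also re-created on every pass because the inner `readlines()` re-reads the second file once per line of the first.

```latex
\begin{aligned}
\text{inner-loop passes} &= 132{,}277 \times 113{,}035 = 14{,}951{,}930{,}695 \approx 1.5 \times 10^{10} \\
\text{memory for } \texttt{destinations} &\gtrsim 1.5 \times 10^{10} \times 8\ \text{bytes} \approx 1.2 \times 10^{11}\ \text{bytes} \approx 120\ \text{GB}
\end{aligned}
```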

import fileinput

INFILE1 = 'all-email.lst'
INFILE2 = 'nofp-list.txt'

destinations = []
dupes = []

# Same nested loop, now streaming both files through fileinput
for line in fileinput.input([INFILE1]):
    for line2 in fileinput.input([INFILE2]):
        if line != line2:
            print(line2, end='')
            destinations.append(line2)
        else:
            dupes.append(line2)

This didn’t work either: the nested call blew up with an “input() already active” error. Rather than investigate further, I changed tack and ended up with this, thinking to myself, “That was dumb; I KNOW better than that”:
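For the curious, that error comes from how `fileinput` is built: `fileinput.input()` drives a single module-level instance, and calling it again while the first one still has a file open raises `RuntimeError`. The small reproduction below writes two throwaway files (their names and contents are made up for the demo) and nests the calls the same way the second version does.

```python
import fileinput
import os
import tempfile


def reproduce_nested_fileinput_error():
    """Nest two fileinput.input() calls and return the error message raised."""
    with tempfile.TemporaryDirectory() as tmpdir:
        # Hypothetical demo files, each holding a single address.
        path1 = os.path.join(tmpdir, 'a.txt')
        path2 = os.path.join(tmpdir, 'b.txt')
        for path in (path1, path2):
            with open(path, 'w') as f:
                f.write('someone@example.com\n')
        try:
            for line in fileinput.input([path1]):
                # The outer loop already holds the shared module-level state
                # with a file open, so this second call raises RuntimeError.
                for line2 in fileinput.input([path2]):
                    pass
        except RuntimeError as exc:
            return str(exc)
        finally:
            fileinput.close()  # reset the shared state and close the file
    return None


print(reproduce_nested_fileinput_error())
```

Separate `fileinput.FileInput(...)` objects would sidestep the error, since only `fileinput.input()` shares global state, but the loop would still be nested and quadratic, so that change alone would not have helped.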

INFILE1 = 'all-email.lst'
INFILE2 = 'nofp-list.txt'

destinations = []
dupes = []

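with open(INFILE1, 'r') as f1, open(INFILE2, 'r') as f2:
    list1 = f1.readlines()
    list2 = f2.readlines()

# Keep every line of INFILE2 that doesn't already appear in INFILE1
for line in list2:
    if line in list1:  # linear scan of list1 on each lookup
        dupes.append(line)
    else:
        print(line, end='')
        destinations.append(line)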

*DUH*: read them both into lists and use Python’s “in” operator to look for one in the other. Done. Now, I’m positive there are more iterations this code could eventually have evolved into, but this got the job done without sucking up my machine’s resources. Since it was a one-off, I stopped here. (NOTE: the code might not be exactly as I had it since this is largely from memory.)
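One further iteration, sketched here under a couple of stated assumptions rather than taken from the original one-off script: `x in some_list` still walks the list from the front on every lookup, so the loop above can do up to 113,035 × 132,277 string comparisons, just at C speed. Building a `set` from the reference lines first turns each lookup into an average constant-time hash probe, so the whole job becomes roughly linear in the size of the two files. The helper name `split_against` is hypothetical, and treating surrounding whitespace as insignificant is an assumption; it also means a last line with no trailing newline still matches, which the `readlines()` comparison above would miss.

```python
def split_against(reference_lines, candidate_lines):
    """Split candidate_lines into (destinations, dupes) against reference_lines.

    Hypothetical helper: a candidate that also appears in reference_lines
    is a dupe; everything else is a destination. Candidate order is kept.
    """
    # Build the lookup set once; each `in` test below is then a hash probe
    # instead of a scan over the whole reference list.
    # Assumption: surrounding whitespace (including "\n" and "\r\n") is not
    # significant, so it is stripped before comparing.
    seen = {line.strip() for line in reference_lines}
    destinations, dupes = [], []
    for line in candidate_lines:
        address = line.strip()
        if not address:
            continue  # skip blank lines
        if address in seen:
            dupes.append(address)
        else:
            destinations.append(address)
    return destinations, dupes


# Tiny in-memory demo; the addresses are made up.
destinations, dupes = split_against(
    ["a@example.com\n", "b@example.com"],  # last line has no newline
    ["b@example.com\n", "c@example.com\n"],
)
print(destinations)  # ['c@example.com']
print(dupes)         # ['b@example.com']
```

With the real files, open both inside a `with open(INFILE1) as f1, open(INFILE2) as f2:` block and call `split_against(f1, f2)`; file objects iterate line by line, so `readlines()` isn’t needed. If the second file can itself contain repeats, adding each address to `seen` right after appending it to `destinations` keeps anyone from getting the same blast twice.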

In any case, it’s situations like these that I both despise and enjoy at the same time. I despise them because they should be easy, and I’m embarrassed that my first revision was so *dumb*, as if it lacked any thought. But I enjoy them because each one is a finite problem to solve and lets me exercise my brain a bit. Now that I’m starting to spend less time coding and more time in meetings, it feels good to do these exercises.

CLEARLY I’ve uncovered the need to code up a better process for these targeted blasts, since it appears they’re going to be doing more and more of them. I look forward to writing that code so I don’t have to go through something this elementary again! *Embarrassing!*

I invite you to share your approach to a situation like this, A) so I can learn from it, and B) to find out whether I’m really that far off anyway…

:wq!