I’m Embarrassed

I’m embarrassed!  Every now and then (okay, quite frequently) I’m reminded that when I started my I.T. career I had no interest in being a developer.  It wasn’t until I got sick of getting certified in this technology or the other, and of being married to a pager, that I decided I wanted to learn how to be a developer.  I had no official training and am pretty much “self-taught,” and at times I feel that it shows through.

For example, I have a large client with a huge email subscriber list that they send monthly emails to.  (Yes, they’re opted in, so don’t go all *spammer* on me.)  Occasionally they like to do targeted blasts based on zipcode (for those members we actually have a zipcode for).  This past month’s blast had over 28,000 zipcodes in the target list.  Due to an oversight, I wound up with a list of zipcodes that overlapped.  Basically it came down to having two files, one of 132277 lines and another of 113035 lines.  In these two files were approximately 30,000 overlapping email addresses that would have received both the targeted and non-targeted blasts had I not caught it.  *OUCH* that would NOT have been good.

I decided to parse through the two files with python since its syntax is fresh in my mind, and I’d have wound up googling too much had I decided to do it in bash, and this was time-sensitive stuff.  So I busted out vi and wrote the following (don’t laugh):

INFILE1 = 'all-email.lst'
INFILE2 = 'nofp-list.txt'

destinations = []
dupes = []

for line in open(INFILE1, 'r').readlines():
    for line2 in open(INFILE2, 'r').readlines():
        if line != line2:
            print line2
            destinations.append(line2)
        else:
            dupes.append(line2)

This code subsequently hung my machine as it struggled to loop over so much. I knew this wasn’t gonna be the final version as I was writing it (I had to get my creative juices flowing first), but I really didn’t expect it to hang my machine. I had to power it off and then revise the code once my system came back up. After some initial tweaks I had the version below. I was thinking that it was just too much to read all that into memory, and failing to see the real problem for what it was: that nested for loop, which makes roughly 132277 × 113035 (about 15 billion) comparisons and appends to destinations on nearly every one:

import fileinput

INFILE1 = 'all-email.lst'
INFILE2 = 'nofp-list.txt'

destinations = []
dupes = []

for line in fileinput.input([INFILE1]):
    for line2 in fileinput.input([INFILE2]):
        if line != line2:
            print line2
            destinations.append(line2)
        else:
            dupes.append(line2)

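This didn’t work either, as it was giving me an “input already open” error (fileinput keeps a single global input state, so calling fileinput.input() again while the first one is still active raises a RuntimeError). Rather than investigate further, I turned around and ultimately ended up with this, and thought to myself, “That was dumb. I KNOW better than that”: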

INFILE1 = 'all-email.lst'
INFILE2 = 'nofp-list.txt'

destinations = []
dupes = []

list1 = open(INFILE1, 'r').readlines()
list2 = open(INFILE2, 'r').readlines()

for line in list2:
    if line in list1:
        dupes.append(line)
    else:
        print line
        destinations.append(line)

*DUH* read them both into a list and use python’s “in” operator to look for one in the other. Done. Now, I’m positive there are even more iterations that this code could have eventually evolved into, but this got the job done and didn’t suck up my machine’s resources. Since it was a one-off, I stopped here. (NOTE: the code might not be exactly as I had it since this is largely from memory.)
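
For what it’s worth, one further iteration would be to put the first file into a set. The “in” test against a list still scans the list for every line, while a set lookup is a hash lookup, so the whole job becomes roughly one pass over each file. This is a minimal sketch and not what I actually ran: the function name split_by_overlap is made up for illustration, it is written so it runs on Python 3, and stripping whitespace before comparing is an assumption on my part (the versions above compare raw lines, newline and all).

```python
def split_by_overlap(all_lines, targeted_lines):
    """Split targeted_lines into (destinations, dupes) against all_lines.

    Hypothetical helper: both arguments are any iterables of strings,
    for example two open file objects.
    """
    # Assumption: surrounding whitespace and newlines are not significant.
    seen = {line.strip() for line in all_lines if line.strip()}

    destinations = []
    dupes = []
    for line in targeted_lines:
        address = line.strip()
        if not address:
            continue  # skip blank lines
        if address in seen:  # set membership: no scan of the whole list
            dupes.append(address)
        else:
            destinations.append(address)
    return destinations, dupes
```

With the two files from above it would be called along the lines of `destinations, dupes = split_by_overlap(open(INFILE1), open(INFILE2))`.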

In any case, it’s situations like these that I both despise and enjoy at the same time. I despise them because it should be easy, and I’m embarrassed by the fact that my first revision was so *dumb*, as if it lacked any thought. But I enjoy them because it’s a finite problem to solve and it allows me to exercise my brain a bit. Being someone who is starting to spend less time coding and more time in meetings, it feels good to do these exercises.

CLEARLY I’ve uncovered the need to code up a better process for doing these targeted blasts, as it would appear they are going to be doing more and more of them. I look forward to writing that code so I don’t have to go through something like this again for a task that should have been so very elementary! *Embarrassing!*

I invite you to share your approach for such a situation, A) so I can learn more from it and B) to find out if I’m really that far off anyway…

:wq!

It’s the small stuff…

 

I know, I know! It’s been a very long time. I tell myself it’s because I only speak when I have something to say.  Whatever gets you by, right?

Anyway, over the past few weeks (very much off and on, more off than on) I’ve been working on an algorithm for building a schedule of teams for a league manager.  What started out as something that I thought should have been very simple turned out to be much more complicated than I’d originally thought.  I remember thinking “2 hours tops.” Ever have one of those moments when you realize that you clearly weren’t thinking when you opened your big fat mouth?

Sure, some of you could probably write this in your sleep, but I did some googling, asked around a bit, etc., and as it turns out, it’s not as easy as you might think.  (Just google for “league scheduling algorithm”; better yet, let me google that for you.)  I think you’ll find that there are more lines of code than one would think, and a number of people attempting to roll their own who have come running for help.

I found several good examples, but with many of them, the code wasn’t very reader-friendly.  I found myself down so many dead ends and just about the time I thought “this is it, I’ve got it” I’d fall flat on my face again.

I walked away.  (I did it on paper, which turned out to be pretty easy, and that’s how I came up with an algorithm that I was (and still am) sure I could use.)  However, it bothered the living daylights out of me that I couldn’t solve the problem. I eventually left my aforementioned algorithm behind. Though I believe it’d be way more efficient and someday plan to figure out how to get it working, in the interest of getting something done (one of my mantras) I moved forward with an approach that I was making progress with.  At long last, I had a schedule builder that I can use for the bocce league that I’ve been made captain of recently.

For those of you looking for code — I’m not going to show it for numerous reasons:

  1. Part of me still believes this should have been much simpler and frankly I’m embarrassed
  2. This sounds like it could be someone’s homework assignment and I’m not about to be giving any answers
  3. I may wind up using it in some actual software I’m working on

Overall, it comes down to remembering to go back to the basics:

  1. Start small, solve only small problems.  If you feel it’s a big problem, break it down into its smallest chunks and solve each chunk as its own problem.
  2. Use pseudo-code — my oh my when did I ever stop pseudo-coding things?  So very helpful to break down the problem domain.  (Note to self, is there a possible software solution here….hmmmm)
  3. Comment the living crap out of your code.  When you walk away for a bit or get interrupted, you’ll be thankful.  Many people say good code comments itself.  I say “nay nay.” When you are solving an issue you’ve never solved before and had to google a few things, you wind up writing obscure bits of code or syntax that you’ll no doubt forget about and forget what purpose it served or what functionality it provided.   Comments are still a very good thing.  Anyone telling you different is plain wrong.
  4. Don’t be afraid to start over or delete code.
  5. Don’t be afraid to write something procedurally, at least at first.  My solution was very much done procedurally.  It was not OO or functional.  After all, an algorithm is a recipe.  A recipe is very procedural. Writing something procedural to solve a problem helped me a great deal in this case and provided me with a clearer understanding of the problem at hand and its solution.

Again, this is probably a very simple problem for some of you.  For some reason, the solution eluded me for much longer than I’d like to admit.  There is still one strange kink that I have to wrap a try/except block around, and that’s on my list to work out (as well as providing support for bye weeks), but at last it’s refreshing to see a schedule actually being built.

I’m really curious to see if anyone else has worked on this problem, or if anyone else would like to try working it out for themselves in their language of choice and then come back and provide their insight/feedback etc.  I don’t expect you to show code since I didn’t even do that myself, but I’d love to hear your opinion on how easy/hard you found this to be.  In the end, I fully expect to be an idiot here and for it to be something very simple that just eluded me, and if that’s the case, I’m fine with it. Sorta.  But I’d love to find out if anyone else underestimated the issue.  So if you wanna give it a try, here are the rules I had to work with:

  1. X number of teams over N number of weeks.
  2. In my league, each team plays each other only once.  (This was the sticking point here: we had 8 teams and they wanted an 8-week season, which means there has to be a bye week, or you just play a 7-week season.  Since they wanted 8 weeks, I have to add bye support in there.  I’m still working on that.  That being said, if you have 16 teams, you need at least a 15-week season, or 16 weeks and everyone has one bye week.  Or you can have 8 teams play each other twice for 14 weeks with two byes, etc., which is something else I haven’t worked out yet but plan on doing.)

There are a ton of edge cases, so let’s stick with the following:  8 teams, 7 weeks, everyone plays each other once.  Man…even saying it again makes it sound so simple, and yet I found it so much more difficult than I expected.  It wasn’t the hardest code I’ve ever had to come up with by a long shot, but it was much more painful than I thought it should have been.  Any takers?
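
As a starting point for anyone taking this up, one well-known textbook technique for the 8 teams, 7 weeks case is the “circle method” for round-robin scheduling: hold one team fixed, rotate the others one seat each week, and pair the first seat with the last, the second with the second-to-last, and so on. The sketch below is that standard technique and NOT the code I’m keeping to myself. The function name round_robin is made up for illustration, the None placeholder for a bye with an odd number of teams is an assumption, and it does nothing about stretching 8 teams over 8 weeks.

```python
def round_robin(teams):
    """Circle-method sketch: return a list of weeks, each a list of pairs.

    Hypothetical helper, not the schedule builder described above.
    """
    teams = list(teams)
    if len(teams) % 2:
        teams.append(None)  # assumption: None marks the bye for odd counts
    n = len(teams)

    weeks = []
    for _ in range(n - 1):
        # Pair first with last, second with second-to-last, and so on.
        pairs = [(teams[i], teams[n - 1 - i]) for i in range(n // 2)]
        # Drop the pairing that involves the bye placeholder, if any.
        weeks.append([pair for pair in pairs if None not in pair])
        # Keep teams[0] fixed and rotate everyone else by one seat.
        teams = [teams[0], teams[-1]] + teams[1:-1]
    return weeks
```

Calling `round_robin(range(1, 9))` gives 7 weeks of 4 games each. The order within each pair is arbitrary here, so balancing home and away (or court assignments) would be a separate step.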

 

:wq!