Categories
Django Python Uncategorized

Getting around memory limitations with Django and multi-processing

I’ve spent the last few weeks writing a data migration for a large, high-traffic website and have had a lot of fun trying to squeeze every bit of processing power out of my machine. While playing around locally I can cluster the migration so it executes on fractions of the queryset. For instance:

./manage.py run_my_migration --cluster=1/10
./manage.py run_my_migration --cluster=2/10
./manage.py run_my_migration --cluster=3/10
./manage.py run_my_migration --cluster=4/10

All this does is take the queryset that is generated in the migration and chop it up into tenths. No big deal. The part that is a big deal is that the queryset contains 30,000 rows. In itself that isn’t a bad thing, but there are a lot of memory- and CPU-heavy operations that happen on each row. I was finding that when I tried to run the migration on our Rackspace Cloud servers the machine would exhaust its memory and terminate my processes. This was a bit frustrating because presumably the operating system should be able to make use of the swap and just deal with it. I tried to make the clusters smaller, but was still running into issues. Even more frustrating was that this happened at irregular intervals. Sometimes it took 20 minutes and sometimes it took 4 hours.
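
To make the chopping concrete, here’s a minimal sketch of how a `--cluster=N/M` argument could be turned into a slice of rows. The `cluster_bounds` helper is hypothetical (the migration command itself isn’t shown in this post), so treat it as one possible approach and not the code I actually ran.

```python
def cluster_bounds(cluster_arg, total_rows):
    """Return (start, end) row offsets for a cluster string such as "3/10".

    Hypothetical helper: clusters are 1-indexed, and any remainder rows
    are spread over the first clusters so every row is covered exactly once.
    """
    index, count = (int(part) for part in cluster_arg.split("/"))
    if not 1 <= index <= count:
        raise ValueError("cluster index must be between 1 and {0}".format(count))
    base, extra = divmod(total_rows, count)
    start = (index - 1) * base + min(index - 1, extra)
    end = start + base + (1 if index <= extra else 0)
    return start, end


# Usage sketch: inside the command, something like queryset[start:end].
start, end = cluster_bounds("3/10", 30000)
```

With 30,000 rows and ten clusters, each run would handle 3,000 rows.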

Threading & Multi-processing

My solution to the problem utilized the clustering ability I already had built into the program. If I could break the migration down into 10,000 small migrations, then I should be able to get around any memory limitations. My plan was as follows:

  1. Break down the migration into 10,000 clusters of roughly 3 rows apiece.
  2. Execute 3 clustered migrations concurrently.
  3. Start the next migration after one has finished.
  4. Log the state of the migration so we know where to start if things go poorly.

One of the issues with doing concurrency work with Python is the global interpreter lock (GIL). It makes writing code a lot easier, but it only lets one thread execute Python bytecode at a time, so threads can’t spread CPU-heavy work across cores. However, it’s easy to skirt around if you just spawn new processes like I did.

Borrowing some thread pooling code here, I was able to get a pretty sweet script running in no time at all.

import os.path
import subprocess
import sys

from util import ThreadPool


def launch_import(cluster_start, cluster_size, python_path, command_path):
    # command_path is expected to end with "--cluster=" so the fraction
    # can be appended directly to it.
    command = python_path
    command += " " + command_path
    command += "{0}/{1}".format(cluster_start, cluster_size)

    # Open completed list.
    with open("clusterlog.txt") as f:
        completed = f.readlines()

    # Check to see if we should be running this command.
    if command + "\n" in completed:
        print("lowmem.py ==> Skipping {0}".format(command))
    else:
        print("lowmem.py ==> Executing {0}".format(command))
        proc = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        # Capture the output, don't print it. communicate() drains both
        # pipes, so the child can't block on a full stderr buffer.
        proc.communicate()

        # Log completed cluster
        with open("clusterlog.txt", "a") as logfile:
            logfile.write("{0}\n".format(command))


if __name__ == '__main__':

    # Simple command line args checking
    try:
        lowmem, clusters, pool_size, python_path, command_path = sys.argv
    except ValueError:
        print("Usage: python lowmem.py <clusters> <pool_size> <path/to/python> <path/to/manage.py>")
        sys.exit(1)

    # Initiate log file.
    if not os.path.isfile("clusterlog.txt"):
        open("clusterlog.txt", "w").close()

    # Build in some extra space.
    print("\n\n")

    # Initiate the thread pool
    pool = ThreadPool(int(pool_size))

    # Start adding tasks. Clusters are numbered 1 through <clusters>,
    # so the upper bound has to be inclusive.
    for i in range(1, int(clusters) + 1):
        pool.add_task(launch_import, i, clusters, python_path, command_path)

    pool.wait_completion()

Utilizing the code above, I can now run a command like:

python lowmem.py 10000 3 /srv/www/project/bin/python "/srv/www/project/src/manage.py import --cluster=" &

This breaks the queryset up into 10,000 parts and runs the import 3 sets at a time. It has done a great job of keeping the memory footprint of the import low, while still getting some concurrency so it doesn’t take forever.
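
On a current Python 3, the standard library’s `concurrent.futures` could probably stand in for the borrowed `ThreadPool` helper. The sketch below is an assumption about how that might look, not the script I ran: `run_command`, `run_clusters`, and the `runner` parameter are made-up names, and unlike my script it only counts a cluster as finished when the command exits with status 0.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_command(command):
    """Run one shell command and return its exit status, discarding output."""
    result = subprocess.run(
        command, shell=True,
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    )
    return result.returncode


def run_clusters(base_command, clusters, pool_size, completed=(), runner=run_command):
    """Run base_command + "i/clusters" for every cluster not already completed.

    Returns the commands that exited with status 0, so the caller can
    append them to a log such as clusterlog.txt.
    """
    done = set(completed)
    commands = [
        "{0}{1}/{2}".format(base_command, i, clusters)
        for i in range(1, clusters + 1)
    ]
    pending = [c for c in commands if c not in done]
    # At most pool_size commands run at once; each thread just waits on
    # its child process, so the GIL is not a bottleneck here.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        statuses = list(pool.map(runner, pending))
    return [c for c, status in zip(pending, statuses) if status == 0]
```

Passing a different `runner` makes it possible to dry-run the scheduling without launching any real imports.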

Categories
Django Javascript Python

Two Frameworks

The past couple of months have found me working diligently on work stuff, but also consistently dropping an hour a day on my current side project. It just so happens that the side project and my actual work share the same language (Python) and framework (Django). This has been nice because it’s given my brain a moment to relax with regards to learning new material, but at the same time I feel stagnant.

Django is my framework of choice. I know it inside and out, can bend it to my will, and work extremely fast in it. However I’m not blind to the fact that the popularity of the old monolithic frameworks (Rails, Django, Cake, etc.) for new projects is waning. People these days are starting new projects with a service-oriented architecture in mind. They’re using Node.js with Express on the backend for an API, and then Angular on the front end to create a nice single-page app. I’ve done this sort of development before extensively, but I’m out of practice. So I’ve come to a fork in the road.

Over the years I’ve come to realize that I can only hold two frameworks in my mind at one time. It doesn’t matter if they are written in different languages or not (those seem to stick with me easier for some reason), but two frameworks is the max I can handle. So my choices are as follows: 1) Learn Android, 2) Get good at Node.

I’ve made one Android app before when I worked at a marketing firm. It was fun. I enjoyed not doing web stuff for once. I found Java overly verbose, but as long as you stayed within the “modern Java” lines it was fine. As for Node, I already know it but I’m just out of practice. I feel like it would be valuable to become an expert in, but sometimes I feel burnt out on the web.

After a lot of deliberation, I think I’m going to move forward with Android development by making an Android app for RedemFit. It’ll give me a chance to break out of web development for a while and hopefully will become something I enjoy doing as much as web.

Categories
Clojure

Learning Clojure: Part 2

In part 1 of my “Learning Clojure” series, I created a simple program to calculate salary based on how many years someone worked. For this post, I’m going to be attempting something a bit more complicated.

Project Gutenberg

One of my favorite websites in the entire world is Project Gutenberg (PG). PG is an archive of books that have passed into the public domain, which makes it a great resource for text mining data. I use it almost every time I need some words to parse, and you should too! So why does this matter right now? I’m glad that you asked.

Outline

Given how simple the last program was, I decided that I should probably take this one up a notch. It’s going to involve fetching a file, writing it to disk, reading the file, and processing command line args. In order, here’s what the program needs to do:

  1. Validate command line args – We’re going to accept two arguments: the word that should be counted and a URL that points to a .txt file on my web host rather than Project Gutenberg itself (Project Gutenberg doesn’t like crawlers, apparently) for processing.
  2. Download the file – It could be large and might fail. We’ll need to be careful here.
  3. Split the file into a vector – Split the file up on whitespace and load it into a vector.
  4. Print – Print to standard output how many times the word was found. If none, make it known.

The Program

The full source code can be found at https://github.com/vital101/learn-clojure-wordcount. It looks pretty simple, but I did expand my Clojure knowledge quite a bit with this one. Some of the things I did:

  • Used a 3rd party library
  • Messed around with vectors (split word data) and sequences (args).
  • Wrote to a file.
  • Refactored constantly

I do want to highlight one bit of code that I wrote, because it’s pretty straightforward but does a lot of stuff.

(defn process
    [url word]
    (write-file (get-source-file url))
    (log "Info" "Processing File...")
    (let [data (cljstr/split (slurp filename) #"\s+")]
        (log "Result" (str (count (filter #{word} data)) " occurrences of '" word "'"))))
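
For comparison with a language I already know well, the counting step inside `process` maps onto Python roughly like this. `count_occurrences` is a name made up for this sketch; like the Clojure version, it splits on whitespace and counts exact, case-sensitive matches.

```python
def count_occurrences(text, word):
    """Count exact matches of `word` among the whitespace-separated tokens."""
    return sum(1 for token in text.split() if token == word)
```

Neither version strips punctuation, so a token like "whale," would not count as a match for "whale".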

My next program needs to be more complicated from a data perspective, so that I’m forced to use things like “map”, “reduce”, and other functional elements on data sets.

Categories
Clojure Other Programming

Learning Clojure: Part 1

Clojure is a functional programming language based on Lisp and written to run on top of the JVM. I’ve tried learning it in the past, but have failed mostly due to biting off more than I could chew. But not this time! I’m taking my time, reading lots of code, and doing 1st year computer science assignments with it. I figure this worked well when I first learned how to program, so it will probably work well now.

The Return to Trivial Programs

After spending the past 6 years neck deep in non-trivial professional programming, I’m returning to trivial toy programs to learn Clojure. My first task is to write a program that takes user input from the terminal and calculates their salary at a year which they input. More specifically:

  • Starting salary is $1000
  • Salary doubles every year
  • Validate input to make sure it is a number.
  • Write history to file called: salary_history.txt
  • In format…. [years_working]:$[salary]

All in all it’s pretty straightforward. I currently could write this program in a handful of different languages (Python, PHP, Java, JavaScript [Node], Ruby), but am struggling with one bit of the Clojure implementation.

(ns salary.core
  (:gen-class))
 
(defn get-integer
    "Parses the input string as an integer. Returns false if it is not one."
    [input]
    (try
        (Integer/parseInt input)
        (catch Exception e false)))
 
;; Incomplete.  Will eventually write to a file.
(defn output
    "Takes the console input and error message and outputs them to file and console."
    [console-input message]
    (println (str console-input ": " message)))
 
;;
;; ????? WTF DO I DO HERE
;;
(defn calculate-salary
    [years]
    ())
 
(defn -main
  [& args]
  (println "How many years do you want to work?")
  (let [user-input (read-line)]
    (let [years (get-integer user-input)]
        (if years
            (calculate-salary (- years 1) 1000)
            (output user-input "This is NOT an integer.")))))

The Python implementation of calculate salary would look something like this:

def calculate_salary(years):
    salary = 1000
    for i in range(years-1):
        salary = salary * 2
    return salary

But in Clojure things are a bit more complicated. In Clojure values are immutable. I can’t just loop over the years and keep doubling the salary while storing it in the same variable. I need to use recursion. Or reduce. Or map. Hell, I don’t know. I need to use something functional, lest I want the Clojure experts to laugh at me. I need something that will call a function that doubles whatever value comes into it, then returns. Then I need to call said function up to N times (where N is the number of years that the person enters).

Any ideas?

EDIT
With the help of Ryan (below), I came up with:

(defn calculate-salary
    [years salary]
    (if (= years 0)
        salary
        (calculate-salary (- years 1) (* salary 2))))
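
I also wondered about reduce earlier in this post. As a rough sketch of that idea (in Python rather than Clojure, since that’s where I’m sure of the details), the doubling can be folded over the number of years. `calculate_salary_reduce` is a made-up name, and it takes the same two arguments as the recursive version.

```python
from functools import reduce


def calculate_salary_reduce(years, salary):
    """Double `salary` once per year by folding over range(years)."""
    return reduce(lambda total, _year: total * 2, range(years), salary)
```

No variable is reassigned here; each step hands a new value to the next one, which is the same shape as the recursive call.
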

Categories
Other

Shuttering Side Projects

Over the past few years I’ve slowly accumulated some big side projects. They weren’t done for clients, but just for myself. At some point maintenance of these side projects isn’t fun anymore and hinders the creative juices. I have other things I want to work on, but having these other zombie side projects feels too much like an albatross around my neck.

After much deliberation, I’ve decided to shut down two of my large side projects: BookCheaply and Smooth Bulletin. I really believe both of these projects could do someone some good, but they were both learning projects for me and I don’t see them moving forward anymore. Effective immediately I’m disabling their Apache configs, backing up their DBs, TARing it all together, and putting it somewhere safe. I’ll keep access to the Git repos, but eventually I’ll clone a copy of those out too and archive them. If I don’t get them out of the way completely, I feel like I’ll want to work on them too much.

Shuttering these projects marks a transition for me, where I move from using Python and Django on side projects to Node.js, Express, and Angular. While apprehensive about abandoning my go-to stack for side projects, I’m excited to learn the nooks and crannies of Node (and I still use Python/Django for my day job anyways).

Here’s to the future!

Smooth Bulletin

Categories
Other

Restoring a 14 year old website

If you want to skip the restoration stuff, the final product can be found at https://re-cycledair.com/starwars/Entrance.html.

A long time ago in an era just before the first dot-com bust, I was a bright-eyed child with aspirations to make the greatest Star Wars web site of all time. With only Microsoft FrontPage Express and a limited knowledge of HTML in hand, the 1999 version of me created the best Star Wars web site the world had ever seen.

Unfortunately I was hosting it on my ISP’s free web hosting area. When that ISP was later gobbled up by another company, all traces of my website were gone… or so I thought. Recently I discovered that most of it still exists on the Wayback Machine, however it’s in poor shape. Many of the assets don’t exist anymore, the linking is broken, and the frames just don’t work right in modern browsers. This is my journey to save this little piece of history while using minimal modern techniques. My goal is to have a legitimate 1999 website when I’m done and have it render nicely on most modern web browsers.

The Internet Archive

The Wayback Machine at the Internet Archive has long been one of my favorite websites. I love getting nostalgic for the web of the past, my childhood, and history in general. It was a while ago that I realized it still had some of my original Star Wars website. If you want, check out the capture from 2001. It doesn’t work much at all, but it gives you a good idea of what we’re working with here.
[Image: Star Wars Restoration - Before]

In addition to the main page I also had an “Entrance” page, because you know, every cool web site had an entrance page back then. This one is in slightly better shape, meaning the star field background survived intact.
[Image: Star Wars Restoration - Entrance Before]

The Game Plan

My goal for this restoration is to make the website display great in modern web browsers with minimal code changes and without using too many modern techniques. At that point in time, CSS wasn’t really a thing that a ton of people used yet, and even if it was it was beyond my understanding, so I’m going to try and avoid it if at all possible. JavaScript was available in most browsers by that time, but I didn’t understand it, so I didn’t use it. However, we do have FRAMES(!), which is going to make things super exciting for everyone still reading.

Now that we’ve laid down some ground rules, here is the order of attack:

  1. Entrance: Fix / replace broken images.
  2. Entrance: Clean up the HTML (alignment) and remove anything that the Internet Archive crawler added.
  3. Main site: Fix / replace broken images
  4. Main site: Remove dead links. Sadly the Internet Archive couldn’t capture everything.
  5. Main site: Fix any styling issues that remain.
  6. Main Site: Clean up HTML (alignment) and remove anything that the Internet Archive crawler added.

Let’s get started!

Entrance: Replacing Broken Images

While it’s fantastic that the background star field has survived, the rest of the images didn’t. This is sort of a problem because I have to reach pretty far back into my memory to figure out what was there originally. Luckily, 1999 me wasn’t completely terrible at naming things.

[Image: Star Wars Restoration - Entrance File Names 1]

From these file names and memory, I’m going to say that `tie.gif` was a rotating TIE fighter gif, and `starwars.gif` was the Star Wars logo that survived on the main section of the site.

[Image: Star Wars Restoration - Entrance File Names 2]

And from this file name we get nothing. Luckily I remember this being an image of Tatooine, roughly 350px x 100px. This site came out a bit before Episode I, so I’m going to assume it was an image from one of the Episode I trailers. At least, that’s how I remember it.

After a bit of digging, I found replacement images for everything but the Tatooine banner. In its place, I found a light saber gif that I definitely used on this site somewhere in the past. I also updated the star field to be a little more 1999.

[Image: Star Wars Restoration: Entrance Complete]

Entrance: HTML Clean Up

Now that the entrance is starting to look like a real 1999 web site again, we can start to clean up the HTML and fix the broken link to enter the site. The entrance area contains 3 pages, 1 to bring the frames together, and 1 for each frame. They currently look like this:
[Image: entrance_html_before]

[Image: entrance_banner_before]

[Image: entrance_intro_before]

The code isn’t too bad. With a bit of alignment help and some nesting, things are going to look pretty good. Now check out the final product:

[Image: entrance_banner_after]

[Image: entrance_html_after]

[Image: entrance_intro_after]

Main Site: Fix Broken Images

Now that we have the entrance out of the way, I have a pretty good idea of how this whole restoration will work for the rest of the site. Let’s start by fixing the background on the main site so things are a bit more workable.

[Image: Star Wars Restoration: Main site with background]

Now we’re talking! There are a few things that I noticed immediately after getting the background set.

  • 1999 me had some pretty sweet design skills
  • I should use more animated gifs on my websites
  • The frame content on the left isn’t staying inside of it correctly, so I had to scroll to the center to get it to work.
  • I had a Yahoo! club!
  • I wasn’t so great with words back then. (I was a kid, give me a break)

Looking at the source, there are a few images that aren’t linking quite right.

  • Left side: starwars.gif – I think this was just a smaller version of the big logo.
  • Left side: deathstar.gif – Definitely a gif of the Death Star. It might have been animated, but I think it was just transparent so that it looked awesome.
  • Right side: lettersabove.gif – I honestly don’t know, but it might have been “A long time ago in a galaxy far far away…”
  • Right side: pulselightsaber.gif – A pulsing light saber page break. I might just use the static one from the entrance instead.
  • Right side: xwing.gif – A rotating X-Wing fighter. Because nothing says “Email me!” like a rotating X-Wing.

After tracking down many of the original images that I used here, the home page now looks like this.

[Image: Star Wars Restoration: Home Page Finished]

Looking good! Next up, code clean up.

Main Site: Code Clean Up

The code for the main site is in pretty poor condition. There is a ton of code in there from the Internet Archive, it has poor indentation, and in general just isn’t anywhere near the quality of the entrance. Given the amount of code here, I thought using a Github Gist would be better.

Before Code: https://gist.github.com/vital101/56b2783b8e75b00fdac9

After Code: https://gist.github.com/vital101/2d1e6a44ac3ddbb1789d

As you can see I went ahead and re-aligned everything so that it’s more readable, took out all of the JavaScript and CSS added by the Internet Archive, and condensed the code a little bit so there weren’t large bits of whitespace between elements.

The next step in the code cleanup is fixing URLs. If you look at the code, you’ll notice that all the links still point to the Internet Archive. That obviously isn’t going to work for us, so I simply need to replace “https://web.archive.org/web/20010811043323/http://my.voyager.net/~hands/star-wars/” with an empty string, making the link relative.
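
I did the replacement by hand, but the same step can be sketched in a few lines of Python. The helper name `make_links_relative` is made up for this example; the prefix is the one quoted above.

```python
ARCHIVE_PREFIX = (
    "https://web.archive.org/web/20010811043323/"
    "http://my.voyager.net/~hands/star-wars/"
)


def make_links_relative(html, prefix=ARCHIVE_PREFIX):
    """Strip the archive prefix so links resolve relative to the current page."""
    return html.replace(prefix, "")
```

Anything the crawler captured under a different timestamp would carry a different prefix, so it’s worth searching the result for any leftover web.archive.org links.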

After fixing links, I found that many of the sub-pages have been lost to time, but there are still a few for us to look at including:

  • Luke Skywalker
  • Darth Vader
  • Princess Leia
  • Chewbacca
  • G.Moff Willguf Tarkin
  • Star Wars: Rogue Squadron
  • Weapons
  • B-Wing
  • Snowspeeder
  • Corellian Corvette
  • Mon Calamari Cruiser
  • At-AT
  • Death Star
  • Chat
  • Movie

Fixing up these pages follows a similar pattern to the entrance page and the main site, so I’ll spare you the details. There were a few interesting things that popped up while I was doing the sub-page restoration though.

  • Plagiarism – 1999 me definitely ripped off somebody else with most of the encyclopedic data on characters, weapons, and ships. The change in voice is a dead giveaway.
  • FrontPage Versions – It appears that I made the entrance and the menu/main page of the main site in FrontPage 3.0, but all of the sub-pages in FrontPage 2.0 (Express). FrontPage Express was shareware that came bundled with IE 4 so I know how I got that. I believe I had access to FrontPage 3.0 at my school at that time, which I probably used to generate the frames.
  • Xoom – I had completely forgotten about it, but Xoom used to be a free unlimited web host back in the early dot-com days. I used it to host videos, games, and other things. Unfortunately my Xoom site wasn’t crawled by the Internet Archive.
  • Java Chat – Back in 1999, there weren’t a whole lot of options for getting a chat room going on your website. Using a Java applet was basically it, so that’s what I did. It looks like Xoom offered its members a free chat tool, so how could I resist?
  • Game Demo – Ah, the good old days. The one game page that survived has some pretty hefty system requirements: Windows 95/98, 32 MB of RAM, Pentium 166, Direct3D card, DirectX 6. Oh, and I also made sure to let people know the estimated download time on a 28.8 kbps dial-up modem.

Final Thoughts

The final product can be found at https://re-cycledair.com/starwars/Entrance.html.

While going through this restoration I realized that this is when I knew what I wanted to do with my life. My parents had no idea what I was doing, but they realized that it stimulated my mind so they let me continue to do it anyways. Without that kind of encouragement and freedom to create, I probably wouldn’t be where I am today. This restoration has also re-kindled my respect for the old web. It was simple, but I’m extremely happy that the web has progressed to where it is today.

Categories
Other

HealthCare.gov: It almost worked

Ever since HealthCare.gov launched all I’ve been hearing about is how horrible the experience is. It never affected me directly, so I chalked it up to a few vocal detractors trying to sway public opinion. I even went so far as to go to HealthCare.gov and use the logged-out experience to compare some plans. It looked pretty good actually, and I tweeted as such.

[Image: HealthCare.gov tweet]

That was all about to change though. I recently made the move to work at a small startup that doesn’t yet have an employer-sponsored health plan. They do however offer a stipend of sorts to help you pay for insurance costs on HealthCare.gov. And with that, my journey began.

Initial Sign Up

After discussing the healthcare situation with my wife, I ran the numbers and decided that it would be best for her to use her employer-sponsored healthcare and for me to use the exchange separately. At this point everything went as well as it could. Even though the UX on HealthCare.gov is pretty bad, I was able to figure out how to fill out an application, select health coverage, select dental coverage, and get things rolling. This all happened before December 23rd, which means I’d be getting coverage by January 1st! All was well… until I had to make a change.

The Change

As it turns out, I math’d wrong. When I was calculating the cost of healthcare for my wife at her job, I was off by something like $200 per month, which pushed her healthcare costs well into the “unreasonable for what I’m going to receive” range. With that in mind, I thought I could make a quick change on HealthCare.gov to get her on the same plan that I was. Boy was I wrong.

The process I had to take to get both my wife and myself health and dental insurance was ridiculous, and still hasn’t worked correctly. To add my wife to my plan, I had to delete the entire thing. Not only that, but I needed to delete my entire application and start over from scratch. Once I did that, I kept constantly running into JavaScript errors in their application which would force me to re-authenticate.

After finally getting through the application process, I get to the last step where I need to confirm. I press the button and… nothing. Not a goddamn thing. Being a software engineer, I do a little investigating and find out that the server is returning 500 errors to the client (in non-developer speak, this means that the server couldn’t process my request because of some error on their side). The error that is returned says that I should try to log out, then resume the application. I do this, but when I try to resume my application I get:

That’s right, my application is locked. It doesn’t give any reason, which is especially confusing considering that I was asked to re-authenticate. But there is a little link that can “explain this task”, so I click it, and it naturally didn’t work.

At this point, I was beginning to realize that I would need to contact their support to get things resolved.

Support and Next Steps

The first time I tried to use support, I clicked the helpful little “Live Chat” button. I then proceeded to wait for 25 minutes with no response. This was over the holidays, so I gave it another try after Christmas. This time I was connected to somebody, but no matter what question I asked I was given a canned response to contact the call center. My question is: Why have the “Live Chat” at all if you’re just going to tell me to contact the call center? It’s ridiculous and a waste of my time.

After my initial experience with HealthCare.gov, I’m not really excited to contact the call center. I honestly don’t have a ton of time to do it, and given the number of people that will be signing up for healthcare I’m sure the wait will be long. My likely next steps will be to cancel what remains of my application and start over. If it doesn’t work this time, then I’m going to cancel that application and contact my insurance provider of choice directly. In my case that’s Blue Cross Blue Shield of Michigan, which happens to have a walk-in office about a block from where I work.

I create things all the time on the web. It’s my chosen profession, so I know how hard it can be to make a good website when you need to integrate with a lot of different 3rd parties. However this sort of experience isn’t going to cut it. If I’m federally mandated to have health insurance, then the experience should be as painless as possible.

Because the system is broken I won’t have health coverage until February 1st now; a month-long gap in coverage for my family. This just isn’t acceptable.

Categories
Other

Why I Still Have Faith in Hacker News

If you browse Hacker News enough you’ll start to see article trends.

  • Startup X is Startup Y for Dogs
  • Why you should use framework Z over formerly hot framework B
  • Cool JavaScript demo is N lines of code
  • How I failed at A and it made me better at B.

Don’t get me wrong, some of this stuff is interesting, but most of it is just noise to me. The reason I come to HN is for the comments. That’s where the good stuff is. HN is one of the brightest internet communities I’ve ever had the pleasure to interact with. For instance, today I posted to HN asking for the community to review my startup Smooth Bulletin. Within an hour I had two very thoughtful comments about business and market ideas that I hadn’t even thought of. I expect I’ll probably get even more feedback as the day goes on, and that I’ll benefit from those just as much as the first comments.

That’s why I still have faith in Hacker News. Not because of the content, but because of the people. The people there are incredibly smart and willing to give their advice just so that maybe you can succeed one day. I’ve been a member of HN for several years now and this hasn’t changed at all since the time I joined, and I hope that it never does.

And yes, I posted this to Hacker News, so maybe I should add “Why I still have faith in X” to the list.