Wrapping non-ASCII characters in Django

One of our long-time clients recently hired a very talented design firm to give their site (and brand) a facelift. During said facelift, a very strange font was selected that we had to serve up as an embedded font. It just so happened, however, that this font had, for some reason, very large registered trademark symbols. So large, in fact, that our client mandated that we “find a way” to fix it.

After some consideration, I decided the best way to approach the issue was to create a custom template filter that we could use wherever we expected the database to hand us one of these symbols. (Since our database uses UTF-8, we’re able to store those symbols directly in the db, or at least that’s how I understand it, and it’s pretty evident when you perform a select on one of their products.)
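For reference, the registered trademark sign is a single Unicode code point, U+00AE, and UTF-8 stores it as the two bytes C2 AE; those two bytes are the \xc2\xae that shows up throughout the rest of this post. A quick check, sketched in Python 3 rather than the Python 2 the filter was written for:

```python
symbol = "\u00ae"  # the registered trademark sign

print(len(symbol))       # → 1 (one character)
print(hex(ord(symbol)))  # → 0xae (its code point)

# UTF-8 needs two bytes to store this code point.
utf8_bytes = symbol.encode("utf-8")
print(utf8_bytes, len(utf8_bytes))  # → b'\xc2\xae' 2
```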

For the impatient, here’s what I ended up with:

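from django import template
from django.utils.encoding import smart_str
from django.utils.safestring import mark_safe

register = template.Library()

@register.filter
def wrap_symbol(value, classname):
    """
    Looks for registered trademark symbols and wraps them in
    a css class to allow for styling.

    {{ string|wrap_symbol:'classname' }}

    TODO: allow specification of symbol(s) to replace
    """
    # Under Python 2, smart_str returns a UTF-8 byte string, so the symbol
    # has to be matched by its UTF-8 bytes, '\xc2\xae'
    classname = smart_str(classname)
    repl_text = "<span class='" + classname + "'>\xc2\xae</span>"
    try:
        string = smart_str(value).replace('\xc2\xae', repl_text)
    except ValueError:
        # Conversion failed: hand back the original value untouched
        return value
    return mark_safe(string)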
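For comparison, here is a minimal sketch of the same idea working on text instead of UTF-8 byte strings, assuming Python 3. The function name wrap_symbol_text and its symbol parameter are hypothetical (the parameter is one way to approach the TODO in the docstring), and the Django pieces are left out so it runs with the standard library alone. In a real filter you would still register it with template.Library().filter and return mark_safe(...) on the result. Unlike the filter above, it HTML-escapes its inputs first, since the output is meant to be trusted as markup.

```python
import html

REGISTERED = "\u00ae"  # the registered trademark sign, U+00AE


def wrap_symbol_text(value, classname, symbol=REGISTERED):
    """Wrap each occurrence of `symbol` in `value` with a styled <span>.

    Hypothetical Python 3 counterpart of the wrap_symbol filter: it works on
    text (str), so no encoding step is needed. Both inputs are HTML-escaped
    first because the result is meant to be emitted as markup. Assumes
    `symbol` contains no characters that html.escape would change.
    """
    text = html.escape(str(value))
    css_class = html.escape(str(classname), quote=True)
    replacement = '<span class="{}">{}</span>'.format(css_class, symbol)
    return text.replace(symbol, replacement)


result = wrap_symbol_text("CompanyName\u00ae ProductName\u00ae", "reg")
# result == 'CompanyName<span class="reg">®</span> ProductName<span class="reg">®</span>'
```

Because the search and the replacement are both plain text, there is no opportunity for a byte string and a unicode string to meet, which is where the trouble described below came from.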

My first attempt, however, did the replacement like this:

string = value.replace("\xc2\xae", "<span class='" + classname + "'>\xc2\xae</span>")

I was repeatedly met with the following error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128). While I didn’t expect it to work out of the gate (things rarely do for me), I was in no way expecting the multi-hour battle that would ensue, largely due to my lack of understanding of character encodings and some unexpected output from print statements, but I digress. Off to the shell for some experimentation:
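The first attempt fails because, in Python 2, calling replace on a unicode string with a byte-string argument makes Python silently decode that argument with the default ASCII codec, and the byte 0xc2 lies outside ASCII. Python 3 refuses to mix the two types at all (it raises a TypeError instead), so this sketch writes the hidden decode step out explicitly:

```python
raw = b"\xc2\xae"  # UTF-8 bytes for the registered trademark sign

# Python 2 ran this step implicitly (default codec: ASCII) whenever a byte
# string met a unicode string; here it is spelled out.
error_message = None
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
# → 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)

# Decoding with the codec the bytes were actually written in succeeds.
symbol = raw.decode("utf-8")
assert symbol == "\u00ae"
```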

>>> from products.models import Product
>>> p = Product.objects.get(pk=25)
>>> p.name  # (I'm paraphrasing the names here)
u'CompanyName\xae ProductName\xae'
>>> print p.name
CompanyName® ProductName®
>>> print unicode(p.name)
CompanyName® ProductName®
>>> unicode(p.name)
u'CompanyName\xae ProductName\xae'

Had I only noticed the pattern that emerged during the above commands, I mightn’t be writing this post… Nevertheless, I knew that the DB should be giving me back the UTF-8 representation, so I hit the great Google for guidance. Through various articles and even a trip to #python and #django, I ended up trying something like:

>>> p.name.encode('utf-8')
'CompanyName\xc2\xae ProductName\xc2\xae'

Aha! Something different (if only slightly), but different enough to send me down another black hole. After many a Google search, I came across this, which ultimately led me to my final working destination.
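What changes with encode('utf-8') is the type, not the text: a unicode string is a sequence of code points (® is the single code point \xae), while the encoded byte string spends two bytes, \xc2\xae, on each ®. A small Python 3 sketch with a made-up product name mirroring the shell session (Python 3’s str and bytes play the roles of Python 2’s unicode and str):

```python
name = "CompanyName\u00ae ProductName\u00ae"  # text, like p.name

encoded = name.encode("utf-8")  # bytes: each ® becomes b"\xc2\xae"

print(type(name).__name__, len(name))        # → str 25
print(type(encoded).__name__, len(encoded))  # → bytes 27
print(encoded.count(b"\xc2\xae"))            # → 2

# Decoding with the same codec restores the original text exactly.
assert encoded.decode("utf-8") == name
```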

Sadly, it wasn’t until after I had things working that I realized what was holding me up in my testing. Any time I called print, from the shell or from my code, print was encoding the unicode string for my terminal, so the symbol always displayed correctly. (It was doing it right in front of my eyes, but apparently I refused to believe it.) Investigating with Python’s built-in type() function, type(p.name) definitely returned <type 'unicode'>, so for the life of me I couldn’t figure out why my debug statements kept printing the actual symbol, which further muddied my already cloudy understanding of the issue.
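The distinction at play is between what the interactive shell echoes, which is repr(), and what print writes, which is the string itself encoded for the terminal. A Python 3 sketch with a stand-in product name; since Python 3’s repr no longer escapes ®, ascii() is used here to reproduce what the Python 2 shell displayed (this assumes a UTF-8 capable terminal for the second print):

```python
name = "CompanyName\u00ae ProductName\u00ae"  # stand-in for p.name

# The shell echoes repr(). Python 2's repr escaped non-ASCII characters;
# Python 3's repr shows them as-is, but ascii() still escapes them.
print(ascii(name))  # → 'CompanyName\xae ProductName\xae'

# print() writes the string itself, encoded for the terminal.
print(name)         # → CompanyName® ProductName®

print(type(name))   # → <class 'str'> (Python 2 reported <type 'unicode'>)
```

Both lines describe the same object; only the escaping of the display differs, which is exactly why print kept showing the real symbol.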

After one final hiccup involving inserting the actual replacement text (\xc2\xae) back into the string, I had a working filter that I can now use across the entire site. I’m still not entirely sure I grasp exactly what was (or is) going on, but Django’s utilities make it so much easier to deal with this sort of thing. Man, these guys thought of _everything_.