Note from Scott: if you already know Python, you’re probably going to find this post incredibly boring, unless you’re interested in learning how a non-Python coder approaches figuring out a problem in Python. Everyone else, keep reading!

Several weeks ago I started backing up my tweets using IFTTT & Dropbox, as detailed previously in Archiving your past tweets as Markdown with a good text editor & a few simple bash commands. Since then, Twitter’s increasing dickishness meant that IFTTT had to kill all of its recipes that grabbed things out of Twitter as of September 27.[1]

So what now?

After a bit of reading, I decided to use Dr. Drang’s Python-based script on my Mac to download & archive my tweets. His original blog post—Archiving tweets without IFTTT—is helpful, but what you really want is his updated code at GitHub. I downloaded it, read through it, got things set up, & gave it a whirl. It worked! With one little problem.

While I was using IFTTT, I had been using Brett Terpstra’s example[2] to format the resulting Twitter archive with Markdown, because I loves me some Markdown. The results looked like this:

"Most compelling thing about science: it’s always ready to improve by casting out old ideas when evidence demands it." http://t.co/e9jQ1fky 

[September 23, 2012 at 09:57AM](http://twitter.com/scottgranneman/status/249885329421647872) 

--- 
 
"Once you say that you have faith, and that it comes directly from God, there is no self-correction mechanism." http://t.co/7oiJJ7Ep 

[September 23, 2012 at 10:06AM](http://twitter.com/scottgranneman/status/249887526255480834) 

--- 

The format was this:

tweet-text
blank-line
[tweet-date](tweet-url)
blank-line
---
blank-line

When I used Dr. Drang’s script, however, the results looked like this:

Fascinating article about collecting opium antiques & then becoming an opium addict: http://t.co/0Un3Wbdo Lots of interesting pictures too.
September 30, 2012 at 2:43 PM
http://twitter.com/scottgranneman/status/252493821378707456

- - - - -

Why can’t we tickle ourselves? Why do we tickle each other? Answers that I hadn’t thought of: http://t.co/XzxXXVrD
September 30, 2012 at 2:47 PM
http://twitter.com/scottgranneman/status/252494869262651392

- - - - -

That’s because his formatting was:

tweet-text
tweet-date
tweet-url
blank-line
- - - - -
blank-line

My tweet archive now read & looked totally different, with no Markdown. I didn’t like that; I wanted the script to format things the Terpstra way, like I had done for the previous 3600+ tweets. But what exactly do I change? I don’t know the first thing about Python, other than that it’s a powerful, incredibly useful language. But I also don’t know JavaScript, & that’s never stopped me from mucking around in various scripts I’ve downloaded & used. I rolled up my sleeves & dove in.

Pretty quickly I realized that these were the lines I needed to alter:

# Write them out to the twitter.txt file.
with open(tweetfile, 'a') as f:
    for t in reversed(tweets):
        ts = utc.localize(t.created_at).astimezone(homeTZ)
        lines = ['',
                t.text,
                ts.strftime(datefmt).decode('utf8'),
                urlprefix + t.id_str,
                '- - - - -',
                '']
        f.write('\n'.join(lines).encode('utf8'))
        lastID = t.id_str

Specifically, the stuff between the square brackets. Clearly, those were the individual pieces of each entry: the tweet itself (t.text), the date & time of posting (ts.strftime(datefmt).decode('utf8')), the URL pointing to the tweet (urlprefix + t.id_str), the separator ('- - - - -'), & empty strings ('') that end up as blank lines. The last line, f.write('\n'.join(lines).encode('utf8')), joins those pieces with newlines & writes each tweet to the archive. It seemed obvious to me that the straight single quotes (') marked literal strings of text, either an empty line ('') or the separator ('- - - - -').[3]
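To see what that join does in isolation, here’s a minimal sketch in Python 3 that swaps plain strings in for the script’s values. The text, date, & URL are stand-ins copied from the sample output above (with a straight apostrophe); in the real script they come from t.text, ts.strftime(datefmt), & urlprefix + t.id_str. Because this is Python 3, where strings are already Unicode, the .decode('utf8') & .encode('utf8') calls are left out.

```python
# Stand-ins for the values the script pulls from each tweet
# (t.text, the formatted timestamp, and urlprefix + t.id_str).
text = "Why can't we tickle ourselves?"
date = "September 30, 2012 at 2:47 PM"
url = "http://twitter.com/scottgranneman/status/252494869262651392"

# Same list shape as the original script: each item becomes one line.
lines = ['',
         text,
         date,
         url,
         '- - - - -',
         '']

# '\n'.join puts a newline *between* items, so the empty strings at
# the start and end become a leading and a trailing newline.
entry = '\n'.join(lines)
print(entry)
```

Six items give five newlines. The trailing '' of one entry plus the leading '' of the next is what produces the blank line between a separator & the following tweet.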

I started with an easy one: I changed '- - - - -' to '---', saved it, tweeted, & ran the script. It worked!

Now I wanted to add a blank line between the tweet & the date, & between the URL & the separator. Fingers crossed, I tried this:

lines = ['',
        t.text,
        '',
        ts.strftime(datefmt).decode('utf8'),
        urlprefix + t.id_str,
        '',
        '---',
        '']

Saved, tweeted, ran it. Success!
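Here’s the same kind of stand-alone Python 3 sketch for the edited list, again with stand-in values rather than real tweet data. Each extra '' is one more item, & the join turns it into an empty line.

```python
text = "Why can't we tickle ourselves?"
date = "September 30, 2012 at 2:47 PM"
url = "http://twitter.com/scottgranneman/status/252494869262651392"

# The edited list: an extra '' after the tweet and another before
# the separator, which now reads '---'.
lines = ['',
         text,
         '',      # blank line between tweet and date
         date,
         url,
         '',      # blank line between URL and separator
         '---',
         '']

entry = '\n'.join(lines)
print(entry)
```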

Now for the hard one. How do I change this code:

ts.strftime(datefmt).decode('utf8'),
urlprefix + t.id_str,

So that instead of outputting this:

September 30, 2012 at 2:47 PM
http://twitter.com/scottgranneman/status/252494869262651392

It would output this:

[September 30, 2012 at 2:47 PM](http://twitter.com/scottgranneman/status/252494869262651392)

It took a while to figure this out. Eventually, after much Googling & reading, I figured out what I was trying to figure out. You laugh, but that’s often the first thing you have to do to solve a problem, especially when you’re brand new to something: figure out what it is you need to figure out.

Before I got there, I tried this (& various permutations):

lines = ['',
        t.text,
        '',
        '['ts.strftime(datefmt).decode('utf8')']',
        urlprefix + t.id_str,
        '',
        '---',
        '']

That didn’t work. When I ran the script, it either failed with SyntaxError: invalid syntax or TypeError: sequence item 3: expected string or Unicode, list found. Both errors actually made sense & were helpful.
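Both errors are easy to reproduce in a few lines of Python 3 (the TypeError wording there differs slightly from the Python 2 message above). Python glues string literals that sit side by side into one string, but it won’t do that for a literal next to a method call, which is what '['ts.strftime(...)']' asks for. The TypeError presumably came from a permutation that put a list at position 3 of lines, since join accepts only strings. The names in this sketch (date, bad_source) are stand-ins, not the script’s.

```python
# 1. String *literals* sitting side by side are glued together
#    automatically, so this is legal and gives '[]'.
joined = '[' ']'
print(joined)

# 2. A literal next to an expression is not glued: this mimics the
#    failed '['ts.strftime(datefmt).decode('utf8')']' attempt.
bad_source = "x = '['date.upper()']'"
syntax_error = None
try:
    compile(bad_source, '<attempt>', 'exec')
except SyntaxError as err:
    syntax_error = err
    print('SyntaxError:', err.msg)

# 3. '\n'.join() accepts only strings, so a list nested at index 3
#    (where the date used to be) raises TypeError.
date = "September 30, 2012 at 2:47 PM"
lines = ['', 'tweet text', '', ['[', date, ']']]
type_error = None
try:
    '\n'.join(lines)
except TypeError as err:
    type_error = err
    print('TypeError:', err)
```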

Finally, I got it. What was I looking for? Information about concatenating strings. Once I realized that, & saw that Python does it with the + operator, it didn’t take much time to get to this:

lines = ['',
        t.text,
        '',
        '['+ts.strftime(datefmt).decode('utf8')+']',
        urlprefix + t.id_str,
        '',
        '---',
        '']

Saved, tweeted, tested, & got this (yes, I’m reusing the same tweet; I’m just being lazy):

Why can’t we tickle ourselves? Why do we tickle each other? Answers that I hadn’t thought of: http://t.co/XzxXXVrD

[September 30, 2012 at 2:47 PM]
http://twitter.com/scottgranneman/status/252494869262651392

---

Aha! I was right! Time to finish up:

lines = ['',
        t.text,
        '',
        '['+ts.strftime(datefmt).decode('utf8')+']('+urlprefix + t.id_str+')',
        '',
        '---',
        '']

Saved, tweeted, & finally got exactly what I wanted:

Why can’t we tickle ourselves? Why do we tickle each other? Answers that I hadn’t thought of: http://t.co/XzxXXVrD

[September 30, 2012 at 2:47 PM](http://twitter.com/scottgranneman/status/252494869262651392)

---
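For what it’s worth, a chain of + signs isn’t the only way to build that link. Below is a minimal Python 3 sketch comparing it with str.format (which Python 2 also has, from 2.6 on). The helper markdown_link is hypothetical, & the urlprefix value is inferred from the tweet URLs above rather than copied from the script.

```python
def markdown_link(label, url):
    """Hypothetical helper: return a Markdown inline link, [label](url)."""
    return '[{}]({})'.format(label, url)

date = "September 30, 2012 at 2:47 PM"
urlprefix = "http://twitter.com/scottgranneman/status/"  # assumed value
id_str = "252494869262651392"

# The + chain from the script, built from the same pieces.
plus_version = '[' + date + '](' + urlprefix + id_str + ')'

# The same string via str.format, wrapped in the helper.
format_version = markdown_link(date, urlprefix + id_str)

print(format_version)
```

Both produce identical strings; the format version just keeps the brackets & parentheses together where they’re easier to see.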

So, if you want the tweets in your archive to be formatted in Markdown & follow Brett Terpstra’s suggested layout, you’ll want to change the heart of Dr. Drang’s excellent script from this:

# Write them out to the twitter.txt file.
with open(tweetfile, 'a') as f:
    for t in reversed(tweets):
        ts = utc.localize(t.created_at).astimezone(homeTZ)
        lines = ['',
                t.text,
                ts.strftime(datefmt).decode('utf8'),
                urlprefix + t.id_str,
                '- - - - -',
                '']
        f.write('\n'.join(lines).encode('utf8'))
        lastID = t.id_str

To this:

# Write them out to the twitter.txt file.
with open(tweetfile, 'a') as f:
    for t in reversed(tweets):
        ts = utc.localize(t.created_at).astimezone(homeTZ)
        lines = ['',
                t.text,
                '',
                '['+ts.strftime(datefmt).decode('utf8')+']('+urlprefix + t.id_str+')',
                '',
                '---',
                '']
        f.write('\n'.join(lines).encode('utf8'))
        lastID = t.id_str

That was fun. Maybe it’s time I actually learned some Python!

  1. Of course, recipes that put things into Twitter are still just fine. Dicks. 

  2. Here’s the link from Brett: https://ifttt.com/recipes/43468. I wouldn’t bother going there, though: IFTTT was forced by Twitter to remove all recipes that grabbed tweets, so you now get a 404. Like I said, the people who run Twitter are dicks. 

  3. If you’re Python-ignorant like I am, you may be wondering about the 'utf8'. It is a literal string too, but unlike the others it never ends up in the archive; it’s an argument that tells the decode method which character encoding to use. I actually realized that from all the HTML I do, where you always have to specify a character encoding for your webpages.
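To make note 3 concrete, here’s a small Python 3 sketch. The script itself is Python 2, where strftime presumably hands back a plain byte string, which is why it gets decoded. The example bytes are an assumption chosen for illustration: "can’t" with a curly apostrophe, like the tweet above.

```python
# In UTF-8 the curly apostrophe in "can’t" (U+2019) takes three bytes.
raw = b'can\xe2\x80\x99t'

# decode('utf8') reads those bytes as UTF-8 and returns text;
# 'utf8' is just a string argument naming the encoding.
text = raw.decode('utf8')
print(len(raw), len(text))  # 7 5

# encode('utf8') goes back the other way, which is what the script
# does to each entry before writing it to the file.
round_trip = text.encode('utf8')
```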