Backing up your Twitter tweets with Dr. Drang’s Python script, but using Brett Terpstra’s Markdown-based formatting
Note from Scott: if you already know Python, you’re probably going to find this post incredibly boring, unless you’re interested in learning how a non-Python coder approaches figuring out a problem in Python. Everyone else, keep reading!
Several weeks ago I started backing up my tweets using IFTTT & Dropbox, as detailed previously in Archiving your past tweets as Markdown with a good text editor & a few simple bash commands. Since then, Twitter’s increasing dickishness meant that IFTTT had to kill all of its recipes that grabbed things out of Twitter as of September 27 [1].
So what now?
After a bit of reading, I decided to use Dr. Drang’s Python-based script on my Mac to download & archive my tweets. His original blog post—Archiving tweets without IFTTT—is helpful, but what you really want is his updated code at GitHub. I downloaded it, read through it, got things set up, & gave it a whirl. It worked! With one little problem.
While I was using IFTTT, I had been using Brett Terpstra’s example [2] to format the resulting Twitter archive with Markdown, because I loves me some Markdown. The results looked like this:
The format was this:
When I used Dr. Drang’s script, however, the results looked like this:
Because his formatting was:
Pretty quickly I realized that these were the lines I needed to alter:
Specifically, the stuff between the square brackets. Clearly, those were the individual pieces that were joined together with f.write('\n'.join(lines).encode('utf8')) to create each tweet in the archive: the tweet itself (t.text), the date & time of posting (ts.strftime(datefmt).decode('utf8')), the URL pointing to the tweet (urlprefix + t.id_str), the separator ('- - - - -'), & a blank line (''). It seemed obvious to me that the primes (AKA, straight apostrophes: ') were used to indicate a literal string of text, either an empty line ('') or a separator (- - - - -) [3].
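To make the join step concrete, here is a minimal sketch of the idea, not Dr. Drang's actual code. It is written for Python 3, where strings are already Unicode, so the decode & encode calls are left out. The tweet text, date, & URL are made-up placeholders, & format_entry is a hypothetical helper name.

```python
def format_entry(text, date, url):
    """Join the pieces of one archive entry, one piece per line."""
    # Same order as described above: tweet, date, URL, separator, blank line.
    lines = [text, date, url, '- - - - -', '']
    return '\n'.join(lines)


entry = format_entry(
    'Hello, world!',                         # placeholder tweet text
    'October 1, 2012 at 9:00 AM',            # placeholder date string
    'https://twitter.com/example/status/1',  # placeholder URL
)
print(entry)
```

Because the last item in the list is an empty string, the joined result ends with a newline, so the next entry starts on a fresh line.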
I started with an easy one: I changed '- - - - -' to '---', saved it, tweeted, & ran the script. It worked!
Now I wanted to add a blank line between the tweet & the date, & between the URL & the separator. Fingers crossed, I tried this:
Saved, tweeted, ran it. Success!
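Why would that work? Every item in the list becomes its own line, so an empty string becomes an empty line. A small Python 3 sketch with placeholder values (not the script's real variables) shows the effect:

```python
lines = [
    'Hello, world!',                         # placeholder tweet text
    '',                                      # blank line between tweet & date
    'October 1, 2012 at 9:00 AM',            # placeholder date string
    'https://twitter.com/example/status/1',  # placeholder URL
    '',                                      # blank line between URL & separator
    '---',                                   # the new, shorter separator
    '',                                      # trailing blank line
]

joined = '\n'.join(lines)
print(joined)
```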
Now for the hard one. How do I change this code:
So that instead of outputting this:
It would output this:
It took a while to figure this out. Eventually, after much Googling & reading, I figured out what I was trying to figure out. You laugh, but that’s often the first thing you have to do to solve a problem, especially when you’re brand new to something: figure out what it is you need to figure out.
Before I got there, I tried this (& various permutations):
That didn’t work. When I ran the script, it either failed with SyntaxError: invalid syntax or TypeError: sequence item 3: expected string or Unicode, list found. These actually made sense, & were helpful.
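For anyone curious about that TypeError, one way to trigger the same kind of error is to put a list inside the list being joined. This is only an illustration with placeholder values, & Python 3 words the message a little differently than the Python 2 text quoted above.

```python
date = 'October 1, 2012 at 9:00 AM'           # placeholder date string
url = 'https://twitter.com/example/status/1'  # placeholder URL

# Wrong on purpose: the item at index 3 is a list, not a string.
bad_lines = ['Hello, world!', '', '---', [date, url], '']

try:
    '\n'.join(bad_lines)
    error_message = None
except TypeError as exc:
    # join() only accepts strings, so it reports the offending position.
    error_message = str(exc)

print(error_message)
```

The "sequence item 3" part of the message is the useful bit: it counts from zero & points at the exact item that is not a string.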
Finally, I got it. What was I looking for? Information about concatenating string literals. Once I realized that, & saw how Python handles that, it didn’t take much time to get to this:
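As a general illustration of concatenation (a sketch with placeholder values, not the line from the script), the + operator glues string literals & variables into a single string, which then counts as one item in the list. The Markdown link shape used here is an assumption chosen for the example.

```python
date = 'October 1, 2012 at 9:00 AM'           # placeholder date string
url = 'https://twitter.com/example/status/1'  # placeholder URL

# Literal pieces ('[', '](', ')') are concatenated with the variables,
# producing one string rather than several separate list items.
link_line = '[' + date + '](' + url + ')'

lines = ['Hello, world!', '', link_line, '', '---', '']
print('\n'.join(lines))
```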
Saved, tweeted, tested, & got this (yes, I’m reusing the same tweet; I’m just being lazy):
Aha! I was right! Time to finish up:
Saved, tweeted, & finally got exactly what I wanted:
So, if you want the tweets in your archive to be encoded in Markdown & follow Brett Terpstra’s suggested formatting, you’ll want to change the heart of Dr. Drang’s excellent script from this:
That was fun. Maybe it’s time I actually learned some Python!
[1] Of course, recipes that put things into Twitter are still just fine. Dicks.
[2] Here’s the link from Brett: https://ifttt.com/recipes/43468. I wouldn’t bother going there, though, as it no longer works. As I said, IFTTT was forced by Twitter to remove all recipes that grabbed tweets, so you now get a 404. Like I said, the people that run Twitter are dicks.
[3] If you’re Python-ignorant like I am, you may be wondering about the 'utf8'. Strictly speaking that is a string literal too, but it never ends up in the archive; it is passed to the decode method to tell it what character encoding to use. I actually realized that from all the HTML I do, where you always have to specify a character encoding for your webpages.