Sunday, November 20, 2011

BlogForever Interview

This is an interview of me conducted by Mike Joy for the BlogForever Consortium, a project co-funded by the European Commission within the Seventh Framework Programme. An audio version of this interview is available here.

First, we would like to understand your background. This will help us to understand the context of your answers.

1. Could you please tell us a bit about your blogging experiences and why you are blogging?
I have been blogging since before there was blogging. I first got on to the Internet in ’92-’93. My first online experiences were as early as 1981, working for Texas Instruments, but that wasn’t the Internet properly so called. There was this Texas Instruments internal World Wide Network. But I first got on to the Internet, really, when I started with Athabasca University in 1987. I really started using my Internet access from them, as I said, in about ’92-’93, which would have been participating in multi-user online games (MUDs). So, of course, I would have started writing then. The first actual writing for the Web that I began doing would have been in 1995. I created my home page in 1995 and began writing and posting articles almost immediately. Some of my earliest articles, which are still on my page now, are from ’95, including, for example, a transcript from an online conference that we hosted. I have been writing articles since then.
What motivated me to begin using what we would today call a blogging format is my participation in online discussion lists (usually hosted lists, although sometimes mailing lists as well). When I was using those, in the mid to late nineties, it occurred to me that the archives of these might not last forever. And I should store my own contributions, because they were so brilliant (jokingly), so they would not be lost to history. And that’s exactly what I did. If you look at my list of articles, I have a page with all of my articles on my website. I forget how many there are, I have to look it up. But there are hundreds and hundreds of articles. I am just looking it up now, because I am not sure what it says. It says: 1,159 articles. That’s a bit out of date, so there are probably about 1,200 articles. But at the head of these articles you see: “Posted to www-dev” or “Posted to the HotWired mailing list”.
It turns out that these concerns were prescient, because you can no longer access the www-dev mailing list archives. HotWired was taken down in 1998 and all the postings completely wiped out. It turns out it was a really smart idea for me to keep my own content on my own website and so that is what I have been doing. I mean there’ve always been these posts I have rigged to myself. Concurrent with that, beginning probably in ‘98, but officially on May 5th 2001, is my daily newsletter.
My daily newsletter consists of short posts (not the longer ones that compose my articles) and these short posts are generally links to external resources. This has its origin as a mechanism for me to synchronise my bookmarks between home and office. Back in the 90’s of course, if you wanted to save a website you would bookmark it. Bookmarks were saved locally as HTML files, but it was very inconvenient to have one set of bookmarks at work and then another one at home. So, I set up a little database on my website and instead of bookmarking using the Netscape bookmarking system I would just fill in a form and that would create a bookmark on my website. These were all dated, these were all posted on my website and so that became my daily newsletter.
So, I have two forms of blogging – the articles and the newsletter. The articles come as a means of me saving my posts to mailing lists and such. The newsletter is a means of me saving my bookmarks.

2. How do you facilitate or prevent your blog being found by other people?
Good question. When I first started posting links in my newsletter, when I’d post a link to so and so I would then send an email to so and so and say “I posted a link to your article in my newsletter”. I don’t do that so much now because the people I would send an email to are generally reading the newsletter anyway and they already know.
My URL is in my email signature, and has always been in my email signature. It used to be on my business card, but we have a new policy at NRC that only the Institute website goes onto the business card. So it is no longer on the business card.
Other than that, I post short Twitter posts to indicate when I have written a longer article. That is typically a lengthier article on the Blogger blog, Half an Hour. These Blogger blog articles are eventually migrated into my website. I also have a system set up so that when I create a post in my newsletter it automatically feeds to a separate Twitter account called @oldaily. My main Twitter account is @Downes, my separate Twitter account is @oldaily, and this one is fed automatically from my newsletter.
I’ve been manually posting posts recently in Google+. I don’t expect to do that forever. I expect to automate that at some point because it takes a few extra minutes every day. What else do I do? I think, probably, the biggest thing: I am a prolific commenter on other people’s websites. I do a lot of reading. I read hundreds of posts a day, possibly thousands of posts a day. It is a ridiculous number. And very frequently, I leave comments on those other posts. Even if I am not going to write about them in my own newsletter I’ll start babbling. A lot of the time these comments I leave on other posts become articles on my own site, because once it gets beyond one or two paragraphs that same reasoning clicks in: “Oh, I better keep a copy of this, because I can’t be guaranteed that it will be forever available on the other site”. And so I snag it and make it an article on my own site.
When I make comments on other websites, much of the time (almost all of the time these days) they ask for an email address, name and website URL, which means my comment appears on the other website and when people click on my name, they go to my website. I know that drives a lot of traffic.
Do you use specific keywords or tags?
No, not really. 

3. Who has the right to do what with your blog content or any data from your blog? How do you indicate and control the rights for your content?
I do not control the rights. Life is too short to be controlling things. I use a Creative Commons licence, by-non-commercial share-alike. It is supposed to be 3.0, but I don’t know what version it actually is – I don’t care. My interest in controlling what people do with my stuff is, honestly, minimal. I put the licence up only because it makes it easier for other people to use my content, so they don’t sit there and wonder whether they are allowed to do stuff with it or not. I don’t like putting them through that kind of angst. But this is just something that doesn’t matter to me.

4. Are you interested in possible interconnections between your blog and others?
Well, there are interconnections – they are called links. And every one of my newsletter posts links to something else. Every single one! And most of my blog posts link to something else. As for forming blogging networks or some sort of group-like behaviour, I am not interested in that.

5. In a platform where you browse and search for blogs and the relations between them, what would make the user interface comfortable and intuitive?
It depends on what you are trying to do. I mean, you probably want some sort of a visual representation. Lists and lists and lists of blogs, or forms that have you search for blogs, aren’t really very helpful, because, in the beginning, I use Google. Google has a great search. Any other service is very unlikely to match Google search. The only advantage that a local site can provide with respect to search is to limit the range of search results.
On my own website I harvest content from other blogs. I have about (I forget) a thousand to 13 hundred or so blogs that I aggregate and put that content into my database. And search on that is useful because it is limited to the content of these 13 hundred blogs. So, if you were doing something like that, search limited to this content might be useful, but it would have to be a well curated set of blogs. You don’t want just 13 hundred random blogs. You don’t want: “anyone can submit a blog”, because you get ten useful blogs and 25 spam blogs, which would be a big problem.
When you are talking about the interconnections between the blogs, it is hard to describe how they should be represented visually, although I have my own ideas on that: they should be represented visually as a function of contribution and a function of time. So, I think there should be a time axis. Typical representations of the linkages between blogs are never indexed to time. It is always this network and a network guy who says: “here’s a blog, here’s a blog, and there is a line between them”. But in the real world, the relationships between blogs aren’t like that. They are not static. It is not a one-time thing such as: “here is a network – forever and a day”. It is very fluid, very dynamic. So, having a way of representing that would be important.
And so, the idea is that there would be this image that would change over time and you would see these changes over time as you came back to it on a regular basis or as you subscribed to it. How you do that can be a long, long discussion.
What else? You are asking what would make it easy for users. My experience is that, with some few exceptions (these exceptions are sites like Facebook, Google, maybe Yahoo, perhaps Flickr, YouTube and a few others) people don’t go to webpages. It is very rare, you hardly ever see it. The only time you really see people going to webpages is if the address of that webpage is returned as a result of a search, as when you do a Google search, get a link, click the link and you are on that page. A person doesn’t just go back to a web page without some sort of a search or other prompt. So, to make any sort of system like this useful to users, it is going to have to provide that prompt that will send users to whatever information it is that they want to receive on a regular basis.
That is what we found in the Massive Open Online Courses we have, where we do have this network of blogs that are written by individuals all over the place. At the current Connectivism course we have something like 270 blogs (I am just looking it up, because I like to use exact numbers). We have 260 separate blogs that people write. And it is essential for us to have a central newsletter that we send out to every participant every weekday, so when somebody has posted a new post at one of those 260 blogs it shows up in the newsletter. Because it is impossible for somebody to go to those 260 blogs, and they are not even going to come to our course website to see what’s there. Even though they know there will be something new every day they won’t come to the website; it will not occur to them. So having this prompt is crucial, absolutely crucial. The course couldn’t have run without it. And I think that will be the same for your service.

6. Are you interested in how your blog is ranked among blogs for the different subjects and how do you check that?
I just assume it is first (jokingly). The short answer is no. I suppose, if somebody came up with a ranking and I was like 81st I would kind of wonder. But you know, does my blog rank higher or lower than George Siemens’s, that really is a pretty irrelevant question. Even the question of ranking itself raises the question of what would constitute the ranking. Is it number of visits? Is it amount of time spent on the blog? Is it the number of posts out there in the world that are spawned off the post on my blog? Is it the number of links from other blogs back to my blog? You can come up with a bunch of other measures. You can look at ranking services such as Alexa or Klout and come up with other ideas on ranking, and they all turn out, in the end, to be kind of arbitrary, and kind of snapshot-ish and kind of quantity-focused.
What really interests me about my blog (with respect to inter-relations to other people) is whether it is first to come out with a concept or an idea, and I have no idea if you guys can rank that or if you can rank that automatically. It interests me if I authored a unique (well, no, not even unique, it doesn’t need to be unique), an informed, an insightful perspective or a point of view that matters. I would prefer to be right more often than other blogs. To me, the number one ranking would be: “I have more factually true statements in my blog than any other blog”, demonstrably and knowably so. But who is going to rank them based on that? And you do not want to rank that trivially, because someone will just start posting dictionary articles, and they get lots of factually true statements. So, say, the highest number of factually true statements that are contextually relevant to the current debate. If you measure that, then that would matter to me, but ranking on the number of readers… I am never going to have the most readers, never ever, ever, and anything that tends to make me want to have the most readers is actually detrimental to me, because it means that I am going to be broadening my coverage in order to attract a wider range of interests and that I am going to be making my coverage less deep, or minimally, less idiosyncratic, again to capture a broader demographic. I don’t want to do either of those things. Either of those things would damage the integrity of the blog.

7. By what other criteria would you like to see your blog ranked?
Oh, I see what you mean. You know what, it is not a competition. It has never been a competition. I am not racing against other blogs. I am working with them. Since I started, especially in the newsletter, but also in the blog, I have tried to direct traffic away from my site to other people. This whole idea of ranking blogs creates a competition where there isn’t one. I mean, would you rank the rivets on an airplane? That would be stupid, right? What is the number one rivet on your… argh? It is a dumb thing to ask for. And, in the same way, ranking the blogs in the Blogosphere is like ranking the rivets on an airplane. What matters is that they each hold their own part of the airplane together. That is all that matters. And to suggest that one of them is more important than the other makes no sense. It is an incoherent concept and you should not do it.

8. Would you be interested in an analysis of your blog (or part of your blog) to extract, for example, statistics (popularity, visits, etc.), keywords or sentiments, and why?
Well, I don’t know how you are going to do visits. Good luck with that. You are not going to get accurate data, because I know that traffic to my blog comes from a wide variety of sources. I get stuff from RSS feeds, I get audio listens, people look at my content on other sites like Flickr or YouTube or Blogger. My content isn’t even located on a single website. If I am trying to increase the ranking of my blog I make sure everything goes on one site. If I am trying to increase the usability, I put different stuff in different places. So, you will not be able to get accurate statistics of the readership on my blog - period. I don’t have accurate statistics. I mostly don’t care. In theory I could have sort of accurate statistics, but … I actually have a weblog analyser that I started up over the summer; after ten years of running the thing I have finally turned on the hit counter. But even that does not record the number of views on RSS readers and the like. So, that part of it I am not too interested in.
The semantic analysis is interesting, because I am always interested in how people see what I am doing – that’s interesting. If it is just the identification of keywords, I have done that: you submit your blog to Wordle or whatever and get the word-cloud. I would like something that is a little bit more insightful. I know that there are text analysis software packages available that you run it through, EPSS or whatever. So, that would be kind of interesting. Comparing the focus, you know, Stephen talks a lot more about cognitive structures than George, who talks mostly about social structures, that would be kind of interesting. I think that would be really hard to do though. But because it is hard to do, that is probably why it would be interesting.

9. Do you archive or back up your blog(s)?
Yes, I do. Everything.
Can you describe the process of archiving or backing up the blog(s) you are authoring?
Quickly? No. Again, my content is scattered all over the place and there are lots of good reasons for that. So, the simple rule of thumb is: I try to make sure that there are at least two places where any given instance of my content is. The longer version is, I actually try to make sure there are three or four places where everything is. Different content is archived in different ways. So, the article content, any textual content from third party sites like Blogger or comments or posts is retrieved and stored on my main site which is, in my database. And then I periodically do a backup of my database. I also have a backup system through my website provider. I also save copies of my website. That’s just the whole lock, stock and barrel on hard drives; I just copy the whole website over to my hard drive at home. I then save that onto some backup hard drives.
Photos: I make sure to send a copy of my photos to Flickr, and photos are backed up on at least two, and usually more than two, hard drives. And I have a bunch of photos also saved on DVDs.

Can you describe the process of accessing or restoring information from your archive?
For other people to access my “archive”, they are accessing the database off my website and they just use the interface of my website. For me to access that, it means getting the database file and re-loading it into the database and then I access it as though it were the first bit.
If it is the images, I just open up the hard drive, but mostly, from the perspective of other people, they look at the images on Flickr. If Flickr ever disappeared then the way they would access my archive is they would wait until I found Picasa or some other image upload site and filed my images there. And if that weren’t available, they would get them off my website.

Can you identify any problems/issues with the procedures you are currently following?
They are not all automatic. I would like to just be able to make content and upload it and not worry about whether there is a backup and just have a backup that would run automatically, and I wouldn’t have to do anything.
The Wayback Machine was really good for that, and it saved me a bunch of times. And what I really liked about it is – I didn’t need to do anything. The problem with the Wayback Machine is that it wasn’t complete. It would capture snapshots, but on a dynamic site like mine snapshots are hit and miss.

10. Would you like to have a real time, continuous and viewable archive of your blog? Can you imagine what this service could be like?
Well again, good luck with that. I mean, for 99% of people that is going to work fine, for me it is a bit dicier. Would I be interested? Yes, sure. I think that would be a cool thing to have. And again, anything that makes it easier for other people to use my stuff, that’s cool. So, I am all over that. It is a bit tricky, because what you consider worth archiving and what I consider worth archiving might be two separate things. Again, because I have a wide range of different content, in different places. If you are archiving all of them separately that’s, kind of, not that efficient. So, you’d probably want to bring them together. So, I have various blog websites. The one I use most of all is my main website, as well as the ‘Half an Hour’ website. But I also have another Blogger website called ‘Let's Make Some Art, Dammit’ and then the photo-site, the YouTube site etc. If all of those were pulled into a single archive that would be great. I think it will be complicated to do it all separately. Yes, I think people would like it. I think it would give me a little bit of peace of mind knowing that if I have a complete crash and burn moment it is there – somewhere else.
Everybody sometime in their life is going to have a complete crash and burn moment, right? They would no longer be paying for anything, and all other sites will stop being updated. So, having that would really help.
It would be interesting to know. You say archiving, you mean archiving for ever? That would be cool, especially if you’d archived my backlog (my back catalogue of stuff going back to 1995) and kept it forever. I think that would be really useful.

11. If there were a preservation or archiving system for blogs, how would you like to control which of your content is captured and stored?
If it’s content, capture and store it. Life is too short to be choosing which stuff to archive and which stuff not to archive. If I create content I think it is worth capturing and storing. But that is just me. I think all my tweets should be captured. I think random off hand remarks should be captured. I have got some 300 comments through the Disqus commenting system; I think those should be captured.
I think whatever I have created is worth capturing and I think it all forms a part of the overall picture or tapestry that is my contribution to the World, whatever that is. And I bet you most people feel that way. Even my Facebook posts. If you could get into Facebook and pull the content out that would be cool.
I think you will have an issue with duplicates, but, I mean, I do not want to be the one editing duplicates. My content propagates automatically. So, I do not think I want three of the same announcement, just because they showed up on different systems: the announcement first on my, then on Twitter, and then on Facebook, because I have them daisy-chained. Just one will do. That is another long-winded way of saying that I don’t really want to manage that.
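As an illustration of the duplicate problem described above, here is a minimal Python sketch of one way an archive might collapse daisy-chained copies of the same announcement. It is not taken from any existing system. The function names `fingerprint` and `deduplicate`, the `(timestamp, source, text)` record shape, and the rule "copies count as duplicates if they match after normalising case and whitespace" are all assumptions made for the example; real copies often differ more than that (shortened links, truncation), which this sketch does not handle.

```python
import hashlib
import re


def fingerprint(text):
    """Hash the text after collapsing whitespace and lowercasing it."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(items):
    """Keep only the earliest copy of each announcement.

    items is a list of (timestamp, source, text) tuples; the record
    shape is hypothetical and chosen only for this example.
    """
    kept = {}
    # Sorting by timestamp first means the original posting wins.
    for timestamp, source, text in sorted(items):
        kept.setdefault(fingerprint(text), (timestamp, source, text))
    return sorted(kept.values())
```

With this rule the author never has to edit duplicates by hand: the first place the announcement appeared is the copy that is kept.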

How would you like to indicate that content should be removed from the preservation system?
There are two aspects to this. First, locating the content you don’t want to be in the system. So, there needs to be some kind of content-location system, like a search engine of some sort. And then, secondly, the function that actually does the removing, where that function has a built-in safety check like: “Delete? Delete forever? Really? Are you sure? This will delete it forever!”. And then, when I do that, don’t actually delete it, but just change its status to “show to nobody”. Because I will make a mistake even though it said “are you sure, are you sure”… I will at some point make a mistake. So you shouldn’t actually delete it, you should just make it not viewable, so that the decision to delete can be overturned.
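The "show to nobody" idea above is what is often called a soft delete. A minimal Python sketch follows, assuming a simple in-memory store; the class and method names (`ArchivedPost`, `Archive`, `delete`, `restore`, `public_posts`) are hypothetical and chosen only for illustration, not drawn from any BlogForever design.

```python
from dataclasses import dataclass


@dataclass
class ArchivedPost:
    post_id: int
    body: str
    visible: bool = True  # False means "show to nobody"


class Archive:
    """In-memory stand-in for a preservation system (illustrative only)."""

    def __init__(self):
        self._posts = {}

    def add(self, post):
        self._posts[post.post_id] = post

    def delete(self, post_id, confirmed=False):
        # The "Are you sure?" safety check: refuse unless confirmed.
        if not confirmed:
            raise ValueError("deletion must be confirmed")
        # Never remove the record; only hide it.
        self._posts[post_id].visible = False

    def restore(self, post_id):
        # Overturn an earlier (possibly mistaken) delete.
        self._posts[post_id].visible = True

    def public_posts(self):
        return [p for p in self._posts.values() if p.visible]
```

Because `delete` only flips a flag, a mistaken deletion costs nothing more than a call to `restore`.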

12. How do you facilitate or prevent technically that your blog will be found and disseminated by search engines?
I don’t. I don’t care.

13. How do you help the readers of your blog find related posts inside your blog?
I have tried different things over the years and right now it is a bit haphazard. I have an automatic tagging system (but it’s broken; I keep intending to fix it), and I think it is worth mentioning. Basically, I have a predefined list of topics. I define each topic as a regular expression string. I apply that regular expression string to any post in my blog. If there is a match, that topic is attached to that post. And then there is a list of topics, so when you click on a particular topic in that list of topics you get the list of posts that match that topic. I also capture author/publisher information and that does work. It is not all hypothetical. But the topics are a horrible, horrible nightmare to manage. If you try to build something like that you are going to need so much caching it is not funny.
So, when I submit stuff, I submit the name of the author and the name of the journal or blog it was published in. People can click on that name anytime and get a list of posts associated with it. I have got a graphing system that I am just building now, and it is intended to track all the links from one post to another post to another post. This is not implemented yet, but the idea is that you can follow links on links.
All my newsletters, all my contents are searchable. I also have an archive of newsletters that is Google indexable and that really helps a lot of people, because a lot of people find related stuff just by searching on Google.
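The regular-expression topic matching described in this answer can be sketched in a few lines of Python. This is only an illustration under assumptions, not the code behind the newsletter: the two topics, their patterns, and the names `TOPICS`, `tag_post` and `build_index` are invented for the example. The precomputed index stands in for the caching the answer says such a system needs.

```python
import re

# Hypothetical topic list: each topic is defined by one regular expression.
TOPICS = {
    "RSS": re.compile(r"\b(rss|atom feed|syndication)\b", re.IGNORECASE),
    "MOOC": re.compile(r"\b(mooc|massive open online course)s?\b", re.IGNORECASE),
}


def tag_post(text, topics=TOPICS):
    """Return the sorted names of every topic whose pattern matches the post."""
    return sorted(name for name, pattern in topics.items() if pattern.search(text))


def build_index(posts, topics=TOPICS):
    """Map each topic to the ids of matching posts.

    Building this once and reusing it avoids re-running every pattern
    over every post each time a reader clicks a topic.
    """
    index = {name: [] for name in topics}
    for post_id, text in posts.items():
        for name in tag_post(text, topics):
            index[name].append(post_id)
    return index
```

Clicking a topic then becomes a dictionary lookup, for example `build_index(posts)["RSS"]`.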

14. If there were a preservation or archiving system for blogs, and if there were a special access or interface for blog authors, how should it work?
Invisibly. Yes, I get what you are after. I have thought of this. First thing you have to do is to be able to associate blog authors with blogs properly. That is the thing that Technorati ran into years and years ago and they came up with a “Claim Your Blog” system. There needs to be a mechanism by which you claim your blog or blogs (because the relation will be multiple blogs to multiple people – a many-to-many relationship).
Secondly, we need to understand what sort of functionality that interface would entail. So far, we have one functionality defined and that is to delete a post from the archive. Hopefully, you are not tying bloggers up with more management than that really. You are probably looking into some kind of a blogger profile and being able to apply this profile and information to the blog to provide better searching capability or some such thing. So, you want a profile editor of some sort, but really, it is “yet another profile editor”: YAPE. The World already has about a billion too many profile editors. So, it would be nice if such a system would actually support access to my blogging system, whatever that may be, through a mechanism such as OAuth or some such thing. But again, how would you do it for someone like me who just has his own standalone website? That is a bit problematic. You will need to allow manual input as well as automatic input of data. It depends what you want your interface to do: authoring a profile and deleting the posts that should not be archived. I do not know what you want to do.

15. Do you have any general comments on the development of a blog aggregation, preservation, management & dissemination software?
I would really want to see it. If you are building this it is going to be hard. I know, because I have built pretty much that. I used systems like that for our Massive Open Online Courses. I used a system like that called EDU-RSS, that works off and on. You are going to run into issues with specific types of blogs like Posterous and Tumblr. You are thinking of tracking the relationships between blogs, I think that is a very useful thing to do. 
You are going to find these relationships show up in odd ways; to give you one example – images. X uses an image, Y uses the same image, and that creates a link between X and Y even though they may have never connected with each other. They may use the same image at the same URL, which is easy to detect; they may use the same image where the copies of the image are located at different URLs, even though it is the same image. One may use a cropped version of the other’s image. All of these kinds of things create these kinds of linkages. Definitely go for the easy case. My aggregator, right now, analyses for links, for embedded media, for images, anything I can find. And then I create separate entity tables out of these and now I am able to draw links from people to blog posts to images to comments to whatever. If you are going to do that, and you probably will, you are probably looking at creating a giant global graph from entities to entities and then creating some interesting linkages out of that. But this is a lot of overhead. It is a lot of processing. It is going to be hard to do. It is going to require a lot of hardware and bandwidth to pull off. So, there are probably financial issues as well, which leads into questions about sustainability. Keep me informed. I would really like to see how you address all these challenges that I looked at. And feel free to talk to me about any of the challenges you are facing because I may have already faced them, because I have been, like I said, this deep in this stuff. It is most of what I am doing these days. And it is an area of very deep interest of mine.
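The "easy case" of image-based linkage mentioned above (two posts referencing the same image) can be sketched as follows. This is a hedged Python illustration, not the aggregator's actual code: the function name `shared_image_links` and the input shape are assumptions. Each image is identified by a key, which could be its URL (same image, same URL) or a hash of the file contents (same image, different URLs); cropped copies would need perceptual matching and are not handled here.

```python
from collections import defaultdict
from itertools import combinations


def shared_image_links(post_images):
    """Link every pair of posts that use at least one common image.

    post_images maps a post id to the set of image keys it uses
    (a hypothetical shape chosen for this example). Returns a dict
    mapping (post_a, post_b) to the set of image keys they share.
    """
    # Invert the mapping: image key -> posts that use it.
    posts_by_image = defaultdict(set)
    for post_id, images in post_images.items():
        for image in images:
            posts_by_image[image].add(post_id)

    # Every pair of posts under the same image key is linked.
    links = defaultdict(set)
    for image, posts in posts_by_image.items():
        for a, b in combinations(sorted(posts), 2):
            links[(a, b)].add(image)
    return dict(links)
```

The same inversion works for any other entity table (authors, embedded media, outbound links), which is one way the entity-to-entity graph could be assembled piece by piece.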

Thank you very much indeed, Stephen. That has been really, really helpful.
