Where did the archived discussions in Google Groups come from?

score:60

Accepted answer

The old discussions were messages on USENET, which still exists. rec.games.chess is a USENET newsgroup; it was never primarily a part of Google Groups. Messages can be posted to and read from the newsgroup without going anywhere near Google.

The newsgroup data is held on USENET servers, of which there are many; USENET is distributed. Messages get copied from server to server using NNTP.

Each server chooses which groups to carry from the other servers it is configured to exchange messages with.
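As a rough illustration of that propagation step, here is a minimal sketch using Python's standard-library nntplib (removed in Python 3.13, so an older interpreter is assumed). The hostnames, group, and cut-off date are placeholders, and many real servers restrict or disable the NEWNEWS and IHAVE commands used here.

```python
# Sketch: copy recent articles for one group from an upstream server to a peer.
# Assumes Python <= 3.12 (nntplib was removed in 3.13) and that both servers
# permit NEWNEWS/IHAVE for this client; hostnames are placeholders.
import nntplib
from datetime import datetime, timedelta

GROUP = "rec.games.chess"
since = datetime.utcnow() - timedelta(days=1)

with nntplib.NNTP("news.upstream.example") as upstream, \
     nntplib.NNTP("news.peer.example") as peer:
    # Ask the upstream server which message-IDs have arrived since `since`.
    _, message_ids = upstream.newnews(GROUP, since)
    for msg_id in message_ids:
        _, article = upstream.article(msg_id)
        try:
            # Offer the article to the peer; it refuses IDs it already holds,
            # which is how the flood-fill avoids copying things endlessly.
            peer.ihave(msg_id, article.lines)
        except nntplib.NNTPError:
            pass  # peer already has it (or rejected it)
```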

To read the messages, a user needs an NNTP client (Usenet reader), of which there are several; Google Groups' web interface is one of these. You can also run your own server, which I did because I needed messages from more than one place and wanted a quick bulk download over a dial-up line instead of reading messages online and downloading each one individually.
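To make that concrete, here is a minimal sketch of what any such reader does under the hood, again using Python's nntplib (so Python 3.12 or earlier is assumed); the server name is a placeholder, and a real client would add threading, caching, and a record of what has been read.

```python
# Sketch: the bare minimum an NNTP reader does to show recent subjects.
# The host is a placeholder; public text-only servers exist but vary.
import nntplib

with nntplib.NNTP("news.example.net") as srv:
    _, count, first, last, name = srv.group("rec.games.chess")
    print(f"{name}: {count} articles on this server")

    # Fetch overview data (subject, author, date, ...) for the last ten articles.
    _, overviews = srv.over((max(first, last - 9), last))
    for number, fields in overviews:
        print(number, nntplib.decode_header(fields["subject"]))

    # Fetch one full article (headers plus body) by its number.
    _, info = srv.article(last)
    print(b"\r\n".join(info.lines).decode("utf-8", errors="replace")[:500])
```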

Google happens to run some of these servers and provides a front-end client to read the messages. Its servers got the older history from Deja News, which had gone looking for and obtained messages from those servers that kept a longer history; normally a server is only asked for messages that have arrived since the last time it was asked. From Wikipedia it appears:

Web-based archiving of Usenet posts began in 1995 at Deja News with a very large, searchable database. In 2001, this database was acquired by Google.[90]

Google Groups hosts an archive of Usenet posts dating back to May 1981. The earliest posts, which date from May 1981 to June 1991, were donated to Google by the University of Western Ontario with the help of David Wiseman and others,[91] and were originally archived by Henry Spencer at the University of Toronto's Zoology department.[92] The archives for late 1991 through early 1995 were provided by Kent Landfield from the NetNews CD series[93] and Jürgen Christoffel from GMD.[94] The archive of posts from March 1995 onward was started by the company DejaNews (later Deja), which was purchased by Google in February 2001. Google began archiving Usenet posts for itself starting in the second week of August 2000.

Usenet servers tended to exist at each university and at many companies, and when dial-up Internet providers started, each of them often ran a server too; AOL is one example. As the web took over, many of these were shut down or the servers were outsourced to various companies. Another reason is that Usenet is where spam was first so named (see Wikipedia's history of spam), and spam eventually, I think, made up much of the messages; binary files, e.g. p**n and copies of music, films, etc., were also uploaded, making the volume of legal, useful information too low for most server owners to justify the cost of dealing with Usenet.

There are now several Usenet hosting companies, mainly catering, I think, for the binary files. There are also a few servers that provide a feed of the text-only groups.

The advantage of USENET clients is that they provide a fully threaded interface, without the limitations of web forums, and you only need to connect to one place rather than to each separate web site.

Upvote:2

In addition to the relatively small size of news items and the (compared to today) relatively high quality of postings (which seriously declined with the advent of Eternal September), the thing I miss most about Usenet newsgroups is the ease and convenience of reading them.

With a single text config file (the ~/.newsrc; see the sketch after this list) listing groups in my order of preference, a single "trn" command would:

  • Display a list of the top groups and the number of items I haven't seen yet.
  • Hitting the space-bar would display a list of the top unread items in the first group.
  • Hitting the space-bar would display the first item.
  • And so on until the list was exhausted (i.e. it got into the more entertaining or silly groups), or one typed "q".
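For anyone who never saw one, the following is a small illustrative sketch of that file's traditional format and of how a reader works out the unread count from it; the group names, article numbers, and "server" figures are all made up.

```python
# Sketch: parse a .newsrc-style file and compute unread counts per group.
# The traditional format is one line per group:
#   rec.games.chess: 1-1042,1050     (':' = subscribed, '!' = unsubscribed,
#                                      the ranges are articles already read)
NEWSRC = """\
rec.games.chess: 1-1042,1050
comp.lang.c: 1-15210
alt.folklore.computers! 1-800
"""

# Pretend these are the highest article numbers the server reported.
SERVER_LAST = {"rec.games.chess": 1100, "comp.lang.c": 15210}

def read_set(ranges: str) -> set[int]:
    """Expand '1-1042,1050' into the set of article numbers already read."""
    read = set()
    for part in filter(None, (p.strip() for p in ranges.split(","))):
        lo, _, hi = part.partition("-")
        read.update(range(int(lo), int(hi or lo) + 1))
    return read

# Groups are visited in .newsrc order -- the "order of preference" above.
for line in NEWSRC.splitlines():
    name, _, ranges = line.partition(" ")
    if not name.endswith(":"):
        continue                      # '!' marks an unsubscribed group; skip it
    group = name[:-1]
    read = read_set(ranges)
    unread = sum(1 for n in range(1, SERVER_LAST[group] + 1) if n not in read)
    print(f"{group}: {unread} unread")
```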

And not only did I see only those items that I hadn't seen before, everything was presented in chronological order, from oldest to newest (what a concept!).

And not only that: items that quoted previous postings did so using top-quoting, which not only allowed one to read everything in the order it was written, but also forced the poster to delete all the irrelevant parts of the item they were quoting.

Unless one wanted to reply, everything could be seen, in order, by doing nothing other than hitting the space bar. And by "everything" I mean everything; one didn't have to jump around between Facebook, and Instagram, and Twitter, and … . And no advertising.

And that includes very local newsgroups, such as for each university's CS123 course, or each company's individual departments. Businesses could conduct what would now be considered group meetings asynchronously, with everyone at their own desk, responding when necessary and reading the whole thing perhaps hours later when it wouldn't interfere with their real work.

Sigh!

Dear Emily Postnews contains a very ironic set of rules for posting news items, and is well worth reading.

Upvote:25

A point that the other answers so far perhaps don't make clear enough is the decentralised nature of the Internet back then.

We've got used to the idea that every type of content is accessible at some central location (whether that's a single server, a whole data centre, or even a group of data centres all accessible at single address or web page).  But that's not how the Internet started, nor even how it was back in the '90s when Usenet was still hugely popular.

In fact, the Internet developed from military systems which were specifically designed not to rely on big central servers with single points of failure.  And some of that mind-set persisted.

Usenet messages didn't live on a single server anywhere, because there weren't any organisations you could trust to run such a server reliably over the long term — certainly not without charging its users for the privilege.  And even if there were, you couldn't rely on being able to access them reliably at any time.  And even if you could, that access might be slow and/or costly.

So Usenet developed to be decentralised: messages were sent on from machine to machine, in batches, ending up on your nearest server from which you could collect them in one go — and then read them off-line at your leisure.

That's a very efficient approach: the transfers are done across direct links, perhaps when they're less busy (or cheaper); messages only need to be transferred once regardless of how many users will end up reading them; and reading can be done off-line without incurring any further costs.  (Yes, Internet access could be expensive back then.  I remember using dial-up access — which incurred per-minute charges even though it was a local number in the UK — and having to watch the time carefully and transfer as much as possible in one go so that I could then use it after disconnecting.)
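As a loose illustration of that "transfer once, read off-line" pattern, here is a sketch (again Python/nntplib, with a placeholder server and a made-up state file) that grabs everything new in a single connection and writes it to disk for later reading.

```python
# Sketch: connect once, download every article newer than the last one seen,
# save them locally, then hang up and read at leisure. Host, group and paths
# are placeholders; a real reader would track many groups and handle errors.
import json
import pathlib
import nntplib

GROUP = "rec.games.chess"
STATE = pathlib.Path("highwater.json")      # remembers the last article fetched
SPOOL = pathlib.Path("spool") / GROUP       # local copies live here
SPOOL.mkdir(parents=True, exist_ok=True)

last_seen = json.loads(STATE.read_text()).get(GROUP, 0) if STATE.exists() else 0

with nntplib.NNTP("news.example.net") as srv:    # the only (metered!) connection
    _, _, first, last, _ = srv.group(GROUP)
    for number in range(max(first, last_seen + 1), last + 1):
        try:
            _, info = srv.article(number)
        except nntplib.NNTPError:
            continue                              # article expired or cancelled
        (SPOOL / f"{number}.txt").write_bytes(b"\r\n".join(info.lines))
    STATE.write_text(json.dumps({GROUP: last}))

# From here on no connection is needed: read the files in spool/ off-line.
```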

With the advent of unmetered, always-on Internet connections (that didn't tie up your only telephone line!), rapid transfers, highly-reliable data centres, and business models that provide all sorts of services without any direct charges (discounting the indirect costs associated with advertising, usage of your data, malware, censorship…), things have undergone an enormous shift towards centralised services.  And there are of course advantages to that (as well as some disadvantages).

You can see a similar shift with another non-web application on the Internet: email.  Back in the day, email would be sent via a series of email servers (some belonging to big organisations, others less so), but it would end up on your machine, whether that was an account on a system belonging to your employer or university, or your own microcomputer/PC.  That's where the mail would live; the only place it would be accessible.  Some people still use mail clients which work that way, but most now use web mail instead, with the mail living on big servers belonging to your ISP, employer, educational institution, or some unrelated organisation — accessible from anywhere with a net-connected web browser, but no longer under your control.
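That older "mail lives on your machine" model is easy to sketch; the following uses Python's standard poplib and mailbox modules, with a placeholder host and credentials, to pull everything down into a local mbox file the way a traditional client would.

```python
# Sketch: download all waiting mail from a POP3 server into a local mbox file,
# the traditional "mail ends up on your machine" model. Host and credentials
# are placeholders; a classic client would also delete messages it fetched.
import mailbox
import poplib

pop = poplib.POP3_SSL("pop.example.org")
pop.user("alice")
pop.pass_("correct horse battery staple")

local = mailbox.mbox("inbox.mbox")      # this file is now where the mail lives
count, _ = pop.stat()
for i in range(1, count + 1):
    _, lines, _ = pop.retr(i)           # fetch message i as raw lines (bytes)
    local.add(b"\r\n".join(lines))
    # pop.dele(i)                       # uncomment to remove it from the server
local.flush()
pop.quit()
```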

Even the early web was much less centralised; most web sites were small, and finding them was hard enough that they organised themselves into web-rings and link pages.


So, to answer the question: Usenet messages physically lived on all the news servers carrying the relevant newsgroup, along with all the clients to which people had downloaded them.  Those servers were mostly at universities, Bell Labs sites, Unix-related companies, and ISPs.
