The community’s loudest voices: top 50 most active posters and commenters

Enough technical background…let’s get to some actual data!

Here’s a ranking of the top 50 profiles that I scraped, sorted by post/activity count. (Remember, these totals only include posts that were publicly viewable.)

Rank | Name | Posts
1 | Andy Wells | 46901
2 | Erika K | 33440
3 | Truth Monger. | 20761
4 | Ryan Gallagher [AKA Sasha G] | 14705
5 | Dubhghlas MacDubhghlas | 14416
6 | rictus grin | 12111
7 | Joe Carter | 11425
8 | Greensoul Sufi | 10513
9 | Stan Ulch | 8590
10 | Jade West | 8522
11 | unspeakable is lallie laloo | 8472
12 | Kizzume Fowler (Kizz) | 7475
13 | Dee Church | 7429
14 | Bunto Skiffler | 6221
15 | Cherie A. | 5512
16 | Deep Ashtray | 5240
17 | Rodney Hamilton (Konscious) | 5028
18 | Gluteus Illuminatus | 4795
19 | Wayne GoldPig | 4719
20 | R.C. Apologist [AKA Christian Anarchist] | 4565
21 | Ernie Deaver | 4546
22 | Phosphorus Thoth | 4357
23 | Andreas Geisler | 3941
24 | Christina BlackFeather | 3693
25 | Vlad Tepes | 3599
26 | Rene Nene | 3429
27 | JD Kain | 3423
28 | Jeffrey Ramsey (J.R.) | 3405
29 | DynaCatlovesme | 3333
30 | Blue [AKA blueasblueis] | 3227
31 | Wendy Canuck | 3094
32 | Mark Eunoia | 3050
33 | Jason Burns | 3048
34 | Big Gay Al | 2648
35 | Steve McRae | 2621
36 | John Klein | 2595
37 | The Boom | 2545
38 | dylexia Jones1 | 2406
39 | John Broadhurst | 2357
40 | Robert Wallace | 2272
41 | The Holy Spackle | 2184
42 | BillusBallus von MegaDog | 2155
43 | debbie duran | 2150
44 | Dragnauct Sylvas | 2033
45 | Steve Shives | 2010
46 | Jungle Jargon | 2008
47 | Barry Partington | 2007
48 | David Eriol Hickman | 1956
49 | The Archive Channel [Jason Burns Archive Channel] | 1905
50 | Tony Hooper | 1853
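For anyone curious how a ranking like this falls out of the database, here is a minimal sketch: count activities per author and sort. The two-table schema and the names `people`, `activities`, `actor_id`, `display_name`, and `top_posters` are hypothetical simplifications, not Plustractor’s actual columns, and the sample rows are made up.

```python
import sqlite3

# Hypothetical, simplified schema: one row per activity (post or share),
# keyed by the author's profile ID. Real Plustractor column names may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id TEXT PRIMARY KEY, display_name TEXT);
    CREATE TABLE activities (id TEXT PRIMARY KEY, actor_id TEXT REFERENCES people(id));
""")
conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [("p1", "Alice"), ("p2", "Bob"), ("p3", "Carol")])
conn.executemany("INSERT INTO activities VALUES (?, ?)",
                 [("a1", "p1"), ("a2", "p1"), ("a3", "p2"),
                  ("a4", "p1"), ("a5", "p3"), ("a6", "p3")])

def top_posters(conn, limit=50):
    """Return (rank, name, post_count) tuples, most active author first."""
    rows = conn.execute("""
        SELECT p.display_name, COUNT(a.id) AS posts
        FROM activities AS a
        JOIN people AS p ON p.id = a.actor_id
        GROUP BY a.actor_id, p.display_name
        ORDER BY posts DESC, p.display_name
        LIMIT ?
    """, (limit,)).fetchall()
    # Ranks are assigned sequentially, so tied counts still get distinct ranks.
    return [(rank, name, posts) for rank, (name, posts) in enumerate(rows, start=1)]

for rank, name, posts in top_posters(conn):
    print(rank, name, posts)
```

Since only public activities were ever collected, no extra visibility filter is needed in the query itself.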

And here’s a list of the top 50 profiles by comment count. These numbers only include comments made on threads started by one of the 598 profiles that I scraped, so they obviously don’t capture all of each person’s activity across Google+, but the sample is big enough that I think the relative rankings should still hold.

Rank | Name | Comments
1 | Truth Monger. | 58363
2 | Steve McRae | 15889
3 | Andreas Geisler | 15041
4 | Dubhghlas MacDubhghlas | 13080
5 | Deep Ashtray | 12643
6 | Erika K | 11765
7 | Jade West | 10949
8 | Rodney Mulraney | 10606
9 | JD Kain | 10188
10 | Fiona Robertson | 9898
11 | tommy hall | 9865
12 | David Eriol Hickman | 9577
13 | Andy Wells | 8637
14 | Mark Eunoia | 7550
15 | R.C. Apologist [AKA Christian Anarchist] | 7350
16 | MrAudienceMember | 7333
17 | Big Gay Al | 6942
18 | The Holy Spackle | 6356
19 | rictus grin | 6258
20 | HogTie Champ | 6130
21 | Wayne GoldPig | 5914
22 | debbie duran | 5638
23 | Blue [AKA blueasblueis] | 5621
24 | bena berry | 5106
25 | Rene Nene | 5010
26 | Bert Poole | 4957
27 | Scott Guertin | 4938
28 | DynaCatlovesme | 4843
29 | Jungle Jargon | 4776
30 | Gluteus Illuminatus | 4572
31 | Ration alMind | 4509
32 | dylexia Jones1 | 4129
33 | Ozpin Chibi Cane | 4070
34 | BennyOcean | 4070
35 | Jim Riven | 4047
36 | Tyler Durden | 3949
37 | AsDeadAsDillinger | 3848
38 | The Raw Atheist | 3781
39 | Michael Schofield | 3755
40 | Joe Carter | 3735
41 | Charles Huckelbery | 3690
42 | Moe the Pagan | 3637
43 | Christina BlackFeather | 3544
44 | hannah anderson | 3521
45 | gnarwarrior | 3478
46 | Socka Count | 3420
47 | Kizzume Fowler (Kizz) | 3340
48 | TheNatSecWonk [AKA MMArtist141] | 3264
49 | Bunto Skiffler | 3235
50 | Eric Peters | 3175
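The restriction described above (only comments on threads started by scraped profiles) is built into what was collected, but writing it as an explicit join makes the scope visible. Here is a minimal sketch under assumed names: the `scraped` flag, the `comments`/`activities`/`people` columns, and `top_commenters` are hypothetical, and the sample rows include one out-of-scope thread just to show the filter working. Note that commenters don’t need to be on the scrape list themselves to be counted.

```python
import sqlite3

# Hypothetical schema; `scraped` marks profiles that were on the scrape list.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id TEXT PRIMARY KEY, display_name TEXT,
                         scraped INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE activities (id TEXT PRIMARY KEY, actor_id TEXT);
    CREATE TABLE comments (id TEXT PRIMARY KEY, activity_id TEXT, actor_id TEXT);
""")
conn.executemany("INSERT INTO people VALUES (?, ?, ?)",
                 [("p1", "Alice", 1), ("p2", "Bob", 1),
                  ("p3", "Dave", 0), ("p4", "Eve", 0)])
# a3 is a thread by an unscraped profile, so its comments must not count.
conn.executemany("INSERT INTO activities VALUES (?, ?)",
                 [("a1", "p1"), ("a2", "p2"), ("a3", "p4")])
conn.executemany("INSERT INTO comments VALUES (?, ?, ?)",
                 [("c1", "a1", "p3"), ("c2", "a1", "p3"), ("c3", "a2", "p3"),
                  ("c4", "a2", "p1"), ("c5", "a3", "p1"), ("c6", "a3", "p3"),
                  ("c7", "a1", "p2")])

def top_commenters(conn, limit=50):
    """Rank commenters by comments left on threads started by scraped profiles."""
    rows = conn.execute("""
        SELECT commenter.display_name, COUNT(c.id) AS n
        FROM comments AS c
        JOIN activities AS a ON a.id = c.activity_id
        JOIN people AS author ON author.id = a.actor_id
        JOIN people AS commenter ON commenter.id = c.actor_id
        WHERE author.scraped = 1
        GROUP BY c.actor_id, commenter.display_name
        ORDER BY n DESC, commenter.display_name
        LIMIT ?
    """, (limit,)).fetchall()
    return [(rank, name, n) for rank, (name, n) in enumerate(rows, start=1)]

print(top_commenters(conn))
```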

(I’d insert a funny quip here about Master Debater Steve McRae having the second loudest mouth in the community, but let’s be honest…those jokes practically write themselves.)

What information does the database contain?

The Google+ API allows (well, allowed) access to three types of objects: People, Activities, and Comments. (Click the links to see the API documentation for what each object contains.) Plustractor’s database had one table for each of these objects, with columns for most of the top-level properties returned by the API, plus a few extra columns for useful data that was buried in JSON blobs, to make it easier to access. The list of these columns can be found here, though the analysis will use a slightly different schema: I wrote a script to clean up and reorganize the original database, renamed a few columns, deleted ones that turned out to be empty (thanks Google!), and imported everything into MySQL. The schema for that database (with full column descriptions) will be posted here once I get a chance to write it up, since that’s the format in which I’ll eventually release the dataset.
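To illustrate the idea of promoting buried JSON fields into their own columns, here is a minimal sketch. The payload is only loosely modeled on the shape of the old Activity resource, and the column names and `insert_activity` helper are hypothetical; it uses SQLite (as Plustractor did) rather than the later MySQL version.

```python
import json
import sqlite3

# A hypothetical activity payload, loosely modeled on the old Google+
# Activity resource. Field names here are illustrative only.
raw = json.dumps({
    "id": "z12abc",
    "published": "2018-11-02T14:05:00.000Z",
    "actor": {"id": "1001", "displayName": "Example User"},
    "object": {"content": "Hello, GDC!", "replies": {"totalItems": 3}},
})

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE activities (
        id          TEXT PRIMARY KEY,
        published   TEXT,
        actor_id    TEXT,     -- promoted out of actor.id
        content     TEXT,     -- promoted out of object.content
        reply_count INTEGER,  -- promoted out of object.replies.totalItems
        raw_json    TEXT      -- full response kept for anything not promoted
    )
""")

def insert_activity(conn, payload):
    """Store one activity, copying a few nested fields into their own columns."""
    doc = json.loads(payload)
    obj = doc.get("object", {})
    conn.execute(
        "INSERT INTO activities VALUES (?, ?, ?, ?, ?, ?)",
        (
            doc["id"],
            doc.get("published"),
            doc.get("actor", {}).get("id"),
            obj.get("content"),
            obj.get("replies", {}).get("totalItems"),
            payload,
        ),
    )

insert_activity(conn, raw)
print(conn.execute("SELECT actor_id, reply_count FROM activities").fetchone())
# ('1001', 3)
```

Keeping the raw JSON alongside the promoted columns means a field that turns out to matter later can still be extracted without re-scraping, which matters when the API is gone.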

So, here are the specifics on what was actually scraped:

  • People records for each of the profiles in this spreadsheet
  • People records for a few additional profiles that I had on my list but that didn’t have any posts (mainly alts/sock accounts that I had circled)
  • Activity records for every public post or share created by any of those profiles (meaning shared with Public, in a public collection, or to a publicly-accessible group)
  • Comment records for every comment under those activities
  • People records for every author of one of those comments
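The order of operations in the list above can be sketched as a small crawl. Since the API no longer exists, the dictionaries below stand in for it, and every name (`PEOPLE`, `ACTIVITIES`, `COMMENTS`, `Archive`, `crawl`) is hypothetical. This shows the general shape, not Plustractor’s actual code.

```python
from dataclasses import dataclass, field

# Stand-ins for the (now defunct) API. All names and data are hypothetical.
PEOPLE = {"u1": "Listed User", "u2": "Commenter A", "u3": "Commenter B", "u4": "Lurker"}
ACTIVITIES = {"u1": ["post1", "post2"], "u4": []}  # public posts by author
COMMENTS = {"post1": [("c1", "u2"), ("c2", "u3")], "post2": [("c3", "u2")]}

@dataclass
class Archive:
    people: dict = field(default_factory=dict)      # profile id -> display name
    activities: dict = field(default_factory=dict)  # activity id -> author id
    comments: dict = field(default_factory=dict)    # comment id -> (activity id, author id)

def crawl(profile_ids):
    """Collect listed profiles, their public posts, the comments on those
    posts, and finally the profiles of everyone who commented."""
    archive = Archive()
    commenter_ids = set()
    for pid in profile_ids:
        archive.people[pid] = PEOPLE[pid]  # stored even if the profile has no posts
        for act_id in ACTIVITIES.get(pid, []):
            archive.activities[act_id] = pid
            for comment_id, author_id in COMMENTS.get(act_id, []):
                archive.comments[comment_id] = (act_id, author_id)
                commenter_ids.add(author_id)
    # Fetch each commenter's profile once, skipping ones already stored.
    for cid in commenter_ids - set(archive.people):
        archive.people[cid] = PEOPLE[cid]
    return archive

archive = crawl(["u1", "u4"])
print(len(archive.people), len(archive.activities), len(archive.comments))
# 4 2 3
```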

In case that wasn’t clear, every single bit of profile data, post, and comment in my database was entirely public at the time of scraping. I intended to implement the ability to scrape profiles using an OAuth token to get posts that were only shared with circles, etc., but there just wasn’t enough time (again, THANKS GOOGLE).

Besides non-public posts, the other thing I didn’t scrape was pictures, video, or any other media uploaded to Google+ by users. The API simply didn’t provide any way to get this content, and even if it had, including it would have made the database far too large. The one exception was profile and cover photos: these are stored on Google’s servers as separate files, and their URLs were included in the API responses. I programmed my tool to download these files for each of the profiles on my list (and only for them; I originally planned to download profile and cover photos for every profile I touched, including commenters, but that ended up taking way too much space). Unfortunately, after the project was complete, I found that far fewer images had been downloaded than there should have been. All of the cover photos are gone now (as they were tied directly to Google+), but the profile pictures seem to remain, so I’m going to make another attempt at grabbing as many of those as I can store.
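Before a second pass at the profile pictures, it helps to know exactly which ones are missing. Here is a minimal sketch that assumes, hypothetically, one file per profile named `<profile id>.jpg` in a single directory. It treats zero-byte files as missing, since an interrupted download can leave those behind. The helper name `missing_photos` is made up for illustration.

```python
import tempfile
from pathlib import Path

def missing_photos(profile_ids, photo_dir, suffix=".jpg"):
    """Return the profile IDs with no usable downloaded photo in photo_dir.

    Assumes (hypothetically) one file per profile named '<profile id><suffix>';
    empty files count as missing.
    """
    photo_dir = Path(photo_dir)
    missing = []
    for pid in profile_ids:
        path = photo_dir / f"{pid}{suffix}"
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(pid)
    return missing

# Usage: 'a' downloaded fine, 'b' left an empty file, 'c' never arrived.
with tempfile.TemporaryDirectory() as d:
    Path(d, "a.jpg").write_bytes(b"\xff\xd8")
    Path(d, "b.jpg").touch()
    print(missing_photos(["a", "b", "c"], d))  # ['b', 'c']
```

The resulting list can then be fed back into the download step, rather than re-fetching everything.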

The Great Debate G+ Archive: Background

Since 2014, I’ve been actively participating in a quirky little internet community centered around discussion of religious and philosophical topics, which internally refers to itself as “The Great Debate Community” (henceforth known as the “GDC”). I’ve made a number of good friends through this community (including my partner, Jakki) and have also met quite a few of them in real life, so the community has come to mean a lot to me over the years. Historically, the GDC has been primarily centered around three platforms: YouTube, Google+, and Google Hangouts. When it was announced in October of 2018 that Google+ would be shutting down the following year, I began working on a way to archive G+ profiles and comment threads, so as to preserve as much community history as possible, history that would otherwise be lost once the site shut down for good.

The end result of that project was Plustractor, which I began work on in November 2018 and had working well enough to use by early February. About halfway through the development process, Google moved up the Google+ shutdown from August to April, with the APIs (which Plustractor relied upon) shutting down on March 7th, so it really ended up being a race against the clock to get it finished in time. The basic function of Plustractor was to use the Google+ API to scrape data about three kinds of G+ objects: people (user profiles), activities (posts and shares), and comments, and save them to a local SQLite database. I could use it to grab anything from the basic profile data of a single user up to every single post and comment under an entire list of profiles.
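Grabbing “every single post” for a profile means walking through paged API responses, since list endpoints return results a page at a time along with a token for the next page. Here is a minimal sketch of that loop. `fetch_page` is a hypothetical callable, and the response shape (`items` plus an optional `nextPageToken`) is an assumption modeled on Google-style list APIs; the fake three-page endpoint stands in for the retired API.

```python
from typing import Callable, Iterator, Optional

def iterate_pages(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every item from a paged list endpoint.

    fetch_page(token) is a hypothetical callable returning a dict shaped like
    {"items": [...], "nextPageToken": "..."}; the token is absent on the last page.
    """
    token = None
    while True:
        page = fetch_page(token)
        yield from page.get("items", [])
        token = page.get("nextPageToken")
        if not token:
            break

# A fake three-page endpoint standing in for the retired API.
PAGES = {
    None: {"items": [1, 2], "nextPageToken": "p2"},
    "p2": {"items": [3, 4], "nextPageToken": "p3"},
    "p3": {"items": [5]},
}
print(list(iterate_pages(PAGES.__getitem__)))  # [1, 2, 3, 4, 5]
```

Making the loop a generator lets each item be written to the database as it arrives, instead of holding a prolific poster’s whole history in memory.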

I constructed a list of 600 or so profiles of interest, based mainly on my own circles, a few other people’s circles, and the GDC wiki (which is no longer available as it was deleted for ToS violations…hmm, I wonder how that happened). I sorted these by priority, starting with major figures in the community, those who were most active on G+, those who ran regular hangouts, and community-relevant YouTubers with big channels who were frequent targets for discussion or drama…then going all the way down to lurkers and minor figures who were no longer active. Since the GDC is a fairly loose social network with no strict definition, I tried to cast as wide a net as possible. A few people did slip through the cracks, much to my frustration (indeed, I was scraping profiles right up until the day the APIs stopped working), but all in all I’m pretty happy with what I got.

The final stats for the archive are as follows: 598 scraped profiles (those that were on my list and had at least one post on G+), 52,044 people objects, 463,482 activity objects, and 1,101,232 comment objects, all contained in a database that weighed in at 4.1 GB uncompressed. The list of scraped profiles can be viewed in this Google Sheet.
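A few ratios derived from these totals give a rough sense of scale. This is just arithmetic on the numbers above plus the top post count from the first ranking. Keep in mind that averages hide how skewed the activity is: the single most active poster alone accounts for roughly a tenth of all activities.

```python
# Totals reported above for the finished archive.
scraped_profiles = 598
activities = 463_482
comments = 1_101_232
top_poster_posts = 46_901  # rank 1 in the post-count table

posts_per_profile = activities / scraped_profiles
comments_per_activity = comments / activities
top_poster_share = top_poster_posts / activities

print(f"{posts_per_profile:.1f} activities per scraped profile")  # 775.1
print(f"{comments_per_activity:.2f} comments per activity")       # 2.38
print(f"{top_poster_share:.1%} of all activities by one profile")  # 10.1%
```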

Now that Google+ has come to an end, I will be proceeding to analyze this dataset to the best of my abilities, and will be using this blog to document my findings and post any interesting data visualizations that I make along the way. I really think this project could be a goldmine of information about the community and its members, in addition to being a great learning experience for myself, as I want to pursue a career in data science.

Further down the road, I want to design some kind of web interface that will allow people to view and search through the archive data, in a format similar to Google+ (but better and more compact, because let’s face it, Google+ was a bit of a trainwreck interface-wise). At this time I will not be releasing the full database to the public, but if there’s something you’re looking for that was in a thread posted by one of the people on the list, let me know. If there’s a particular query you’d like me to run or you have suggestions for avenues of research, PLEASE do get in contact with me as I’d love to talk with you. I have a number of ideas already, but I’m a novice when it comes to analyzing social networks…suggestions from someone with expertise in this area would be awesome.

Thanks for reading…much more fun stuff to come. The best ways to get in contact with me are as follows:

Email: kdbuchik [a t] gmail dot com
Skype: kevbuc13
Discord: Kevin Bee#3332