The community’s loudest voices: top 50 most active posters and commenters

Posted on May 2, 2019 (Updated May 3, 2019) by Kevin

Enough technical background…let’s get to some actual data!

Here’s a ranking of the top 50 profiles that I scraped, sorted by post/activity count. (Remember, these totals only include posts that were publicly viewable.)

Rank	Name	Posts
1	Andy Wells	46901
2	Erika K	33440
3	Truth Monger.	20761
4	Ryan Gallagher [AKA Sasha G]	14705
5	Dubhghlas MacDubhghlas	14416
6	rictus grin	12111
7	Joe Carter	11425
8	Greensoul Sufi	10513
9	Stan Ulch	8590
10	Jade West	8522
11	unspeakable is lallie laloo	8472
12	Kizzume Fowler (Kizz)	7475
13	Dee Church	7429
14	Bunto Skiffler	6221
15	Cherie A.	5512
16	Deep Ashtray	5240
17	Rodney Hamilton (Konscious)	5028
18	Gluteus Illuminatus	4795
19	Wayne GoldPig	4719
20	R.C. Apologist [AKA Christian Anarchist]	4565
21	Ernie Deaver	4546
22	Phosphorus Thoth	4357
23	Andreas Geisler	3941
24	Christina BlackFeather	3693
25	Vlad Tepes	3599
26	Rene Nene	3429
27	JD Kain	3423
28	Jeffrey Ramsey (J.R.)	3405
29	DynaCatlovesme	3333
30	Blue [AKA blueasblueis]	3227
31	Wendy Canuck	3094
32	Mark Eunoia	3050
33	Jason Burns	3048
34	Big Gay Al	2648
35	Steve McRae	2621
36	John Klein	2595
37	The Boom	2545
38	dylexia Jones1	2406
39	John Broadhurst	2357
40	Robert Wallace	2272
41	The Holy Spackle	2184
42	BillusBallus von MegaDog	2155
43	debbie duran	2150
44	Dragnauct Sylvas	2033
45	Steve Shives	2010
46	Jungle Jargon	2008
47	Barry Partington	2007
48	David Eriol Hickman	1956
49	The Archive Channel [Jason Burns Archive Channel]	1905
50	Tony Hooper	1853
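For anyone curious how a ranking like this gets produced, it boils down to a grouped count over the posts table. Here's a minimal sketch using Python's built-in sqlite3 module. The table and column names (`people`, `activities`, `author_id`, `display_name`) are placeholders invented for illustration rather than the archive's real schema, and the rows are toy data.

```python
import sqlite3

def top_posters(conn, limit=50):
    """Return (rank, name, post_count) tuples, highest post count first."""
    rows = conn.execute(
        """
        SELECT p.display_name, COUNT(*) AS posts
        FROM activities AS a
        JOIN people AS p ON p.id = a.author_id
        GROUP BY p.id, p.display_name
        ORDER BY posts DESC, p.display_name ASC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
    return [(rank, name, posts) for rank, (name, posts) in enumerate(rows, start=1)]

# Toy in-memory database with a hypothetical two-table layout.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE people (id TEXT PRIMARY KEY, display_name TEXT);
    CREATE TABLE activities (id TEXT PRIMARY KEY, author_id TEXT REFERENCES people(id));
    """
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("p1", "Alice"), ("p2", "Bob"), ("p3", "Carol")],
)
conn.executemany(
    "INSERT INTO activities VALUES (?, ?)",
    [("a1", "p1"), ("a2", "p2"), ("a3", "p2"), ("a4", "p3"), ("a5", "p2"), ("a6", "p1")],
)

ranking = top_posters(conn, limit=2)
for rank, name, posts in ranking:
    print(f"{rank}\t{name}\t{posts}")
```

Ties are broken alphabetically here just to keep the output stable; that choice is arbitrary and says nothing about how the table above was ordered.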

And here’s a list of the top 50 profiles by comment count. These numbers only include comments made on threads started by one of the 598 profiles that I scraped, so they obviously don’t cover all of each person’s activity across Google+, but it’s a big enough sample that I think the relative rankings should still hold.

Rank	Name	Comments
1	Truth Monger.	58363
2	Steve McRae	15889
3	Andreas Geisler	15041
4	Dubhghlas MacDubhghlas	13080
5	Deep Ashtray	12643
6	Erika K	11765
7	Jade West	10949
8	Rodney Mulraney	10606
9	JD Kain	10188
10	Fiona Robertson	9898
11	tommy hall	9865
12	David Eriol Hickman	9577
13	Andy Wells	8637
14	Mark Eunoia	7550
15	R.C. Apologist [AKA Christian Anarchist]	7350
16	MrAudienceMember	7333
17	Big Gay Al	6942
18	The Holy Spackle	6356
19	rictus grin	6258
20	HogTie Champ	6130
21	Wayne GoldPig	5914
22	debbie duran	5638
23	Blue [AKA blueasblueis]	5621
24	bena berry	5106
25	Rene Nene	5010
26	Bert Poole	4957
27	Scott Guertin	4938
28	DynaCatlovesme	4843
29	Jungle Jargon	4776
30	Gluteus Illuminatus	4572
31	Ration alMind	4509
32	dylexia Jones1	4129
33	Ozpin Chibi Cane	4070
34	BennyOcean	4070
35	Jim Riven	4047
36	Tyler Durden	3949
37	AsDeadAsDillinger	3848
38	The Raw Atheist	3781
39	Michael Schofield	3755
40	Joe Carter	3735
41	Charles Huckelbery	3690
42	Moe the Pagan	3637
43	Christina BlackFeather	3544
44	hannah anderson	3521
45	gnarwarrior	3478
46	Socka Count	3420
47	Kizzume Fowler (Kizz)	3340
48	TheNatSecWonk [AKA MMArtist141]	3264
49	Bunto Skiffler	3235
50	Eric Peters	3175

(I’d insert a funny quip here about Master Debater Steve McRae having the second loudest mouth in the community, but let’s be honest…those jokes practically write themselves.)
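The caveat on the comment ranking (only comments on threads started by a scraped profile are counted) can be made concrete with a small query. This is a sketch with made-up table names (`scraped_profiles`, `activities`, `comments`) and toy rows. In the real archive, threads by non-scraped profiles would simply be absent; the toy data includes one so the filter has something to exclude.

```python
import sqlite3

def top_commenters(conn, limit=50):
    """Count comments per author, but only on threads started by a scraped profile."""
    return conn.execute(
        """
        SELECT c.author_id, COUNT(*) AS comments
        FROM comments AS c
        JOIN activities AS a ON a.id = c.activity_id
        JOIN scraped_profiles AS s ON s.person_id = a.author_id
        GROUP BY c.author_id
        ORDER BY comments DESC, c.author_id ASC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE scraped_profiles (person_id TEXT PRIMARY KEY);
    CREATE TABLE activities (id TEXT PRIMARY KEY, author_id TEXT);
    CREATE TABLE comments (id TEXT PRIMARY KEY, activity_id TEXT, author_id TEXT);
    """
)
conn.executemany("INSERT INTO scraped_profiles VALUES (?)", [("alice",), ("bob",)])
conn.executemany(
    "INSERT INTO activities VALUES (?, ?)",
    [("t1", "alice"), ("t2", "bob"), ("t3", "zed")],  # zed is not on the list
)
conn.executemany(
    "INSERT INTO comments VALUES (?, ?, ?)",
    [
        ("c1", "t1", "carol"), ("c2", "t1", "carol"), ("c3", "t2", "alice"),
        ("c4", "t3", "carol"), ("c5", "t3", "carol"), ("c6", "t3", "carol"),
        ("c7", "t3", "dave"),
    ],
)

for author, count in top_commenters(conn):
    print(f"{author}\t{count}")
```

In this toy example carol wrote five comments in total, but only two of them count, and dave does not appear at all. That is the kind of undercount the caveat is warning about.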

What information does the database contain?

Posted on May 2, 2019 (Updated May 2, 2019) by Kevin

The Google+ API allows (well, allowed) access to three types of objects: People, Activities, and Comments. (Click the links to see the API documentation for what each object contains.) Plustractor’s database had one table for each of these objects, with columns for most of the top-level properties returned by the API, plus a few additional columns for useful data that was buried in JSON blobs, to make it easier to access. The list of these columns can be found here, though a slightly different schema will be used in analysis: I wrote a script to clean up and reorganize the original database, renamed a few columns, deleted ones that turned out to be empty (thanks Google!), and imported everything into MySQL. The schema for that database (with full column descriptions) will be put up here once I get a chance to write it, as that’s the format in which I’ll eventually be releasing the dataset.
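To illustrate the idea of pulling values out of nested JSON into their own columns, here is a rough Python sketch. The field names (`actor`, `object`, `replies`, `totalItems`, and so on) are meant to resemble the API's nested layout, but treat them as assumptions rather than a faithful copy of the documentation. The output column names are likewise hypothetical and are not the real schema.

```python
import json

def flatten_activity(raw):
    """Map one API-style activity dict to a flat row (column name -> value)."""
    obj = raw.get("object", {})
    return {
        # Top-level properties copied straight across.
        "id": raw["id"],
        "published": raw.get("published"),
        # Values that sit one or two levels deep get their own columns.
        "author_id": raw.get("actor", {}).get("id"),
        "content": obj.get("content"),
        "reply_count": obj.get("replies", {}).get("totalItems", 0),
        "plusone_count": obj.get("plusoners", {}).get("totalItems", 0),
        # Keep the full response too, so nothing is lost by flattening.
        "raw_json": json.dumps(raw, sort_keys=True),
    }

sample = {
    "id": "act-001",
    "published": "2019-01-15T12:00:00Z",
    "actor": {"id": "user-42", "displayName": "Example User"},
    "object": {
        "content": "Hello, G+",
        "replies": {"totalItems": 12},
        "plusoners": {"totalItems": 3},
    },
}

row = flatten_activity(sample)
for column, value in row.items():
    print(f"{column}: {value}")
```

Keeping the raw JSON alongside the flattened columns is one way to make a later cleanup or reorganization possible without re-scraping, which matters a lot when the API no longer exists.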

So, here are the specifics on what was actually scraped:

People records for each of the profiles in this spreadsheet
People records for a few additional profiles that I had on my list but that didn’t have any posts (mainly alts/sock accounts that I had circled)
Activity records for every public post or share created by any of those profiles (meaning shared with Public, in a public collection, or to a publicly-accessible group)
Comment records for every comment under those activities
People records for every author of one of those comments
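The list above describes a traversal: start from the profile list, pull each profile's public posts, pull the comments under each post, then pull a profile for every commenter. Here is a sketch of that flow in Python. `FakeApi` and its three methods are stand-ins invented for this example; they are not the Google+ client library, and this is not Plustractor's actual code.

```python
class FakeApi:
    """In-memory stand-in for a remote API (hypothetical, for illustration only)."""

    def __init__(self, persons, posts, replies):
        self.persons, self.posts, self.replies = persons, posts, replies

    def get_person(self, pid):
        return {"id": pid, "name": self.persons[pid]}

    def list_public_activities(self, pid):
        return [a for a in self.posts if a["author_id"] == pid and a["public"]]

    def list_comments(self, activity_id):
        return [c for c in self.replies if c["activity_id"] == activity_id]


def scrape(profile_ids, api):
    """Collect people, public activities, comments, and comment authors."""
    people, activities, comments = {}, {}, {}
    for pid in profile_ids:
        people[pid] = api.get_person(pid)          # listed profile, posts or not
        for act in api.list_public_activities(pid):
            activities[act["id"]] = act
            for com in api.list_comments(act["id"]):
                comments[com["id"]] = com
                author = com["author_id"]
                if author not in people:           # fetch each commenter once
                    people[author] = api.get_person(author)
    return people, activities, comments


api = FakeApi(
    persons={"u1": "Alice", "u2": "Bob (alt)", "u3": "Carol", "u4": "Dave"},
    posts=[
        {"id": "a1", "author_id": "u1", "public": True},
        {"id": "a2", "author_id": "u1", "public": False},  # circles-only post
    ],
    replies=[
        {"id": "c1", "activity_id": "a1", "author_id": "u3"},
        {"id": "c2", "activity_id": "a1", "author_id": "u1"},
        {"id": "c3", "activity_id": "a2", "author_id": "u4"},
    ],
)

people, activities, comments = scrape(["u1", "u2"], api)
print(sorted(people), sorted(activities), sorted(comments))
```

Note how the people table ends up larger than the profile list: Carol was never on the list but gets a record because she commented on a public post, while Dave, who only commented on a non-public post, is never seen.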

In case that wasn’t clear, every single bit of profile data, post, and comment in my database was entirely public at the time of scraping. I intended to implement the ability to scrape profiles using an OAuth token to get posts that were only shared with circles, etc, but there just wasn’t enough time (again, THANKS GOOGLE).

Besides non-public posts, the other thing that was not scraped was any pictures, video, or other media uploaded to Google+ by the user. The API simply didn’t provide any way to get this content, and even if it had, including it would have made the database way too large. One exception, however, was profile and cover photos: these are stored on Google’s servers as their own separate files, and their URLs were included in the API responses. I programmed my tool to download these files for each of the profiles on my list (but only for them–I was originally going to download profile and cover photos for every profile I touched including commenters, but this ended up taking way too much space)…unfortunately, after the project was complete, I found that far fewer images had been downloaded than there should have been. All of the cover photos are gone now (as they were tied directly to Google+), but profile pictures seem to remain, so I’m going to make another go at grabbing as many of those as I can store.

The Great Debate G+ Archive: Background

Posted on May 2, 2019 (Updated May 2, 2019) by Kevin

Since 2014, I’ve been actively participating in a quirky little internet community centered around discussion of religious and philosophical topics, which internally refers to itself as “The Great Debate Community” (henceforth known as the “GDC”). I’ve made a number of good friends through this community–including my partner, Jakki–and have also met quite a few of them in real life, so the community has come to mean a lot to me over the years. Historically, the GDC has been primarily centered around three platforms: YouTube, Google+, and Google Hangouts. When it was announced in October of 2018 that Google+ would be shutting down the following year, I began working on a way to archive G+ profiles and comment threads, so as to preserve as much community history as possible which would otherwise be lost once the site shut down for good.

The end result of that project was Plustractor, which I began work on in November 2018 and had working to the point of being usable by early February. About halfway through the development process, Google moved up the date for the Google+ shutdown from August to April, with the APIs (which Plustractor relied upon) shutting down on March 7th, so it really ended up being a race against the clock to get it finished in time. The basic function of Plustractor was to use the Google+ API to scrape data about three kinds of G+ objects: people (user profiles), activities (posts and shares), and comments, and save them to a local SQLite database. I could use it to grab anything from the basic profile data of a single user up to every single post and comment under an entire list of profiles.

I constructed a list of 600 or so profiles of interest, mainly based on my own circles, those of a few other people, and the GDC wiki (which is no longer available as it was deleted for ToS violations…hmm, I wonder how that happened). I sorted these by priority, starting with major figures in the community, those who were most active on G+, those who ran regular hangouts, and community-relevant YouTubers with big channels who were frequent targets for discussion or drama…then going all the way down to lurkers and minor figures who were no longer active. Since the GDC is a fairly loose social network with no strict definition, I tried to cast as wide a net as possible. A few people did slip through the cracks, much to my frustration–indeed, I was scraping profiles right up until the very last day the APIs stopped working–but all in all I’m pretty happy with what I got.

The final stats for the archive are as follows: 598 scraped profiles (those that were on my list and had at least one post on G+), 52044 people objects, 463482 activity objects, and 1101232 comment objects, all contained in a database that weighed in at 4.1 GB uncompressed. The list of scraped profiles can be viewed here in this Google Sheet.
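A bit of back-of-the-envelope arithmetic helps put those totals in perspective. The snippet below uses only the numbers quoted in this archive's posts. The one assumption is that the "Posts" column in the top-50 ranking counts the same activity objects as the total here, which is what lets the top poster's share be computed.

```python
# Totals as stated in the post.
profiles = 598
people = 52044
activities = 463482
comments = 1101232

# Top poster's count from the ranking table (assumed to be activity objects).
top_poster_posts = 46901

posts_per_profile = activities / profiles
comments_per_post = comments / activities
other_people = people - profiles          # records outside the 598 scraped profiles
top_poster_share = top_poster_posts / activities

print(f"Average posts per scraped profile: {posts_per_profile:.1f}")
print(f"Average comments per post:         {comments_per_post:.2f}")
print(f"People records beyond the 598:     {other_people}")
print(f"Top poster's share of all posts:   {top_poster_share:.1%}")
```

That works out to roughly 775 public posts per scraped profile and a bit under 2.4 comments per post, with the single most active poster accounting for about a tenth of all archived posts. These are plain averages, so they are pulled upward by a handful of very heavy posters and shouldn't be read as a typical member's activity.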

Now that Google+ has come to an end, I will be proceeding to analyze this dataset to the best of my abilities, and will be using this blog to document my findings and post any interesting data visualizations that I make along the way. I really think this project could be a goldmine of information about the community and its members, in addition to being a great learning experience for myself, as I want to pursue a career in data science.

Further down the road, I want to design some kind of web interface that will allow people to view and search through the archive data, in a format similar to Google+ (but better and more compact, because let’s face it, Google+ was a bit of a trainwreck interface-wise). At this time I will not be releasing the full database to the public, but if there’s something you’re looking for that was in a thread posted by one of the people on the list, let me know. If there’s a particular query you’d like me to run or you have suggestions for avenues of research, PLEASE do get in contact with me as I’d love to talk with you. I have a number of ideas already, but I’m a novice when it comes to analyzing social networks…suggestions from someone with expertise in this area would be awesome.

Thanks for reading…much more fun stuff to come. The best ways to get in contact with me are as follows:

Email: kdbuchik [a t] gmail dot com
Skype: kevbuc13
Discord: Kevin Bee#3332