{"id":15,"date":"2019-05-02T03:40:53","date_gmt":"2019-05-02T03:40:53","guid":{"rendered":"http:\/\/gdcarchive.kdbuchik.com\/blog\/?p=15"},"modified":"2019-05-02T22:52:33","modified_gmt":"2019-05-02T22:52:33","slug":"the-great-debate-g-archive-background","status":"publish","type":"post","link":"https:\/\/gdcarchive.kdbuchik.com\/blog\/?p=15","title":{"rendered":"The Great Debate G+ Archive: Background"},"content":{"rendered":"\n<p>Since 2014, I&#8217;ve been actively participating in a quirky little internet community centered around discussion of religious and philosophical topics, which internally refers to itself as &#8220;The Great Debate Community&#8221; (henceforth known as the &#8220;GDC&#8221;). I&#8217;ve made a number of good friends through this community&#8211;including my partner, Jakki&#8211;and have also met quite a few of them in real life, so the community has come to mean a lot to me over the years. Historically, the GDC has been primarily centered around three platforms: YouTube, Google+, and Google Hangouts. When it was announced in October of 2018 that Google+ would be shutting down the following year, I began working on a way to archive G+ profiles and comment threads, so as to preserve as much community history as possible which would otherwise be lost once the site shut down for good.<\/p>\n\n\n\n<p>The end result of that project was <a href=\"https:\/\/gitlab.com\/kbuchik\/plustractor\/blob\/master\/plustractor.py\"><strong>Plustractor<\/strong><\/a>, which I began work on in November 2018 and had working to the point of being usable by early February. About halfway through the development process, Google moved up the date for the Google+ shutdown from August to April, with the APIs (which Plustractor relied upon) shutting down on March 7th, so it really ended up being a race against the clock to get it finished in time. 
The basic function of Plustractor was to use the Google+ API to scrape data about three kinds of G+ objects: people (user profiles), activities (posts and shares), and comments, and to save them to a local SQLite database. I could use it to grab anything from the basic profile data of a single user up to every single post and comment under an entire list of profiles.<\/p>\n\n\n\n<p>I constructed a list of 600 or so profiles of interest, mainly based on my own circles, those of a few other people, and the GDC wiki (which is no longer available as it was deleted for ToS violations&#8230;hmm, I wonder how <strong>that<\/strong> happened). I sorted these by priority, starting with major figures in the community, those who were most active on G+, those who ran regular hangouts, and community-relevant YouTubers with big channels who were frequent targets for discussion or drama&#8230;then going all the way down to lurkers and minor figures who were no longer active. Since the GDC is a fairly loose social network with no strict definition, I tried to cast as wide a net as possible. A few people did slip through the cracks, much to my frustration&#8211;indeed, I was scraping profiles right up until the last day the APIs were working&#8211;but all in all I&#8217;m pretty happy with what I got.<\/p>\n\n\n\n<p>The final stats for the archive are as follows: 598 scraped profiles (those that were on my list and had at least one post on G+), 52,044 people objects, 463,482 activity objects, and 1,101,232 comment objects, all contained in a database that weighed in at 4.1 GB uncompressed. 
The list of scraped profiles can be viewed <a href=\"https:\/\/docs.google.com\/spreadsheets\/d\/1Fdopq4Uc2qFAOkqQTzyXdKtutP6lMXDQ5LkqHpZmCYs\/edit?usp=sharing\"><strong>here in this Google Sheet<\/strong><\/a>.<\/p>\n\n\n\n<p>Now that Google+ has come to an end, I will be analyzing this dataset to the best of my abilities, and will be using this blog to document my findings and post any interesting data visualizations that I make along the way. I really think this project could be a goldmine of information about the community and its members, in addition to being a great learning experience for me, as I want to pursue a career in data science.<\/p>\n\n\n\n<p>Further down the road, I want to design some kind of web interface that will allow people to view and search through the archive data, in a format similar to Google+ (but better and more compact, because let&#8217;s face it, Google+ was a bit of a trainwreck interface-wise). At this time I will not be releasing the full database to the public, but if there&#8217;s something you&#8217;re looking for that was in a thread posted by one of the people on the list, let me know. If there&#8217;s a particular query you&#8217;d like me to run or you have suggestions for avenues of research, PLEASE do get in contact with me as I&#8217;d love to talk with you. I have a number of ideas already, but I&#8217;m a novice when it comes to analyzing social networks&#8230;suggestions from someone with expertise in this area would be awesome.<\/p>\n\n\n\n<p>Thanks for reading&#8230;much more fun stuff to come. 
The best ways to get in contact with me are as follows:<\/p>\n\n\n\n<p>Email: kdbuchik [a t] gmail dot com<br>Skype: kevbuc13<br>Discord:  Kevin Bee#3332<br><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Since 2014, I&#8217;ve been actively participating in a quirky little internet community centered around discussion of religious and philosophical topics, which internally refers to itself as &#8220;The Great Debate Community&#8221; (henceforth known as the &#8220;GDC&#8221;). I&#8217;ve made a number of good friends through this community&#8211;including my partner, Jakki&#8211;and have also met quite a few of [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/gdcarchive.kdbuchik.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/15"}],"collection":[{"href":"https:\/\/gdcarchive.kdbuchik.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/gdcarchive.kdbuchik.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/gdcarchive.kdbuchik.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/gdcarchive.kdbuchik.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=15"}],"version-history":[{"count":11,"href":"https:\/\/gdcarchive.kdbuchik.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/15\/revisions"}],"predecessor-version":[{"id":67,"href":"https:\/\/gdcarchive.kdbuchik.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/15\/revisions\/67"}],"wp:attachment":[{"href":"https:\/\/gdcarchive.kdbuchik.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=15"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/gdcarchive.kdbuchik.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=15"},{"taxonomy":"post_tag","embeddable":true,"href":"
https:\/\/gdcarchive.kdbuchik.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=15"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}