Tags » Open Source

Loklak fuels Open Event!

A bit of general background first…

The FOSSASIA Open Event Project aims to make it easy for events, conferences and tech summits to create Web and Mobile (currently Android-only) micro apps. 1,641 more words

jigyasa reblogged this on Being Curious .... and commented:


As Google Summer of Code 2016 has officially begun, I am all excited to be working with FOSSASIA yet again. This time, I have been assigned the project Loklak, where I shall be spending the summer implementing indexing (harvesters/scrapers) for different services such as weibo.com, angel.co, meetup.com and Instagram.

Loklak is a server application which is able to collect messages from various sources, including Twitter. This server contains a search index and a peer-to-peer index sharing interface.

If you like to stay anonymous when searching, want to archive tweets or messages about specific topics, or are looking for a tool to create statistics about tweet topics, then Loklak is a great option to consider.




Loklak fuels Open Event!

A bit of general background first: the FOSSASIA Open Event Project aims to make it easy for events, conferences and tech summits to create Web and Mobile (currently Android-only) micro apps. The project comprises a data schema for easily storing event details; a server and web front-end that event organizers use to view, modify and update this data; a mobile-friendly web-app client to show the event data to attendees; and an Android app template used to generate a specific app for each event. Eventbrite, on the other hand, is the world's largest self-service ticketing platform. It allows anyone to create, share and find events: music festivals, marathons, conferences, hackathons, air guitar contests, political rallies, fundraisers, gaming competitions and more.

Kaboom!

Loklak now has a dedicated Eventbrite scraper API which takes in the URL of an event listing on eventbrite.com and outputs the JSON files required by the Open Event Generator, viz. events.json, organizer.json, user.json, microlocations.json, sessions.json, session_types.json, tracks.json, sponsors.json, speakers.json, social_links.json and custom_forms.json (details: Open Event Server: API Documentation).

What do we do differently compared to the Eventbrite API? No authentication tokens are required, which gels in perfectly with the Loklak mission. To achieve this, I have simply parsed the HTML pages using my favourite JSoup: The Java HTML parser library, as it provides a very convenient API for extracting and manipulating data, and for scraping and parsing all varieties of HTML from a URL.

The API call format is: http://loklak.org/api/eventbritecrawler.json?url=https://www.eventbrite.com/[event-name-and-id]

In return we get all the details of the Eventbrite page as a JSONObject, which is also written out to separate files in a zipped folder [userHome + "/Downloads/EventBriteInfo"].

Example:
Event URL: https://www.eventbrite.de/e/global-health-security-focus-africa-tickets-25740798421
API Call: http://loklak.org/api/eventbritecrawler.json?url=https://www.eventbrite.de/e/global-health-security-focus-africa-tickets-25740798421
Output: a JSON object on screen, and events.json, organizer.json, user.json, microlocations.json, sessions.json, session_types.json, tracks.json, sponsors.json, speakers.json, social_links.json and custom_forms.json written out into a zipped folder locally.

For reference, here is the code:
/**
 *  Eventbrite.com Crawler v2.0
 *  Copyright 19.06.2016 by Jigyasa Grover, @jig08
 *
 *  This library is free software; you can redistribute it and/or
 *  modify it under the terms of the GNU Lesser General Public
 *  License as published by the Free Software Foundation; either
 *  version 2.1 of the License, or (at your option) any later version.
 *  
 *  This library is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 *  Lesser General Public License for more details.
 *  
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program in the file lgpl21.txt
 *  If not, see http://www.gnu.org/licenses/.
 */

package org.loklak.api.search;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.json.JSONArray;
import org.json.JSONObject;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.loklak.http.RemoteAccess;
import org.loklak.server.Query;

public class EventbriteCrawler extends HttpServlet {

    private static final long serialVersionUID = 5216519528576842483L;

    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        doGet(request, response);
    }

    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        Query post = RemoteAccess.evaluate(request);

        // manage DoS
        if (post.isDoS_blackout()) {
            response.sendError(503, "your request frequency is too high");
            return;
        }

        String url = post.get("url", "");

        Document htmlPage = null;

        try {
            htmlPage = Jsoup.connect(url).get();
        } catch (Exception e) {
            // fetching or parsing failed: report and bail out instead of
            // hitting a NullPointerException further down
            response.sendError(400, "could not fetch the event page: " + url);
            return;
        }

        String eventID = null;
        String eventName = null;
        String eventDescription = null;

        // TODO Fetch Event Color
        String eventColor = null;

        String imageLink = null;

        String eventLocation = null;

        String startingTime = null;
        String endingTime = null;

        String ticketURL = null;

        Elements tagSection = null;
        Elements tagSpan = null;
        String[][] tags = new String[5][2];
        String topic = null; // By default

        String closingDateTime = null;
        String schedulePublishedOn = null;
        JSONObject creator = new JSONObject();
        String email = null;

        Float latitude = null;
        Float longitude = null;

        String privacy = "public"; // By Default
        String state = "completed"; // By Default
        String eventType = "";

        eventID = htmlPage.getElementsByTag("body").attr("data-event-id");
        eventName = htmlPage.getElementsByClass("listing-hero-body").text();
        eventDescription = htmlPage.select("div.js-xd-read-more-toggle-view.read-more__toggle-view").text();

        eventColor = null;

        imageLink = htmlPage.getElementsByTag("picture").attr("content");

        eventLocation = htmlPage.select("p.listing-map-card-street-address.text-default").text();
        startingTime = htmlPage.getElementsByAttributeValue("property", "event:start_time").attr("content").substring(0,
                19);
        endingTime = htmlPage.getElementsByAttributeValue("property", "event:end_time").attr("content").substring(0,
                19);

        ticketURL = url + "#tickets";

        // TODO Tags to be modified to fit in the format of Open Event "topic"
        tagSection = htmlPage.getElementsByAttributeValue("data-automation", "ListingsBreadcrumbs");
        tagSpan = tagSection.select("span");
        topic = "";

        int iterator = 0, k = 0;
        for (Element e : tagSpan) {
            if (iterator % 2 == 0) {
                tags[k][1] = "www.eventbrite.com"
                        + e.select("a.js-d-track-link.badge.badge--tag.l-mar-top-2").attr("href");
            } else {
                tags[k][0] = e.text();
                k++;
            }
            iterator++;
        }

        creator.put("email", "");
        creator.put("id", "1"); // By Default

        latitude = Float
                .valueOf(htmlPage.getElementsByAttributeValue("property", "event:location:latitude").attr("content"));
        longitude = Float
                .valueOf(htmlPage.getElementsByAttributeValue("property", "event:location:longitude").attr("content"));

        // TODO This returns: "events.event" which is not supported by Open
        // Event Generator
        // eventType = htmlPage.getElementsByAttributeValue("property",
        // "og:type").attr("content");

        String organizerName = null;
        String organizerLink = null;
        String organizerProfileLink = null;
        String organizerWebsite = null;
        String organizerContactInfo = null;
        String organizerDescription = null;
        String organizerFacebookFeedLink = null;
        String organizerTwitterFeedLink = null;
        String organizerFacebookAccountLink = null;
        String organizerTwitterAccountLink = null;

        organizerName = htmlPage.select("a.js-d-scroll-to.listing-organizer-name.text-default").text().substring(4);
        organizerLink = url + "#listing-organizer";
        organizerProfileLink = htmlPage
                .getElementsByAttributeValue("class", "js-follow js-follow-target follow-me fx--fade-in is-hidden")
                .attr("href");
        organizerContactInfo = url + "#lightbox_contact";

        Document orgProfilePage = null;

        try {
            orgProfilePage = Jsoup.connect(organizerProfileLink).get();
        } catch (Exception e) {
            e.printStackTrace();
        }

        // only read organizer details if the profile page could be fetched
        if (orgProfilePage != null) {
            organizerWebsite = orgProfilePage.getElementsByAttributeValue("class", "l-pad-vert-1 organizer-website")
                    .text();
            organizerDescription = orgProfilePage.select("div.js-long-text.organizer-description").text();
            organizerFacebookFeedLink = organizerProfileLink + "#facebook_feed";
            organizerTwitterFeedLink = organizerProfileLink + "#twitter_feed";
            organizerFacebookAccountLink = orgProfilePage.getElementsByAttributeValue("class", "fb-page")
                    .attr("data-href");
            organizerTwitterAccountLink = orgProfilePage.getElementsByAttributeValue("class", "twitter-timeline")
                    .attr("href");
        }

        JSONArray socialLinks = new JSONArray();

        JSONObject fb = new JSONObject();
        fb.put("id", "1");
        fb.put("name", "Facebook");
        fb.put("link", organizerFacebookAccountLink);
        socialLinks.put(fb);

        JSONObject tw = new JSONObject();
        tw.put("id", "2");
        tw.put("name", "Twitter");
        tw.put("link", organizerTwitterAccountLink);
        socialLinks.put(tw);

        JSONArray jsonArray = new JSONArray();

        JSONObject event = new JSONObject();
        event.put("event_url", url);
        event.put("id", eventID);
        event.put("name", eventName);
        event.put("description", eventDescription);
        event.put("color", eventColor);
        event.put("background_url", imageLink);
        event.put("closing_datetime", closingDateTime);
        event.put("creator", creator);
        event.put("email", email);
        event.put("location_name", eventLocation);
        event.put("latitude", latitude);
        event.put("longitude", longitude);
        event.put("start_time", startingTime);
        event.put("end_time", endingTime);
        event.put("logo", imageLink);
        event.put("organizer_description", organizerDescription);
        event.put("organizer_name", organizerName);
        event.put("privacy", privacy);
        event.put("schedule_published_on", schedulePublishedOn);
        event.put("state", state);
        event.put("type", eventType);
        event.put("ticket_url", ticketURL);
        event.put("social_links", socialLinks);
        event.put("topic", topic);
        jsonArray.put(event);

        JSONObject org = new JSONObject();
        org.put("organizer_name", organizerName);
        org.put("organizer_link", organizerLink);
        org.put("organizer_profile_link", organizerProfileLink);
        org.put("organizer_website", organizerWebsite);
        org.put("organizer_contact_info", organizerContactInfo);
        org.put("organizer_description", organizerDescription);
        org.put("organizer_facebook_feed_link", organizerFacebookFeedLink);
        org.put("organizer_twitter_feed_link", organizerTwitterFeedLink);
        org.put("organizer_facebook_account_link", organizerFacebookAccountLink);
        org.put("organizer_twitter_account_link", organizerTwitterAccountLink);
        jsonArray.put(org);

        JSONArray microlocations = new JSONArray();
        jsonArray.put(microlocations);

        JSONArray customForms = new JSONArray();
        jsonArray.put(customForms);

        JSONArray sessionTypes = new JSONArray();
        jsonArray.put(sessionTypes);

        JSONArray sessions = new JSONArray();
        jsonArray.put(sessions);

        JSONArray sponsors = new JSONArray();
        jsonArray.put(sponsors);

        JSONArray speakers = new JSONArray();
        jsonArray.put(speakers);

        JSONArray tracks = new JSONArray();
        jsonArray.put(tracks);

        JSONObject eventBriteResult = new JSONObject();
        eventBriteResult.put("Event Brite Event Details", jsonArray);

        // print JSON
        response.setCharacterEncoding("UTF-8");
        PrintWriter sos = response.getWriter();
        sos.print(eventBriteResult.toString(2));
        sos.println();

        String userHome = System.getProperty("user.home");
        String path = userHome + "/Downloads/EventBriteInfo";

        new File(path).mkdir();

        try (FileWriter file = new FileWriter(path + "/event.json")) {
            file.write(event.toString());
        } catch (IOException e1) {
            e1.printStackTrace();
        }

        try (FileWriter file = new FileWriter(path + "/org.json")) {
            file.write(org.toString());
        } catch (IOException e1) {
            e1.printStackTrace();
        }

        try (FileWriter file = new FileWriter(path + "/social_links.json")) {
            file.write(socialLinks.toString());
        } catch (IOException e1) {
            e1.printStackTrace();
        }

        try (FileWriter file = new FileWriter(path + "/microlocations.json")) {
            file.write(microlocations.toString());
        } catch (IOException e1) {
            e1.printStackTrace();
        }

        try (FileWriter file = new FileWriter(path + "/custom_forms.json")) {
            file.write(customForms.toString());
        } catch (IOException e1) {
            e1.printStackTrace();
        }

        try (FileWriter file = new FileWriter(path + "/session_types.json")) {
            file.write(sessionTypes.toString());
        } catch (IOException e1) {
            e1.printStackTrace();
        }

        try (FileWriter file = new FileWriter(path + "/sessions.json")) {
            file.write(sessions.toString());
        } catch (IOException e1) {
            e1.printStackTrace();
        }

        try (FileWriter file = new FileWriter(path + "/sponsors.json")) {
            file.write(sponsors.toString());
        } catch (IOException e1) {
            e1.printStackTrace();
        }

        try (FileWriter file = new FileWriter(path + "/speakers.json")) {
            file.write(speakers.toString());
        } catch (IOException e1) {
            e1.printStackTrace();
        }

        try (FileWriter file = new FileWriter(path + "/tracks.json")) {
            file.write(tracks.toString());
        } catch (IOException e1) {
            e1.printStackTrace();
        }

        try {
            // the destination must be a file path, not a directory
            zipFolder(path, userHome + "/Downloads/EventBriteInfo.zip");
        } catch (Exception e1) {
            e1.printStackTrace();
        }

    }

    static public void zipFolder(String srcFolder, String destZipFile) throws Exception {
        ZipOutputStream zip = null;
        FileOutputStream fileWriter = null;
        fileWriter = new FileOutputStream(destZipFile);
        zip = new ZipOutputStream(fileWriter);
        addFolderToZip("", srcFolder, zip);
        zip.flush();
        zip.close();
    }

    static private void addFileToZip(String path, String srcFile, ZipOutputStream zip) throws Exception {
        File folder = new File(srcFile);
        if (folder.isDirectory()) {
            addFolderToZip(path, srcFile, zip);
        } else {
            byte[] buf = new byte[1024];
            int len;
            FileInputStream in = new FileInputStream(srcFile);
            zip.putNextEntry(new ZipEntry(path + "/" + folder.getName()));
            while ((len = in.read(buf)) > 0) {
                zip.write(buf, 0, len);
            }
            in.close();
        }
    }

    static private void addFolderToZip(String path, String srcFolder, ZipOutputStream zip) throws Exception {
        File folder = new File(srcFolder);

        for (String fileName : folder.list()) {
            if (path.equals("")) {
                addFileToZip(folder.getName(), srcFolder + "/" + fileName, zip);
            } else {
                addFileToZip(path + "/" + folder.getName(), srcFolder + "/" + fileName, zip);
            }
        }
    }

}
Check out https://github.com/loklak/loklak_server for more...
  Feel free to ask questions regarding the above code snippet. Also, stay tuned for the next part of this post, which shall cover using the scraped information for Open Event. Feedback and suggestions welcome :)

With Commercial Licensing, Invest in Innovation, not Protection

When people start creating commercially licensed software (like we did in 2013 with jOOQ), there is always the big looming question:

What do I do about piracy?

1,222 more words
Jooq

Fedora 24 - a Unix Perspective

As a follow-up on my previous entry, I decided to spend some time with Fedora Linux. Release 24 just got out, making me curious what the most recent offering has to offer (pun intended). 548 more words

Linux

Into the OSS Development Fray!

I haven’t written a single entry recently as I was very busy polishing my Python and tcsh scripting skills. Many apologies for that! Meanwhile, I am trying to assemble a simple NAS (Network Attached Storage) from the many bits and pieces I have at home. 459 more words

Linux

My Journey in Open Source

One can raise the argument: “are competitions like GSoC and Outreachy really helpful to open source, or do they just create short-term contributors who contribute only for the monetary gain?” Being a constant open source contributor in various projects and organisations, I have seen the same. 1,256 more words

Open Source

Now get WordPress blog updates with Loklak!

Loklak shall soon be spoiling its users!

Next, it will be bringing in tiny tweet-like cards showing the blog-posts (title, publishing date, author and content) from the given WordPress Blog URL. 725 more words




NOW GET WORDPRESS BLOG UPDATES WITH LOKLAK!

Loklak shall soon be spoiling its users! Next, it will be bringing in tiny tweet-like cards showing the blog posts (title, publishing date, author and content) from a given WordPress blog URL. This feature is certain to expand the realm of Loklak's mission of building a comprehensive and extensive social network dispensing useful information.

To implement this feature, I have again made use of JSoup: The Java HTML parser library, as it provides a very convenient API for extracting and manipulating data and for scraping and parsing HTML from a URL. The information is scraped with JSoup after the corresponding URL, in the format "https://[username].wordpress.com/", is passed as an argument to the function scrapeWordpress(String blogURL){..}, which returns a JSONObject as the result. A look at the code snippet:
/**
 *  Wordpress Blog Scraper
 *  By Jigyasa Grover, @jig08
 **/

package org.loklak.harvester;

import java.io.IOException;

import org.json.JSONArray;
import org.json.JSONObject;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class WordpressBlogScraper {
    public static void main(String args[]){
        
        String blogURL = "https://loklaknet.wordpress.com/";
        scrapeWordpress(blogURL);       
    }
    
    public static JSONObject scrapeWordpress(String blogURL) {

        Document blogHTML = null;

        try {
            blogHTML = Jsoup.connect(blogURL).get();
        } catch (IOException e) {
            e.printStackTrace();
            return new JSONObject(); // nothing fetched, nothing to scrape
        }

        // Each <article> element on a WordPress page is one blog post
        Elements articles = blogHTML.getElementsByTag("article");

        JSONArray blog = new JSONArray();

        for (Element article : articles) {
            JSONObject blogpost = new JSONObject();
            blogpost.put("blog_url", blogURL);
            // text() concatenates the text of all elements matching the class
            blogpost.put("title", article.getElementsByClass("entry-title").text());
            blogpost.put("posted_on", article.getElementsByClass("posted-on").text());
            blogpost.put("author", article.getElementsByClass("byline").text());
            blogpost.put("content", article.getElementsByClass("entry-content").text());
            blog.put(blogpost);
        }

        JSONObject final_blog_info = new JSONObject();

        final_blog_info.put("Wordpress blog: " + blogURL, blog);

        System.out.println(final_blog_info);

        return final_blog_info;
    }
}
  Here, an HTTP connection is simply established and text is extracted with element.text() from inside specific tags, picked out by identifiers such as classes or IDs. The tags to extract from were identified by exploring the web page’s HTML source code. The result thus obtained is a JSON object:
{
  "Wordpress blog: https://loklaknet.wordpress.com/": [
    {
      "posted_on": "June 19, 2016",
      "blog_url": "https://loklaknet.wordpress.com/",
      "author": "shivenmian",
      "title": "loklak_depot \u2013 The Beginning: Accounts (Part 3)",
      "content": "So this is my third post in this five part series on loklak_depo... As always, feedback is duly welcome."
    },
    {
      "posted_on": "June 19, 2016",
      "blog_url": "https://loklaknet.wordpress.com/",
      "author": "sopankhosla",
      "title": "Creating a Loklak App!",
      "content": "Hello everyone! Today I will be shifting from course a...ore info refer to the full documentation here. Happy Coding!!!"
    },
    {
      "posted_on": "June 17, 2016",
      "blog_url": "https://loklaknet.wordpress.com/",
      "author": "leonmakk",
      "title": "Loklak Walls Manual Moderation \u2013 tweet storage",
      "content": "Loklak walls are going to....Stay tuned for more updates on this new feature of loklak walls!"
    },
    {
      "posted_on": "June 17, 2016",
      "blog_url": "https://loklaknet.wordpress.com/",
      "author": "Robert",
      "title": "Under the hood: Authentication (login)",
      "content": "In the second post of .....key login is ready."
    },
    {
      "posted_on": "June 17, 2016",
      "blog_url": "https://loklaknet.wordpress.com/",
      "author": "jigyasa",
      "title": "Loklak gives some hackernews now !",
      "content": "It's been befittingly said  \u... Also, Stay tuned for more posts on data crawling and parsing for Loklak. Feedback and Suggestions welcome"
    },
    {
      "posted_on": "June 16, 2016",
      "blog_url": "https://loklaknet.wordpress.com/",
      "author": "Damini",
      "title": "Does tweets have emotions?",
      "content": "Tweets do intend some kind o...t of features: classify(feat1,\u2026,featN) = argmax(P(cat)*PROD(P(featI|cat)"
    },
    {
      "posted_on": "June 15, 2016",
      "blog_url": "https://loklaknet.wordpress.com/",
      "author": "sudheesh001",
      "title": "Dockerize the loklak server and publish docker images to IBM Containers on Bluemix Cloud",
      "content": "Docker is an open source...nd to create and deploy instantly as well as scale on demand."
    }
  ]
}
  The next step would now include "writeToBackend"-ing and then parsing the JSONObject as desired. Feel free to ask questions regarding the above code snippet; I shall be happy to assist. Feedback and suggestions welcome :)

Loklak gives some hackernews now!

It’s been befittingly said “Well, news is anything that’s interesting, that relates to what’s happening in the world, what’s happening in areas of the culture that would be of interest to your audience.” … 787 more words




LOKLAK GIVES SOME HACKERNEWS NOW!

It's been befittingly said: "Well, news is anything that's interesting, that relates to what's happening in the world, what's happening in areas of the culture that would be of interest to your audience." So goes Kurt Loder, the famous American journalist. And what better than Hacker News (news.ycombinator.com) for the tech community? It helps the community by surfacing the latest and most important buzz, sorted by popularity, with links.

Loklak next tried to include this important piece of information in its server by collecting data from this source. Instead of the usual scraping of HTML pages we had been doing for other sources, this time we read the RSS stream instead. Simply put, RSS (Really Simple Syndication) uses a family of standard web feed formats to publish frequently updated information: blog entries, news headlines, audio, video. A standard XML file format ensures compatibility with many different machines and programs. RSS feeds also benefit users who want to receive timely updates from favourite websites, or to aggregate data from many sites, without signing in everywhere. The Hacker News RSS feed can be fetched via the URL https://news.ycombinator.com/rss.

To keep things simple, I decided to use the ROME framework to build an RSS reader for Hackernews for Loklak.
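Before reaching for a framework, it helps to see that an RSS feed is just XML with a well-known shape: a channel containing items, each with a title and a link. A minimal sketch using only the JDK's built-in XML parser (the feed snippet below is a made-up stand-in, not actual Hacker News data):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class RssSketch {
    public static void main(String[] args) throws Exception {
        // Made-up RSS 2.0 snippet standing in for https://news.ycombinator.com/rss
        String rss = "<rss version=\"2.0\"><channel>"
                + "<title>Hacker News</title>"
                + "<item><title>First story</title><link>http://example.com/1</link></item>"
                + "<item><title>Second story</title><link>http://example.com/2</link></item>"
                + "</channel></rss>";

        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(rss.getBytes(StandardCharsets.UTF_8)));

        // Every <item> in the channel is one news headline
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            String title = item.getElementsByTagName("title").item(0).getTextContent();
            String link = item.getElementsByTagName("link").item(0).getTextContent();
            System.out.println(title + " -> " + link);
        }
    }
}
```

ROME does the same walk for every feed flavour (RSS 0.9x/1.0/2.0, Atom) and normalizes the result into SyndFeed objects, which is why it saves us from writing per-format parsing by hand.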
Just for a quick introduction: ROME is a Java framework for RSS and Atom feeds. It's open source and licensed under the Apache 2.0 license. ROME includes a set of parsers and generators for the various flavours of syndication feeds, as well as converters from one format to another. The parsers can give you back Java objects that are either specific to the format you want to work with, or a generic normalized SyndFeed class that lets you work with the data without bothering about the incoming or outgoing feed type.
So, I made a function hackernewsRSSReader which returns a JSONObject holding a JSONArray "Hackernews RSS Feed" of JSONObjects, each of which represents one news headline from the source. The structure of the resulting JSONObject is something like:
{
   "Hackernews RSS Feed":[
      {
         "Description":"SyndContentImpl.value=....",
         "Updated-Date":"null",
         "Link":"http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.116.241103",
         "RSS Feed":"https://news.ycombinator.com/rss",
         "Published-Date":"Wed Jun 15 13:30:33 EDT 2016",
         "Hash-Code":"1365366114",
         "Title":"Second Gravitational Wave Detected at LIGO",
         "URI":"http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.116.241103"
      },
     ......
      {
         "Description":"SyndContentImpl.value=....",
         "Updated-Date":"null",
         "Link":"http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-410-principles-of-autonomy-and-decision-making-fall-2010/lecture-notes/MIT16_410F10_lec20.pdf",
         "RSS Feed":"https://news.ycombinator.com/rss",
         "Published-Date":"Wed Jun 15 08:37:36 EDT 2016",
         "Hash-Code":"1649214835",
         "Title":"Intro to Hidden Markov Models (2010) [pdf]",
         "URI":"http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-410-principles-of-autonomy-and-decision-making-fall-2010/lecture-notes/MIT16_410F10_lec20.pdf"
      }
   ]
}
It includes information like the Title, Link, Hash Code, Published Date, Updated Date, URI and Description of each news headline. The next step after extracting the information is to write it to the back-end, retrieve it whenever required, and, after parsing, display it in the desired format for the Loklak web client. The JDOM and ROME jars need to be configured into the build path before implementing the RSS reader. A look through the code of HackernewsRSSReader.java:
/**
 *  Hacker News RSS Reader
 *  By Jigyasa Grover, @jig08
 **/

package org.loklak.harvester;

import java.net.MalformedURLException;
import java.net.URL;
import java.util.List;
import org.json.JSONArray;
import org.json.JSONObject;
import com.sun.syndication.feed.synd.SyndEntry;
import com.sun.syndication.feed.synd.SyndFeed;
import com.sun.syndication.io.SyndFeedInput;
import com.sun.syndication.io.XmlReader;

public class HackernewsRSSReader {  
    
    /*
     * For HackernewsRSS, simply pass URL: https://news.ycombinator.com/rss 
     * in the function to obtain a corresponding JSON
     */
    @SuppressWarnings({ "unchecked", "static-access" })
    public static JSONObject hackernewsRSSReader(String url){
         
            URL feedUrl = null;
            try {
                feedUrl = new URL(url);
            } catch (MalformedURLException e) {
                e.printStackTrace();
            }
            
            SyndFeedInput input = new SyndFeedInput();
            
            SyndFeed feed = null;
            try {
                feed = input.build(new XmlReader(feedUrl));
            } catch (Exception e) {
                e.printStackTrace();
                return new JSONObject(); // feed could not be read
            }
            
            String[][] result = new String[100][7];
            //result[][0] = Title
            //result[][1] = Link
            //result[][2] = URI
            //result[][3] = Hash Code
            //result[][4] = PublishedDate
            //result[][5] = Updated Date
            //result[][6] = Description
            
            @SuppressWarnings("unused")
            int totalEntries = 0;
            int i = 0;
            
            JSONArray jsonArray = new JSONArray();
            
            for (SyndEntry entry : (List<SyndEntry>) feed.getEntries()) {
                
                result[i][0] = entry.getTitle().toString();
                result[i][1] = entry.getLink().toString();
                result[i][2] = entry.getUri().toString();
                result[i][3] = Integer.toString(entry.hashCode()); 
                result[i][4] = entry.getPublishedDate().toString();
                result[i][5] = ( (entry.getUpdatedDate() == null) ? ("null") : (entry.getUpdatedDate().toString()) );
                // note: toString() on the SyndContent yields the raw
                // "SyndContentImpl.value=..." form seen in the sample output;
                // getDescription().getValue() would give just the text
                result[i][6] = entry.getDescription().toString();
                
                JSONObject jsonObject = new JSONObject();

                jsonObject.put("RSS Feed", url);
                jsonObject.put("Title", result[i][0]);
                jsonObject.put("Link", result[i][1]);
                jsonObject.put("URI", result[i][2]);
                jsonObject.put("Hash-Code", result[i][3]);
                jsonObject.put("Published-Date", result[i][4]);
                jsonObject.put("Updated-Date", result[i][5]);
                jsonObject.put("Description", result[i][6]);
                
                jsonArray.put(i, jsonObject);
                
                i++;
            }
            
            totalEntries = i;
            
        JSONObject rssFeed = new JSONObject();
        rssFeed.put("Hackernews RSS Feed", jsonArray);
        System.out.println(rssFeed);
        return rssFeed;
        
    }

}
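For reference, the ROME and JDOM jars mentioned above can also be pulled in through a build tool instead of manual build-path configuration. A sketch of the dependency declarations, assuming a Maven build; the coordinates below (rome 1.0, jdom 1.1, matching the legacy com.sun.syndication packages) are assumptions to be checked against your setup:

```xml
<!-- assumed coordinates for the legacy com.sun.syndication packages -->
<dependency>
    <groupId>rome</groupId>
    <artifactId>rome</artifactId>
    <version>1.0</version>
</dependency>
<dependency>
    <groupId>jdom</groupId>
    <artifactId>jdom</artifactId>
    <version>1.1</version>
</dependency>
```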
Feel free to ask questions regarding the above code snippet. Also, stay tuned for more posts on data crawling and parsing for Loklak. Feedback and suggestions welcome :)