Web crawler as a service : 80legs – Good or bad?


A web crawler, or web spider, is an automated program (a bot) that browses the World Wide Web in a methodical manner, picking up and storing information as it goes. Search engines are the main users of web crawlers, which they rely on for indexing; the process is called web crawling or spidering.

Web crawlers are mostly used for search engine indexing (crawling and caching web pages for later analysis) and for automated website maintenance jobs such as checking links or validating HTML. Sometimes they are also used to harvest email addresses from web pages for sending spam.
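To make the idea concrete, here is a minimal sketch of the "browse methodically, pick up and store" loop in Python. It is only an illustration, not how 80legs or any search engine actually works. The names `crawl`, `LinkExtractor`, `fetch` and `FAKE_WEB` are made up for this example, and a small in-memory dictionary stands in for real HTTP requests so the code runs without touching any website.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; returns {url: html} for every page fetched.

    `fetch` is a hypothetical callable that takes a URL and returns the
    page's HTML as text, or None if the page cannot be retrieved.
    """
    seen = {start_url}          # URLs already queued, to avoid loops
    queue = deque([start_url])  # pages waiting to be visited
    store = {}                  # the "picked up and stored" information
    while queue and len(store) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # broken link: nothing to store or follow
        store[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return store


# A tiny in-memory "web" (assumed data) stands in for real HTTP requests.
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/">home</a>',
    "http://example.test/b": '<a href="/missing">broken</a>',
}

pages = crawl("http://example.test/", FAKE_WEB.get)
print(sorted(pages))
```

The `seen` set is what keeps the crawler from visiting the same page twice, and the `max_pages` limit keeps it from running forever. A real crawler would also need politeness delays and would have to respect each site's crawling rules.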

So web crawlers have their specific advantages as well as disadvantages. As of now, companies and individuals build custom web crawlers for their own purposes. Spammers also use them for automated comment posting and for harvesting email addresses. But 80legs offers web crawling as a service, which is a new approach.

The concept is new, although the underlying technology is well known. 80legs is a web crawling and online content analysis service. It is arguably the most powerful web crawler available so far, and now anyone can use it for detailed data analysis and research. 80legs also has an application store where you can build your own application as an add-on to the core crawling service for a specific use. As a web-crawling-as-a-service company, 80legs will also put up for sale tens of millions of data points extracted from public social networks and other websites.

This is where the debate starts, around the much-discussed topic of user privacy. CEO Shion Deysarkar describes his interest this way: “just give me enough data, I’ll figure out something. And often enough I did figure out something.” He has applied the same idea to 80legs: figuring out something relevant and decisive from a huge pile of data. It is definitely a good effort, but the question is how to keep it away from black marketers who want to build applications on 80legs, or use the 80legs web crawler service, to steal personal profile data from social sites and misuse it for sending spam mail or bulk SMS to users.

I am not sure whether 80legs has a privacy policy, a legal agreement or any other clearly specified technique for protecting user privacy. I have searched the web and found mixed reactions to this web crawler as a service.

According to ReadWriteWeb, “Starting at $350 per month, 80Legs customers can now purchase 10 to 20 million monthly user profiles from LinkedIn, MySpace and some other social networks. Facebook and Twitter are not included, but there are a variety of other data sets from places like retail websites available as well.

I’ve bet Deysarkar a beer that LinkedIn isn’t going to put up with this, but he says 80Legs has been crawling them extensively for quite a while and would have stopped them if they wanted to. We’ll see.”

80legs defends itself: “The people using that data are not doing anything that’s shady or wrong. They are trying to make new value on top of that data. In ways that Facebook or whoever is not doing. Facebook is in the business of bringing people to their site, they aren’t leveraging that data for other things, and there is many things they’ll never use data for. No harm is being done to Facebook. What would help them would be to become a data standard. As long as people are adding value then it’s good.”

But people are still not taking it well, and reactions are mixed. Data aggregation by means of a sophisticated web crawler service is widely seen as a privacy violation. But CEO Shion Deysarkar is optimistic.

“Going forward, the end user will hopefully understand that people are creating services that will benefit them. If I take a couple of actions and I see it benefits me that’s hopeful. The challenge is that people have to understand that it came from aggregation. The more people that are making a case and building things around it, the better.”

He also adds: “At the end of the day, it’s going to happen. Sites are going to fight it, but that data is going to become available. Wherever there is value to be had, people are going to go for that value.”

Here is another question, this time on 80legs security: how do you deal with the problem of running code on other people’s machines? How do you know that the data being sent back is valid, or that a competitor can’t start a node and reverse-engineer your code?

Answer by 80legs – We’ve actually done a lot of work on this to ensure that there isn’t a problem with running the code on various people’s machines. We do have a reasonably sophisticated validation process as well. For someone to change results from one node, they would have to do quite a bit of work.
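80legs does not say how its validation process works. Purely as an illustration, one common way distributed systems check results from untrusted machines is to give the same job to several nodes and accept only the answer that enough of them agree on. The sketch below assumes that approach; the function `validate_result`, the node ids and the result strings are all hypothetical and have nothing to do with 80legs' real code.

```python
from collections import Counter


def validate_result(reports, min_agreement=2):
    """Return the value most nodes reported, or None if agreement is too weak.

    `reports` maps a hypothetical node id to the result that node sent back
    for the same crawl job.
    """
    if not reports:
        return None
    # most_common(1) gives the single most frequent value and its vote count
    value, votes = Counter(reports.values()).most_common(1)[0]
    return value if votes >= min_agreement else None


# Three nodes crawled the same page; one of them sent back a different result.
reports = {"node-1": "hash-abc", "node-2": "hash-abc", "node-3": "hash-xyz"}
accepted = validate_result(reports)
print(accepted)
```

With a scheme like this, a single node that changes its results is simply outvoted, which fits the claim that tampering "from one node" would take quite a bit of work.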

Big data, pattern recognition, trend analysis, social science and privacy violation are the buzzwords surrounding 80legs now. 80legs’ big data can help researchers run sentiment analysis or behavioral analysis on specific groups of users, but there is a very thin line between what is public and what is private, and between who is crawling the data and for what purpose. Users can’t get those details. So on the one hand, 80legs is an excellent service with huge potential; on the other hand, unless users are aware of what is private data and what is public data, the debate will go on. We have to wait. The question remains: 80legs, web crawler as a service – good or bad?

What do you say?

80Legs at Defrag 2009: “Making the entire web semantic”


About Debajyoti

Debajyoti Banerjee is the Founder and CEO of Seven Boats Info-System Pvt. Ltd. - A full service digital marketing company in India. Debajyoti is an experienced SEO expert and Internet Marketing consultant from Kolkata, India. As a web entrepreneur and online marketing strategist, he is helping many companies to transform their businesses to the next level of success by 1) making their websites convenient, interactive and immediate; 2) helping them to reduce cost and increase efficiency; 3) making their websites more flexible and 4) helping them to get more global customers.

Comments

  1. Hello,
    Trying to get my sites up on all the search engines. Any good feed would be appreciated.
    Thanks

    • Debajyoti says:

      Thanks Donald for your comment. Don’t try too hard to submit your site to thousands of search engines; rather, focus on writing good quality content and attracting loyal visitors. All the search engines will then find your site by themselves.
