Science → Big Data Partnership → Unlock Value from Complex Data

Category Archive for: Science

Can Big Data really identify the best treatment for people with Type II Diabetes?

We’ve just won match funding from the UK government’s InnovateUK 2014 programme, with partners Outcome Based Healthcare, to answer that question. This is a fascinating, challenging but very valuable machine-learning project, because treating Type II diabetes and the complications it causes currently eats up over 6% of the NHS budget. And a lot of that cost – and the human cost –…

Big Data and Real Time Analytics

Advances in big data technology are opening up new ways to collect and transport large amounts of data more efficiently. This revolution has boosted research and development of real-time algorithms and methods. Traditionally, machine learning algorithms were not designed for real-time processing. In fact, data science competitions (e.g. the Netflix Prize, Kaggle) were criticised…

“Introducing YARN” – Hadoop No More a Baby Elephant

As Hadoop grows in popularity and becomes the near-universal choice for big data platforms, the Hadoop development team has turned its attention to the platform's current architectural deficiencies and to freeing Hadoop from them. Out of that effort, a new version of Hadoop MapReduce has been born: MapReduce 2.0…

Map Side and Reduce Side Joins

Joins are one of the interesting features available in MapReduce. Joins performed by the mapper are called map-side joins; joins performed by the reducer are called reduce-side joins. Frameworks like Pig, Hive, or Cascading have built-in support for joins. Before diving into the implementation, let us understand the problem thoroughly. If we…
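To make the reduce-side idea concrete, here is a minimal sketch in plain Python that simulates the map, shuffle, and reduce phases in memory. It is not Hadoop code and not necessarily the approach the full post takes: the `users` and `orders` datasets, the tag names, and every function name are hypothetical, and the example assumes an inner join on the first field of each record.

```python
from collections import defaultdict

# Hypothetical toy datasets: (user_id, name) and (user_id, item).
users = [(1, "alice"), (2, "bob")]
orders = [(1, "book"), (1, "pen"), (3, "lamp")]


def map_phase(records, tag):
    """Emit (join_key, (tag, value)) so the reducer knows each value's source."""
    for key, value in records:
        yield key, (tag, value)


def shuffle(pairs):
    """Group tagged values by join key, as the framework's shuffle step would."""
    groups = defaultdict(list)
    for key, tagged in pairs:
        groups[key].append(tagged)
    return groups


def reduce_phase(key, tagged_values):
    """Pair every user value with every order value that shares the key."""
    left = [v for t, v in tagged_values if t == "user"]
    right = [v for t, v in tagged_values if t == "order"]
    return [(key, l, r) for l in left for r in right]


def reduce_side_join(users, orders):
    pairs = list(map_phase(users, "user")) + list(map_phase(orders, "order"))
    groups = shuffle(pairs)
    joined = []
    for key in sorted(groups):
        joined.extend(reduce_phase(key, groups[key]))
    return joined


joined = reduce_side_join(users, orders)
print(joined)  # [(1, 'alice', 'book'), (1, 'alice', 'pen')]
```

Keys that appear in only one dataset (2 and 3 here) produce no output because one side of the pairing is empty, which is what makes this an inner join.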

Bloom Filter Vs Feature Hashing

A Bloom filter is a space-efficient probabilistic data structure used to encode sets compactly and to test whether an element is a member of a set. False positives are possible, but false negatives are not: a query returns either “inside set (may be wrong)” or “definitely…
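The membership behaviour described above can be illustrated with a minimal Bloom filter sketch in Python. The class name, the default sizes, and the choice of salted SHA-256 for the hash functions are assumptions made for illustration only; a production filter would size the bit array and the number of hashes from the expected item count and target error rate.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: a bit array plus several salted hash functions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Derive one bit position per hash by salting SHA-256 with the index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # True means "inside set (may be wrong)"; False means the item
        # was never added.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
for word in ["hadoop", "yarn", "mahout"]:
    bf.add(word)

print(bf.might_contain("yarn"))  # True: an added item is never reported absent
```

A lookup for an item that was never added usually returns `False`, but it can return `True` if other items happen to have set all of its bit positions. Shrinking `num_bits` makes such false positives more likely.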

Clustering with Mahout

Clustering is one of the most popular techniques in machine learning. It lets a system group numerous entities into separate clusters based on certain characteristics (features) of those entities. Clustering is widely used in grouping problems such as grouping similar news articles, blogs, emails, malware, etc. based on their…

Recommending from big data

As the research on core recommender systems progresses and matures, it becomes clear that a fundamental issue for these algorithms is to determine how to embed the core techniques in real operational systems and how to deal with massive and dynamic sets of data. Recommender system algorithms are very effective in identifying and predicting user preferences based on explicit or implicit indication of preference that…

