Big Data and Advanced Analytics

Assignment 3:
This assignment consists of two parts.

Part A
1. Access the Konstanz Information Miner tool at the KNIME web site
2. Review the information about the KNIME Analytics Platform.
3. Answer the following questions:
1. How could this platform be used to analyze health care data?
2. What benefits does this software have in comparison to commercial products that have similar functionality and use?

Part B
1. Access the journal article by Raghupathi, W., & Raghupathi, V. (2014). Big data analytics in healthcare: Promise and potential. Health Information Science and Systems, 2, Article 3.
2. Read the article and define the 4 ‘Vs’ of big data analytics in health care and give an example of each.
Combine your findings in a 5-page paper using strict APA 7 format.

Sample Paper

Big Data and Advanced Analytics

KNIME Analytics Platform is open-source software that allows users to build data flows visually, execute some or all analysis steps, and inspect the results through interactive widgets and views. The platform has no paywalls or locked features, so users can work with its full functionality at no cost. It is built on modular data pipelining, connects to databases through Java Database Connectivity (JDBC), and provides a visual interface in which users assemble nodes that link diverse data sources (KNIME, n.d.). The software has several modules, a broad spectrum of integrated tools, and advanced algorithms. Its key elements include data blending, big data extensions, robust analytics, metanode linking, local automation, and workflow difference (KNIME, n.d.). This essay discusses how the software can be applied to analyze healthcare data, its strengths over similar commercial products, and the features of big data analytics in healthcare.

KNIME Analytics Platform can be used to analyze healthcare data such as gene expression data, which healthcare professionals and researchers examine to identify relationships between genes and diseases. The software provides many ways to access and analyze such data, including classification, clustering, regression, and dimension reduction (KNIME, 2020). Gene expression analysis with this software involves several steps. The first step is data access: the user reads RNA sequencing data, which measure gene transcription, from the relevant files. RNA sequencing data from patients' tumors and matched normal tissue are then compared, and statistical tests identify the differentially expressed genes. The software lets the user build an interactive composite view to select genes for further examination. Next, the researcher clusters the selected genes by the similarity of their expression profiles and explores their biological pathways; the software includes Hierarchical Clustering and Heatmap nodes for this purpose (KNIME, 2020). In the last step, the researcher searches for compounds that target the selected gene products, and KNIME Analytics Platform provides nodes that retrieve information on these compounds.
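To make the differential expression and clustering steps above more concrete, the following Python sketch reproduces their logic outside KNIME; it is not the KNIME workflow itself. The gene names, expression values, pseudocount, and fold-change threshold are hypothetical assumptions chosen for illustration, and a real analysis would use normalized counts and formal statistical tests rather than a simple fold-change cutoff. Genes are first ranked by the log2 ratio of mean tumor to mean normal expression, and the selected genes are then grouped with average-linkage hierarchical clustering.

```python
"""Illustrative sketch (not KNIME): select differentially expressed genes by
log2 fold change, then cluster them with average-linkage hierarchical
clustering. All gene names and values are hypothetical."""
import math
from itertools import combinations

# Hypothetical data: gene -> (tumor samples, matched normal samples)
EXAMPLE_EXPR = {
    "GENE_A": ([20.0, 22.0, 21.0], [5.0, 4.0, 6.0]),
    "GENE_B": ([18.0, 19.0, 20.0], [4.0, 5.0, 5.0]),
    "GENE_C": ([2.0, 3.0, 2.0], [12.0, 11.0, 13.0]),
    "GENE_D": ([10.0, 10.0, 11.0], [10.0, 9.0, 10.0]),
}


def mean(values):
    return sum(values) / len(values)


def log2_fold_change(tumor, normal, pseudocount=1.0):
    """log2 of mean tumor expression over mean normal expression.
    The pseudocount (an assumption) avoids division by zero."""
    return math.log2((mean(tumor) + pseudocount) / (mean(normal) + pseudocount))


def select_differential(expr, threshold=1.0):
    """Return {gene: log2FC} for genes with |log2FC| >= threshold."""
    selected = {}
    for gene, (tumor, normal) in expr.items():
        fc = log2_fold_change(tumor, normal)
        if abs(fc) >= threshold:
            selected[gene] = fc
    return selected


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering: start with one cluster per gene and
    repeatedly merge the two clusters with the smallest average pairwise
    distance until n_clusters remain."""
    clusters = [[gene] for gene in profiles]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = mean([euclidean(profiles[a], profiles[b])
                      for a in clusters[i] for b in clusters[j]])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


if __name__ == "__main__":
    degs = select_differential(EXAMPLE_EXPR, threshold=1.0)
    # Expression profile = tumor samples followed by normal samples
    profiles = {g: EXAMPLE_EXPR[g][0] + EXAMPLE_EXPR[g][1] for g in degs}
    print(sorted(degs))
    print(sorted(sorted(c) for c in average_linkage(profiles, n_clusters=2)))
```

With these made-up values, GENE_D is filtered out because its expression barely changes, and the two genes that rise in tumor tissue (GENE_A and GENE_B) end up in one cluster, separate from GENE_C, which falls. This mirrors the idea behind the Hierarchical Clustering node: genes with comparable expression profiles are grouped for further pathway analysis.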

KNIME Analytics Platform has several benefits over commercial software with comparable functionality and applications. First, ease of use and quick learning are essential features of a good graphical data analysis tool in today's adaptive work environments. KNIME Analytics Platform meets this need better than similar tools such as RapidMiner: its intuitive user interface makes it relatively easy to use and shortens the learning curve (KNIME, n.d.). In contrast, RapidMiner's setup and upgrade processes are less straightforward, and its time-consuming, complex initial setup undermines its usability (KNIME, n.d.). Second, like Alteryx, KNIME Analytics Platform is a self-service data analytics solution that can generate, blend, and analyze data and deploy the results quickly. However, it has strengths over Alteryx, including better visualization; because Alteryx's visualization capabilities are weaker, many users may prefer KNIME Analytics Platform (KNIME, n.d.). Third, the tool is open source. Some capable data analytics tools, such as MATLAB, have poor support, and a lack of online support forums can cause major setbacks and delays when users encounter problems that require help. Open-source platforms such as KNIME stand out because support is easy to obtain, and KNIME releases continuous updates and modules that help users learn the system and use it easily (KNIME, n.d.).

The fourth strength of KNIME Analytics Platform is scalability. The platform has numerous extensions that let users scale and customize it to fit their particular needs. For instance, its local automation capability allows users to deploy it flexibly across production environments of any scale. Its workflow difference functionality also supports collaboration by letting users track all modifications to their workflows, including those made by coworkers, which protects against unintentional changes. Fifth, the software integrates with diverse open-source projects, including machine learning and deep learning libraries such as Keras, scikit-learn, and H2O (KNIME, n.d.). In addition, KNIME Analytics Platform supports multiple web-based reporting methods and provides nodes that let users run scripts in languages such as Java, Python, and Perl. Lastly, KNIME Analytics Platform runs on multiple operating systems, including 32-bit Windows XP and Vista, 64-bit Windows Vista and Windows 7, several Linux distributions, and others (KNIME, n.d.).

The 4 Vs of Big Data Analytics

Big data in healthcare refers to data sets generated in healthcare settings that are so large and complex that they are difficult or impractical to handle with traditional software, hardware, or standard data management tools, not only because of their volume but also because of their diversity and the speed at which they must be managed. Big data in healthcare has four critical features, referred to as the four "Vs." The first is volume, the size of the data sets that need to be analyzed and processed (Raghupathi & Raghupathi, 2014). Health-related data are generated and accumulated continuously, which produces a large volume of data. Examples of healthcare data that already exist in large volumes include personal medical records, human genetics, clinical trial data, FDA submissions, and radiology images. In addition, new types of big data, such as genomics and 3D imaging, are causing the volume of healthcare data to grow exponentially (Raghupathi & Raghupathi, 2014).

The second characteristic is velocity, defined as the rate at which data sets are produced, processed, and moved or accessed. Massive amounts of data are created and accumulated in real time every day, and the steady flow of new information stored at high rates poses unique challenges in healthcare (Raghupathi & Raghupathi, 2014). Moreover, the adoption of technologies such as the internet, machine learning, genomic testing, and natural language processing has increased the velocity of big data in healthcare. High velocity is necessary in some healthcare settings, such as the intensive care unit (ICU), where patients' vital signs must be updated in real time at the point of care to support decision-making and efficient care delivery.

The third characteristic is variety. The healthcare system comprises diverse sections that generate many forms of data in different volumes and at different velocities, which introduces the element of variety. For instance, healthcare generates various forms of structured data, such as patient name, physician's name, hospital name, date of birth, and address. Such data can easily be stored, retrieved, analyzed, and manipulated by computers. Diverse forms of unstructured data are also generated at the point of care, including handwritten physicians' notes, paper prescriptions, and MRI and CT images (Raghupathi & Raghupathi, 2014). In addition, new data streams are flowing into the healthcare industry from sources such as genetics and genomics, social media research, and mHealth devices.

The fourth characteristic is veracity, which refers to the quality of the data being analyzed. It concerns the credibility and reliability of big data, the analytics performed on it, and the resulting outcomes. Data quality is a significant issue in healthcare for several reasons. First, some healthcare data, particularly unstructured data, are highly inconsistent and error-prone. Second, healthcare professionals make many decisions, including life-or-death decisions, that depend on accurate information (Raghupathi & Raghupathi, 2014). Healthcare professionals therefore work continually to improve data quality and integrity (Raghupathi & Raghupathi, 2014), using strategies such as data governance to achieve this objective.

In summary, the adoption of digital technologies in healthcare to collect patients' records and help manage hospital performance has led to a massive increase in information in the industry. This has created new challenges, including analyzing big data to uncover market trends, hidden patterns, and correlations. Big data has four key features: volume, velocity, variety, and veracity. As a result, digital technologies such as KNIME Analytics Platform have been introduced to facilitate big data analysis. This software has several strengths: it is easy to use and quick to learn, runs on multiple operating systems, is open source, and integrates with diverse open-source projects.

References

KNIME. (n.d.). KNIME Analytics Platform. Retrieved from

KNIME. (2020). Gene Expression Data Analysis with KNIME Analytics Platform. Retrieved from

Raghupathi, W., & Raghupathi, V. (2014). Big data analytics in healthcare: Promise and potential. Health Information Science and Systems, 2, Article 3.