Exploring Data Using Graphics and Visualization
In this assignment you will be using the churn data: churn_data.txt
Read the data into a data frame using the function read.csv() with the following options:
Assume that you saved the file churn_data.txt in the C:/Datasets folder. Then you can read the file into a data frame as follows:
file <- "C:/Datasets/churn_data.txt"
churnData <- read.csv(file, stringsAsFactors = FALSE, header = TRUE)
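As a minimal sketch of the read step, the block below writes a tiny stand-in file to a temporary location and reads it back with the same read.csv() options. The three rows and the choice of columns are made up for illustration and are not taken from churn_data.txt; with the real file you would pass its path instead of tmpFile.

```r
# Write a tiny stand-in for churn_data.txt (hypothetical rows, NOT the real data).
tmpFile <- tempfile(fileext = ".txt")
writeLines(c(
  "State,Area.Code,CustServ.Calls,Night.Charge,Churn",
  "CA,415,1,11.01,False",
  "NY,510,4,7.32,True",
  "TX,408,0,9.18,False"
), tmpFile)

# Same options as in the assignment text.
churnData <- read.csv(tmpFile, stringsAsFactors = FALSE, header = TRUE)

str(churnData)          # column types chosen by read.csv
class(churnData$State)  # "character", because stringsAsFactors = FALSE
class(churnData$Churn)  # "logical" here: read.csv parses True/False as logicals
```

If the real file spells the churn flag differently (worth checking with str()), the column stays character and comparisons such as Churn == "True" are needed instead of logical tests.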
A) Print the names of the columns.
Hint: colnames() function.
B) Print the number of rows and columns.
C) Count the number of calls per state.
Hint: table() function.
D) Find the mean, median, standard deviation, and variance of nightly charges, the column Night.Charge in the data.
The R functions to be used are mean(), median(), sd(), var().
E) Find the maximum and minimum values of international charges (Intl.Charge), customer service calls (CustServ.Calls), and daily charges (Day.Charge).
F) Use the summary() function to print information about the distribution of the following features:
"Eve.Charge" "Night.Mins" "Night.Calls" "Night.Charge" "Intl.Mins" "Intl.Calls"
What are the min and max values printed by the summary() function for these features?
Check textbook page 34 for a sample.
G) Use the unique() function to print the distinct values of the following columns:
State, Area.Code, and Churn.
H) Extract the subset of data for the churned customers (i.e., Churn=True). How many rows are in the subset?
Hint: Use the subset() function. Check the lecture notes and textbook for samples.
I) Extract the subset of data for customers that made more than 3 customer service calls (CustServ.Calls). How many rows are in the subset?
J) Extract the subset of churned customers with no international plan (Int.l.Plan) and no voice mail plan (VMail.Plan). How many rows are in the subset?
K) Extract the data for customers from California (i.e., State is CA) who did not churn but made more than 2 customer service calls.
L) What is the mean of customer service calls for the customers that did not churn (i.e., Churn=False)?
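The functions named in the hints above can be tried on a small data frame first. The sketch below uses six made-up rows (not the real churn data) with the column names from the assignment, and assumes Churn was read as a logical column; if it is character in your data, replace Churn with Churn == "True" and !Churn with Churn == "False".

```r
# Toy data frame with hypothetical values; only the column names follow the assignment.
churnData <- data.frame(
  State          = c("CA", "CA", "NY", "TX", "CA", "NY"),
  Area.Code      = c(415, 408, 510, 415, 408, 510),
  Int.l.Plan     = c("no", "yes", "no", "no", "no", "no"),
  VMail.Plan     = c("no", "no", "yes", "no", "yes", "no"),
  CustServ.Calls = c(1, 4, 0, 5, 3, 2),
  Day.Charge     = c(45.1, 27.5, 41.4, 50.9, 28.3, 37.9),
  Night.Charge   = c(11.0, 11.5, 7.3, 8.9, 8.4, 9.2),
  Intl.Charge    = c(2.7, 3.7, 3.3, 1.8, 2.7, 1.7),
  Churn          = c(FALSE, TRUE, FALSE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

colnames(churnData)        # A) column names
dim(churnData)             # B) number of rows, number of columns
table(churnData$State)     # C) number of records per state

# D) location and spread of Night.Charge
mean(churnData$Night.Charge)
median(churnData$Night.Charge)
sd(churnData$Night.Charge)
var(churnData$Night.Charge)

# E) minimum and maximum of several columns at once (row 1 = min, row 2 = max)
sapply(churnData[, c("Intl.Charge", "CustServ.Calls", "Day.Charge")], range)

# F) distribution summary; the assignment lists other features, same call
summary(churnData[, c("Night.Charge", "Intl.Charge")])

# G) distinct values
unique(churnData$State)
unique(churnData$Area.Code)
unique(churnData$Churn)

# H) to K) subsets; nrow() gives the number of rows in each
churned     <- subset(churnData, Churn)
manyCalls   <- subset(churnData, CustServ.Calls > 3)
noPlans     <- subset(churnData, Churn & Int.l.Plan == "no" & VMail.Plan == "no")
caLoyal     <- subset(churnData, State == "CA" & !Churn & CustServ.Calls > 2)
nrow(churned)

# L) mean customer service calls among customers that did not churn
meanCallsNoChurn <- mean(subset(churnData, !Churn)$CustServ.Calls)
meanCallsNoChurn
```

The counts and means printed here describe only the toy rows; the answers for the assignment come from running the same calls on the data frame read from churn_data.txt.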
Question 2 (related to the above)
In this assignment, we will explore the churn data using graphics and visualization. One of the primary reasons for performing exploratory data analysis (EDA) is to investigate the variables, examine the distributions of the categorical variables, look at the histograms of the numeric variables, and explore the relationships among sets of variables.
Although we are not going to develop any models for this project, in a real-world project our task is to identify patterns in the data that will help to reduce the proportion of churners.
We will use the same data set we had in the Week 2 assignment:
Data file: churn_data.txt
All graphics in this assignment have to be designed using the ggplot2 library. So, you need to install the ggplot2 library for graphs:
install.packages("ggplot2")
Before using any methods from the libraries, you need to load these libraries into the R code using
library(ggplot2)
Here is how you can read data into a data frame called churnData:
churnData <- read.csv(filePath, stringsAsFactors = FALSE, header = TRUE)
where filePath is the location of the churn_data.txt file. For example, if you saved the file in C:/tmp, then you should use C:/tmp/churn_data.txt
The variables in the file churn_data.txt are
State: Categorical, for the 50 states and the District of Columbia.
Account length: Integer-valued, how long the account has been active.
Area code: Categorical
Phone number: Essentially a surrogate for customer ID.
International plan: Dichotomous categorical, yes or no.
Voice mail plan: Dichotomous categorical, yes or no.
Number of voice mail messages: Integer-valued.
Total day minutes: Continuous, minutes customer used service during the day.
Total day calls: Integer-valued.
Total day charge: Continuous, perhaps based on above two variables.
Total eve minutes: Continuous, minutes customer used service during the evening.
Total eve calls: Integer-valued.
Total eve charge: Continuous, perhaps based on above two variables.
Total night minutes: Continuous, minutes customer used service during the night.
Total night calls: Integer-valued.
Total night charge: Continuous, perhaps based on above two variables.
Total international minutes: Continuous, minutes customer used service to make international calls.
Total international calls: Integer-valued.
Total international charge: Continuous, perhaps based on above two variables.
Number of calls to customer service: Integer-valued.
Churn: Target. Indicator of whether the customer has left the company (true or false).
Part 1. Bar Charts
A bar chart is a histogram for discrete data: it records the frequency of every value of a categorical variable.
1.) Vertical Bar Charts
Plot the bar charts of State, Area.Code, Int.l.Plan, VMail.Plan, CustServ.Calls, and Churn.
Use the theme() function to change the text size, location, color, etc. (An example is given in the textbook on page 61.)
The following is the bar chart for State. As an example, the x-axis labels are bold and rotated 90 degrees, which can be set in the theme() function using
axis.text.x = element_text(face="bold", angle=90, vjust=0.5, size=11).
Similarly, the parameter colour="#990000" is used for the color of the x-axis title. So, the following options for axis.title.x and axis.text.x in the theme() function display the title and text of the x-axis as shown in the figure below:
axis.title.x = element_text(face="bold", colour="#990000", size=12), axis.text.x = element_text(face="bold",angle=90,vjust=0.5, size=11)
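To see where those theme() options go in a complete call, here is a minimal sketch. It assumes ggplot2 is installed and uses a handful of made-up State and Area.Code values rather than the real data; the object names pState and pArea are only illustrative.

```r
library(ggplot2)  # assumes ggplot2 has been installed

# Toy data (hypothetical values); the real plot uses the churnData read from file.
churnData <- data.frame(
  State     = c("CA", "NY", "CA", "TX", "NY", "CA"),
  Area.Code = c(415, 510, 408, 415, 408, 510),
  stringsAsFactors = FALSE
)

# geom_bar() counts the rows for each distinct value of the x variable.
pState <- ggplot(churnData, aes(x = State)) +
  geom_bar() +
  theme(
    axis.title.x = element_text(face = "bold", colour = "#990000", size = 12),
    axis.text.x  = element_text(face = "bold", angle = 90, vjust = 0.5, size = 11)
  )
print(pState)

# Area.Code is stored as a number; wrapping it in factor() makes ggplot2
# treat it as a category, so each code gets its own bar.
pArea <- ggplot(churnData, aes(x = factor(Area.Code))) +
  geom_bar() +
  labs(x = "Area.Code")
print(pArea)
```

The same pattern applies to the other columns in the list; only the variable inside aes() changes.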
2.) Horizontal Bar Charts
Create the horizontal bar chart of CustServ.Calls.
Hint: Textbook page 49.
3.) Horizontal Bar Charts with Sorted Categories
Create a horizontal bar chart where the number of calls is sorted for CustServ.Calls.
Hint: Textbook pages 50-51.
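One possible way to build both horizontal charts is sketched below; it is not necessarily the textbook's approach. It assumes ggplot2 is installed, uses made-up CustServ.Calls values, and introduces the helper names counts, pHoriz, and pSorted for illustration. Flipping the coordinates turns the bars sideways, and reorder() sorts the categories by their frequency.

```r
library(ggplot2)  # assumes ggplot2 has been installed

# Toy data (hypothetical values).
churnData <- data.frame(CustServ.Calls = c(1, 1, 1, 0, 2, 2, 4))

# Horizontal bar chart: an ordinary bar chart with the axes flipped.
pHoriz <- ggplot(churnData, aes(x = factor(CustServ.Calls))) +
  geom_bar() +
  coord_flip() +
  labs(x = "CustServ.Calls")
print(pHoriz)

# Sorted version: count first, then order the categories by their counts.
counts <- as.data.frame(table(churnData$CustServ.Calls))
names(counts) <- c("CustServ.Calls", "Freq")

pSorted <- ggplot(counts, aes(x = reorder(CustServ.Calls, Freq), y = Freq)) +
  geom_col() +      # bar heights are taken from Freq instead of being counted
  coord_flip() +
  labs(x = "CustServ.Calls", y = "Count")
print(pSorted)
```

After coord_flip(), the category with the largest count ends up at the top of the sorted chart.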
Part 2: Histograms and Density Plots
The histogram and the density plot are two visualizations that help you quickly examine the distribution of a numerical variable.
A basic histogram bins a variable into fixed-width buckets and returns the number of data points that fall into each bucket. You can think of a density plot as a “continuous histogram” of a variable, except the area under the density plot is equal to 1.
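Both statements can be checked numerically with base R before drawing anything. The sketch below uses simulated values (an assumption, not the churn data): hist() with plot = FALSE returns the bucket counts, and the area under the density() estimate is approximated with the trapezoid rule.

```r
set.seed(1)
x <- rnorm(500, mean = 100, sd = 40)   # simulated stand-in for a numeric column

# Histogram: fixed-width buckets and the number of points in each.
h <- hist(x, breaks = 10, plot = FALSE)
h$breaks                 # bucket edges, equally spaced
h$counts                 # points per bucket
sum(h$counts)            # every point lands in exactly one bucket

# Density plot: a smooth curve whose total area is (approximately) 1.
d <- density(x)
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)  # trapezoid rule
area
```

Because the density is normalized to unit area, its y-axis shows density values rather than counts, which is why two variables with different numbers of rows can still be compared on the same scale.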
1.) Plot the histograms of Account.Length, VMail.Message, Day.Mins, Intl.Calls, and VMail.Message.
Based on the histograms, comment on whether any of them have outliers, are close to the Normal Distribution, are multi-modal, or are skewed.
The histogram for Account.Length is shown below:
2.) Plot the density plots of Account.Length, VMail.Message, Day.Mins, Intl.Calls, and VMail.Message.
Based on the density plots, comment on whether any of them have outliers, are close to the Normal Distribution, are multi-modal, or are skewed.
As a sample, the density plot for VMail.Message is shown below:
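A minimal ggplot2 sketch of both plot types follows. It assumes ggplot2 is installed and uses simulated values whose shapes are invented for illustration, so the resulting pictures say nothing about the real Account.Length or VMail.Message distributions; the binwidth of 10 is likewise an arbitrary choice.

```r
library(ggplot2)  # assumes ggplot2 has been installed

# Simulated stand-in data (hypothetical shapes, NOT the real churn data).
set.seed(2)
churnData <- data.frame(
  Account.Length = round(rnorm(300, mean = 100, sd = 40)),
  VMail.Message  = c(rep(0, 220), pmax(0, round(rnorm(80, mean = 29, sd = 6))))
)

# Histogram with fixed-width bins of 10 units.
pHist <- ggplot(churnData, aes(x = Account.Length)) +
  geom_histogram(binwidth = 10, fill = "grey", colour = "black")
print(pHist)

# Density plot of a second variable.
pDens <- ggplot(churnData, aes(x = VMail.Message)) +
  geom_density()
print(pDens)
```

Changing binwidth is worth trying on the real data, since a very wide bin can hide multiple modes and a very narrow one can make noise look like structure.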
Part 3. Scatter Plots
In addition to examining variables in isolation, you’ll often want to look at the relationship between two variables.
Plot the scatter plots for the pairs Eve.Mins-Day.Mins, Day.Mins-Day.Charge, Eve.Mins-Eve.Charge, Day.Mins-Day.Calls.
Based on the plots, are there any relationships between the pairs of features plotted?
The scatter plot of Eve.Mins vs Day.Mins is given below:
For the scatter plots in part A, add color to display churn and no-churn data points. Simply add aes(color=Churn) to the geom_point() function as shown below:
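A minimal sketch of such a call on simulated data is given here. It assumes ggplot2 is installed; the minute values, the churn flags, and the per-minute rate of 0.17 used to derive Day.Charge are all hypothetical, chosen only so that one pair of variables is unrelated and another is exactly linear.

```r
library(ggplot2)  # assumes ggplot2 has been installed

# Simulated stand-in data (hypothetical values and rate).
set.seed(3)
n <- 200
churnData <- data.frame(
  Day.Mins = runif(n, 0, 350),
  Eve.Mins = runif(n, 0, 350),
  Churn    = sample(c(TRUE, FALSE), n, replace = TRUE, prob = c(0.15, 0.85))
)
churnData$Day.Charge <- 0.17 * churnData$Day.Mins  # hypothetical per-minute rate

# Plain scatter plot of two unrelated variables.
pMins <- ggplot(churnData, aes(x = Day.Mins, y = Eve.Mins)) +
  geom_point()
print(pMins)

# Same kind of plot, with points colored by the churn flag.
pCharge <- ggplot(churnData, aes(x = Day.Mins, y = Day.Charge)) +
  geom_point(aes(color = Churn))
print(pCharge)

# A correlation coefficient backs up what the plot suggests.
cor(churnData$Day.Mins, churnData$Day.Charge)
```

In this toy data the second plot is a straight line by construction; whether the real charge columns behave the same way is what the exercise asks you to judge from the plots.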
Part 4. Box Plots
A box-and-whiskers plot describes the distribution of a continuous variable by plotting its five-number summary: the minimum, lower quartile (25th percentile), median (50th percentile), upper quartile (75th percentile), and maximum.
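The five-number summary can be computed directly in base R. The sketch below uses eight made-up values; the names calls, fiveNum, and outliers are illustrative only.

```r
calls <- c(0, 1, 1, 2, 2, 3, 4, 9)   # hypothetical values

# Minimum, lower hinge, median, upper hinge, maximum.
fiveNum <- fivenum(calls)
fiveNum                      # 0.0 1.0 2.0 3.5 9.0

# quantile() interpolates differently by default, so the quartiles
# can differ slightly from the hinges returned by fivenum().
quantile(calls, c(0, 0.25, 0.5, 0.75, 1))

# Points more than 1.5 times the box height beyond the box are flagged
# as outliers and are usually drawn as separate points.
outliers <- boxplot.stats(calls)$out
outliers                     # 9
```

When outliers are drawn separately, the whiskers stop at the most extreme points that are not flagged, so the whisker ends match the minimum and maximum only when no outliers are present.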
Plot the box plots of CustServ.Calls, Night.Calls, and Intl.Charge by Churn.
Which of the features have outliers? (Can you spot them in the box plot?)
What is the median of Night.Calls for customers that did not churn? (from the box plot)
The following is the box plot of CustServ.Calls.
Hint: You can find detailed information and samples of box plots at
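As a minimal sketch of a box plot split by churn status, the block below assumes ggplot2 is installed, uses thirteen made-up rows, and assumes Churn is a logical column. The tapply() call is one way to cross-check the medians read off the plot; medians is an illustrative name.

```r
library(ggplot2)  # assumes ggplot2 has been installed

# Toy data (hypothetical values).
churnData <- data.frame(
  CustServ.Calls = c(0, 1, 1, 2, 2, 3, 9,  1, 3, 4, 4, 5, 6),
  Churn          = c(rep(FALSE, 7), rep(TRUE, 6))
)

# One box per churn group; the thick line inside each box is the median.
pBox <- ggplot(churnData, aes(x = Churn, y = CustServ.Calls)) +
  geom_boxplot()
print(pBox)

# Numeric cross-check of the medians shown in the plot.
medians <- tapply(churnData$CustServ.Calls, churnData$Churn, median)
medians
```

Replacing CustServ.Calls with Night.Calls or Intl.Charge gives the other two plots requested in this part.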
Part 5. Dodged and Stacked Bar Charts
A) Display a dodged bar chart of Int.l.Plan by Churn.
Hint: Textbook pages 60-61.
B) Display a stacked bar chart of CustServ.Calls and Churn.
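Both chart types come from geom_bar() with a fill aesthetic and a different position argument. The sketch below assumes ggplot2 is installed, uses eight made-up rows, and assumes Churn is a logical column; pDodge and pStack are illustrative names.

```r
library(ggplot2)  # assumes ggplot2 has been installed

# Toy data (hypothetical values).
churnData <- data.frame(
  Int.l.Plan     = c("no", "no", "no", "yes", "yes", "no", "yes", "no"),
  CustServ.Calls = c(1, 1, 4, 0, 4, 2, 1, 2),
  Churn          = c(FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Dodged: one bar per (plan, churn) combination, placed side by side.
pDodge <- ggplot(churnData, aes(x = Int.l.Plan, fill = Churn)) +
  geom_bar(position = "dodge")
print(pDodge)

# Stacked: one bar per number of calls, split into churn and no-churn segments.
pStack <- ggplot(churnData, aes(x = factor(CustServ.Calls), fill = Churn)) +
  geom_bar(position = "stack") +
  labs(x = "CustServ.Calls")
print(pStack)

# The counts behind the bars, as a cross-tabulation.
table(churnData$Int.l.Plan, churnData$Churn)
```

A dodged chart makes the group sizes easy to compare directly, while a stacked chart keeps the total per category visible.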