# Enn543 Data Analytics And Optimisation|Dimension Assessment Answer

Problem 1. Clustering. Bike share systems are becoming increasingly common in cities across the world, but their usage is highly variable and depends on factors such as local weather.

You have been provided with two months of data from the New York Bike Share system covering one month in summer (Q1/JC-201707-citibike-tripdata.csv) and one month in winter (Q1/JC-201801-citibike-tripdata.csv). From the size of the files alone it is clearly evident that there are substantially fewer trips in winter than there are in summer; however, it is unclear if the actual pattern of use (i.e. the typical types of trips) is different. Using this data and the clustering method of your choice, you are to attempt to answer the question: ‘aside from the overall number of trips, do usage patterns change from summer to winter?’. In doing this you should cluster the data using the following five dimensions:

1. start station latitude;

2. start station longitude;

3. end station latitude;

4. end station longitude;

5. tripduration.

Note that this means that clusters will contain 5 dimensions, and visualisation of clusters in a single 2D plot will not be possible.

Your answer should demonstrate and discuss how usage patterns are similar or dissimilar (depending on what you find), and should also consider different time periods (morning, afternoon, etc) to better explore how the service is used.

Your answer should explain all decisions made when conducting the analysis, including details such as:

• the clustering method selected;

• any parameters that are required for the clustering;

• any outlier removal that is conducted; and

• any data normalisation or scaling that is performed.
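
As one illustration of the scaling and time-period decisions listed above, the sketch below z-scores the five dimensions so that tripduration (in seconds) does not dominate the coordinate columns, and assigns each trip to a coarse time period. It is a minimal sketch, not a prescribed method: the sample rows are made-up values, and the timestamp format and period boundaries are assumptions that should be checked against the actual files.

```python
from datetime import datetime
from statistics import mean, pstdev

# Illustrative rows only (made-up values, not taken from the data files):
# (start lat, start lon, end lat, end lon, tripduration in seconds)
trips = [
    (40.7192, -74.0434, 40.7273, -74.0338, 364),
    (40.7273, -74.0338, 40.7192, -74.0434, 1210),
    (40.7160, -74.0335, 40.7246, -74.0784, 655),
    (40.7246, -74.0784, 40.7160, -74.0335, 2890),
]


def standardise(rows):
    """Z-score each column so every dimension has mean 0 and unit variance."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stds))
        for row in rows
    ]


def time_period(timestamp):
    """Map a 'YYYY-MM-DD HH:MM:SS' string to a coarse period.

    Both the timestamp format and the hour boundaries are assumptions.
    """
    hour = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").hour
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 24:
        return "evening"
    return "night"


scaled = standardise(trips)
```

If summer and winter clusters are to be compared directly, one option is to compute the scaling statistics once on the combined data, so that cluster centres from both months live in the same coordinate system.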

Problem 2. Classification. Software systems are complex, and errors in deployed software can be very costly and difficult to correct. In an effort to help detect faulty software, a number of metrics have been proposed that measure software complexity.

You have been provided with data (Q2/pc1.csv) which contains various code metrics for a number of software examples, as well as a flag to indicate if the software contains a fault or not. For clarity:

• The first 21 columns contain predictors that measure some aspect of the software complexity, and may be used to determine if software is faulty or not;

• The last column contains a value of true or false, indicating if the software has a defect or not.

Using this data, you are to train a support vector machine (SVM) to separate defective software from error-free software. You are to report on the accuracy of the developed model, and on any problems or challenges that you encounter in developing the model. In doing this you should:

1. Divide the data into appropriate training, validation and testing datasets;

2. Consider what SVM parameters (box constraint, kernel type, etc.) you should use;

3. Consider the class distribution of the data, and make allowances within the model as needed.

Please note that allowing MATLAB to optimise hyper-parameters in place of properly investigating parameter settings is not acceptable as a justification for hyper-parameter selection, though a grid search (which is a more systematic approach) will be accepted. Your answer should explain the choice of parameters in the final model, and discuss its performance.
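
The three steps above can be sketched independently of any particular SVM library. The example below is a hedged illustration only: it builds a stratified train/validation/test split, derives inverse-frequency class weights (a common heuristic for imbalanced classes), and enumerates a small grid of candidate settings. The 70/15/15 split, the 9:1 label vector, and the grid values are all placeholder assumptions, not properties of Q2/pc1.csv.

```python
import random
from collections import Counter, defaultdict
from itertools import product


def stratified_split(labels, percent=(70, 15, 15), seed=0):
    """Return (train, val, test) index lists that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Integer arithmetic avoids floating-point surprises in the cut points.
        n_train = len(indices) * percent[0] // 100
        n_val = len(indices) * percent[1] // 100
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    return {
        label: len(labels) / (len(counts) * count)
        for label, count in counts.items()
    }


# Made-up label vector with a 9:1 imbalance, for illustration only.
labels = [False] * 90 + [True] * 10
train, val, test = stratified_split(labels)
weights = class_weights([labels[i] for i in train])

# Placeholder grid of (box constraint, kernel type) pairs; each pair would be
# trained on the training set and scored on the validation set.
grid = list(product([0.1, 1, 10, 100], ["linear", "rbf"]))
```

With heavily imbalanced classes, plain accuracy can look high for a model that never predicts a defect, so a per-class measure such as recall on the defective class is a more informative score when comparing grid points.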

Problem 3. Dimension Reduction and Classification. Recognising content in images can be a challenging problem due to the high dimensional nature of the input data. As such, dimension reduction methods can be used to reduce a problem space and make tasks more computationally feasible. You have been provided with data (Q3/shvn test.mat) that shows images of single digits (0, 1, 2, 3, 4, 5, 6, 7, 8 and 9) of house numbers, extracted from Google street view data.

Using this data you are to train classifiers (the type of classifier is up to you) to classify the observed digit in the image. Prior to classification, you are to reduce the data using:

1. PCA;

2. LDA;

i.e. you should train two classifiers: one using data reduced using PCA, one using data reduced using LDA. You are then to evaluate the two classifiers and compare their performance. In completing this question you should:

1. Divide the data into appropriate training, validation and testing datasets;

2. Consider what type of classifier to use;

3. Determine an appropriate number of dimensions to retain.

Also note that due to memory constraints, it may not be possible to train the PCA or LDA space on all samples, and you may need to use only a subset of the data to compute the PCA and LDA transforms.

Your answer should explain any parameters selected and choices made (type of classifier, number of dimensions retained, etc) in arriving at your solution, and discuss the performance of the two methods, relating this to what the two transforms (PCA and LDA) are seeking to achieve.
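
One common way to justify the number of PCA dimensions retained is to keep the smallest number of leading components whose cumulative share of the variance reaches a chosen threshold. The sketch below illustrates that rule under stated assumptions: the eigenvalue spectrum is made up, and the 0.90 and 0.95 thresholds are placeholders rather than recommended values.

```python
from itertools import accumulate


def components_for_variance(eigenvalues, target=0.95):
    """Smallest number of leading components whose variance share reaches target."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    for count, running in enumerate(accumulate(ordered), start=1):
        if running / total >= target:
            return count
    return len(ordered)


# Made-up eigenvalue spectrum (covariance-matrix eigenvalues), illustration only.
eigenvalues = [8.0, 4.0, 2.0, 1.0, 0.5, 0.5]
k = components_for_variance(eigenvalues, target=0.90)

# LDA yields at most (number of classes - 1) discriminant directions,
# so ten digit classes give at most nine dimensions.
max_lda_dims = 10 - 1
```

The two transforms are bounded differently: PCA can retain up to as many components as there are input dimensions and ignores the labels, while LDA uses the labels and is capped by the number of classes, which is worth keeping in mind when comparing the two classifiers.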
