My proposal

1) Description of the data
a. Using the website espn.com, I would like to look at baseball statistics from the last few years. Specifically, I would like to look at the number of home runs hit each year between 2003 and 2007 by the three highest-paid batters on each team.

2) Why is this data interesting
a. This data is interesting because many players are paid an inordinate amount of money and deliver few results, while some players are paid little and hit many home runs for their teams. This data would give fans, players, and teams information about getting the most for their money.

3) How can we obtain this information
a. This data is rather easy to obtain, either from a Google search for each team over the last few years or by going to espn.com and looking at each team and player from the last five years of baseball.

4) What questions will this data answer?
a. Is a team getting the best results for the amount of money they are paying these players?
b. Who is the most economical player (has the highest ratio of home runs to pay)?
c. Who is the least economical player?
d. Which team is getting the best deal (its three players’ ratio of home runs to pay is the highest)?
e. Which team is getting the worst deal?
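
Questions b through e come down to one ratio per player and one per team. Here is a minimal sketch in Python, assuming each record holds a player, a team, a home run count, and a salary. The names and figures are made-up placeholders, not real statistics from espn.com, and dividing a team's total home runs by its total pay is only one possible way to define the team ratio.

```python
from collections import defaultdict

# Hypothetical placeholder records: (player, team, home_runs, salary_in_dollars).
# These are NOT real statistics; real values would be collected from espn.com.
records = [
    ("Player A", "Team X", 40, 20_000_000),
    ("Player B", "Team X", 10, 10_000_000),
    ("Player C", "Team Y", 30, 5_000_000),
    ("Player D", "Team Y", 5, 15_000_000),
]


def hr_per_million(home_runs, salary):
    """Home runs per million dollars of salary."""
    return home_runs / (salary / 1_000_000)


def player_ratios(rows):
    """Map each player to his home-runs-per-million-dollars ratio."""
    return {player: hr_per_million(hr, pay) for player, _team, hr, pay in rows}


def team_ratios(rows):
    """Map each team to total home runs divided by total pay (assumed definition)."""
    totals = defaultdict(lambda: [0, 0])
    for _player, team, hr, pay in rows:
        totals[team][0] += hr
        totals[team][1] += pay
    return {team: hr_per_million(hr, pay) for team, (hr, pay) in totals.items()}


players = player_ratios(records)
teams = team_ratios(records)
most_economical = max(players, key=players.get)
least_economical = min(players, key=players.get)
best_deal_team = max(teams, key=teams.get)
worst_deal_team = min(teams, key=teams.get)
```

The same functions could be run once per season from 2003 to 2007 to see whether a player's or team's ratio changes over time.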

Colin’s Movie Monologue Page

What is this data?

1. “Colin’s Movie Monologue Page” has hundreds of monologues from films and hit television series. Created in 2006 by Colin L. Ryono, it displays an alphabetized list of films and their monologues on an easily accessible page.

Why is it interesting?

2. It is especially interesting to me because I am an actress, and to all actors and actresses, because there are many times when you need a monologue fast, and on this site one is very easy to find. The site is also interesting and useful to the average person: a lot of times you misunderstand a line in a movie or want to repeat it, and with this site you can look it up.

How do you obtain the information?

To reach this website, enter http://www.whysanity.net/monos/ in your browser’s address bar. If you run into trouble, “Colin’s Movie Monologue Page” is the first link on the results page when you search Google for “movie monologues”. Once on the site, it is easy to find the monologue of your choice: select a movie from the alphabetized A-Z list, then click on the specific monologue you’d like to read or print out, and you are set to go.

What questions does it answer?

It answers which monologues come from which movie, and gives a word-for-word account of each monologue.

Description of the Data
My proposal is to compile statistics released by independent publications and services and compare them with those from more traditional and controlled publications and media.
I would compare Last.fm’s global charts with those released by Billboard magazine over a one-month period to track artists’ weekly rankings in both publications. Because the music industry has changed so dramatically over the past few years, I would also be interested in comparing iTunes Store statistics on artists’ sales during the same period.

How we can obtain the data:

All of this data is readily accessible from the existing chart and statistics pages listed below. A sample Excel file for doing this is here: music.xls.


http://www.last.fm/music/+charts/track/

http://www.billboard.com/bbcom/charts/chart_display.jsp?g=Singles&f=The+Billboard+Hot+100
http://www.apple.com/euro/itunes/charts/top10songs.html
http://www.pitchforkmedia.com/page/best_new_music

Why this data is interesting:
Tracking five artists, to be selected once the data has been compiled and compared, would provide insight into the listening habits of a wide scope of individuals. Last.fm users, iTunes users, and Billboard charts would provide a spectrum of insight into music trends and the way the nation absorbs the enormous amount of information/music available to the public through radio and internet.

I would then take the rankings and statistics of those five artists and determine their sales statistics on iTunes.

What specific questions can the data answer:

Who is listening to what?
Where are they listening?
What does the data reveal about listening trends among technology-savvy users compared with traditional radio spin statistics? What about iTunes downloads? What do independent music reviews have to say about particular artists, and does this have any effect on those artists’ success?

It would then be interesting to investigate the Last.fm user demographics for top listeners of the five artists and determine who is listening to what, and where in the country. To narrow the scope of the project, I will focus on US-based members only. The main user information collected for the project will be gender, age, and location. Once the project evolves into a more focused set of data, it would be interesting to use all of the statistical information available through Last.fm on top listener rankings for those artists.

A little background information about Last.fm is included below… [all info taken from Wikipedia]

Last.fm is a UK-based internet radio and music community website, founded in 2002. It is the world’s largest social music platform with over 20 million active users based in more than 232 countries. On 30 May 2007, CBS Interactive acquired Last.fm for £140m (US$280m), making Last.fm the largest European Web 2.0 purchase to date.
Using a unique music recommendation system known as “Audioscrobbler”, Last.fm builds a detailed profile of each user’s musical taste by recording details of all the songs the user listens to, either on the streamed radio stations or on the user’s computer or portable music device. This information is transferred to Last.fm’s database (”Scrobbled”) via a plugin installed into the user’s music player. The profile data is displayed on a personal web page. The site offers numerous social networking features and can recommend and play artists similar to the user’s favourites.
Users can create custom radio stations and playlists from any of the audio tracks in Last.fm’s music library, but are not able to listen to individual tracks on demand, or to download tracks unless the rightsholder has previously authorised it. Registration is required to acquire a profile but is not necessary to view any part of the site or to listen to radio stations.

Global charts
Last.fm generates weekly “global” charts of the top 400 artists and tracks listened to by all Last.fm users. To prevent the artificial boosting of an artist or song by deliberately repeated tracks from a single listener, these charts are based on the total number of individual listeners (the reach) and not the number of actual plays.
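
The difference between ranking by reach and ranking by plays can be shown with a small Python sketch. This is only an illustration of the idea under assumed data, not Last.fm's actual implementation; the listening log and artist names are hypothetical.

```python
from collections import Counter

# Hypothetical listening log, one (listener, artist) pair per play.
# This is NOT real Last.fm data.
play_log = [
    ("user1", "Artist A"), ("user1", "Artist A"),
    ("user1", "Artist A"), ("user1", "Artist A"),
    ("user2", "Artist A"),
    ("user2", "Artist B"), ("user3", "Artist B"), ("user4", "Artist B"),
]


def chart_by_plays(log):
    """Rank artists by total number of plays."""
    return Counter(artist for _listener, artist in log).most_common()


def chart_by_reach(log):
    """Rank artists by number of distinct listeners (the reach)."""
    unique_pairs = set(log)  # each listener counts once per artist
    return Counter(artist for _listener, artist in unique_pairs).most_common()


top_by_plays = chart_by_plays(play_log)[0]
top_by_reach = chart_by_reach(play_log)[0]
```

In this made-up log one listener repeats Artist A four times, so Artist A leads by plays, while Artist B leads by reach because more distinct people listened.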

The result is notably different from traditional commercial music charts provided by the UK Top 40, Billboard magazine, Soundscan and others, which are based on radio plays or sales. Last.fm charts are less volatile and a new album’s release may be reflected in play data for many months or years after it drops out of commercial charts. For example, The Beatles have consistently been a top 5 band at Last.fm, reflecting the continued popularity of the band’s music irrespective of current album sales. In addition the Last.fm charts are much more rock, indie and alternative influenced and less pop-influenced than regular charts.
The main reason behind the differences is that the charts reflect the musical taste of the particular demographic of the service’s users, not that of the general public. Last.fm users generally have an Internet connection, may be more computer-literate than average, and may have wide collections of music from which to choose, due to the ability to download MP3 files from the internet.
The Global Tag Chart shows the 100 most popular tags that have been used to describe artists, albums, and tracks. This is based on the total number of times the tag has been applied by last.fm users since the tagging system was first introduced and does not necessarily reflect the number of users currently listening to any of the related “global tag radio” stations.
For the week ending October 14th 2007, Radiohead broke the last.fm record for both weekly plays and weekly listeners following the release of In Rainbows. Track 15 Step set records for weekly plays and listeners and the ten tracks from In Rainbows made up the weekly top 10, with the lowest charting In Rainbows song having almost three times the number of listeners of the next highest placed track (Stronger by Kanye West, which had itself set a record for number of listeners a few weeks previously). The Radiohead album held the top 10 spots for the four weeks after its release.

a. Describe the data (clear, precise)
I plan to use the webpage aolstalker, a search engine for looking through the AOL search data released in 2006. The site shows what anonymous AOL users searched for in a given time frame. If I were to conduct this on a larger scale, I could use data from larger search engines such as Google or Yahoo. I will try to show that a large portion of those who search for “Chinese food” are college students.
b. Why is this data interesting?

The data on what types of people search for Chinese food could be used to place ads on the campus webpage, like the ads seen in the school’s newspaper, The Acorn. Currently several Chinese food services advertise with pamphlets on campus, but a detailed report on how many college students search for Chinese food would be extremely helpful for a wider-scale marketing campaign. The information gathered could also help make college dining contracts more successful by supplying more of the food that students love. Beyond dining contracts, the data could be used to help determine the eating habits of college students and examine why particular foods are appealing.

c. How do we obtain this data?
I will obtain the data by going to aolstalker.com; if I were to conduct this search on a larger scale, I could contact Google or Yahoo for search results for Chinese food. At AOL Stalker I can search through the data free of charge; however, I would have to pay to get searches from Google or Yahoo. Once I had the database of searches, I could enter keywords such as “Chinese food” and see who searched for them, as well as the other items those users searched for. The data will be easy to obtain, and by looking through the search results I can tell which college-related sites they visited (e.g. Facebook, universities, etc.).
d. What questions will the data answer?
I want to be able to show that college students make up a large share of the customers of the Chinese food business. I will do this by analyzing the searches of users who searched for Chinese food and seeing which of their other searches relate to colleges and/or universities.
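
The analysis described in c and d could be sketched in Python as follows. This is a rough sketch under stated assumptions: the log rows are hypothetical (not real AOL data), and the keyword list used to guess that a user is college-affiliated is my own assumption and only a crude proxy.

```python
# Hypothetical search log rows: (anonymous_user_id, query). NOT real AOL data.
search_log = [
    (1, "chinese food"), (1, "facebook"), (1, "university bookstore"),
    (2, "chinese food delivery"), (2, "car insurance"),
    (3, "pizza"), (3, "college football"),
]

# Assumed keywords that hint a user may be a college student.
COLLEGE_KEYWORDS = ("facebook", "university", "college")


def users_matching(log, phrase):
    """Return the set of users with at least one query containing the phrase."""
    return {user for user, query in log if phrase in query.lower()}


def college_share(log, phrase, keywords=COLLEGE_KEYWORDS):
    """Share of users who searched the phrase and also made a college-related search."""
    searchers = users_matching(log, phrase)
    if not searchers:
        return 0.0
    college_like = {
        user
        for user, query in log
        if user in searchers and any(word in query.lower() for word in keywords)
    }
    return len(college_like) / len(searchers)


chinese_food_users = users_matching(search_log, "chinese food")
share = college_share(search_log, "chinese food")
```

A high share would support the claim only loosely, since many non-students also search for Facebook or universities; comparing it against the same share for all users would make the result more convincing.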

- Matthew Fingerman (’11)

 

The data I would like to analyze comes from multiple sources. All professional athletes would be added to a database, with the leagues they participate in noted along with their names. A list of all drug- or behavior-related league suspensions would be cross-referenced with the name list. Public arrest records would be cross-referenced with the names of professional athletes as well.
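
The cross-referencing step could look roughly like the Python sketch below. All names and records are hypothetical placeholders, and matching on name alone is a simplifying assumption; real records would need extra identifiers (such as birth date) to avoid confusing an athlete with someone who shares his name.

```python
# Hypothetical placeholder data; no real athletes or records are represented.
athletes = {"Athlete A": "NBA", "Athlete B": "NFL", "Athlete C": "MLB"}
suspensions = [("Athlete A", 2005, "drug"), ("Athlete A", 2006, "behavior")]
arrests = [("Athlete B", 2006, "DUI"), ("John Doe", 2006, "theft")]


def cross_reference(roster, suspension_rows, arrest_rows):
    """Attach suspensions and arrests to each athlete on the roster, matched by name."""
    incidents = {
        name: {"league": league, "suspensions": [], "arrests": []}
        for name, league in roster.items()
    }
    for name, year, reason in suspension_rows:
        if name in incidents:
            incidents[name]["suspensions"].append((year, reason))
    for name, year, charge in arrest_rows:
        if name in incidents:  # people who are not athletes are skipped
            incidents[name]["arrests"].append((year, charge))
    return incidents


def incidents_per_league(incidents):
    """Count suspensions plus arrests for each league."""
    counts = {}
    for record in incidents.values():
        total = len(record["suspensions"]) + len(record["arrests"])
        counts[record["league"]] = counts.get(record["league"], 0) + total
    return counts


by_athlete = cross_reference(athletes, suspensions, arrests)
by_league = incidents_per_league(by_athlete)
```

Grouping the same incident lists by year, team, city, or college instead of league would follow the same pattern.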

            The data is interesting because athletes are role models to many children in America; however, some do things unsuitable for a role model.

            The data would come from professional sports leagues and organizations such as the NBA, NFL, MLB, or UFC, among others. Data would also be collected from public arrest records kept by local, state, and federal governments.

            The data collected would answer the following questions:

·         Have there been more instances of doping or arrests over time?

·         Which professional sports league has the most players committing crimes?

·         Are players in any city more likely to commit a crime?

·         Are players on a particular team more likely to commit a crime?

·         Are players graduating from certain colleges more likely to commit a crime?

·         Which crimes committed by athletes are the most common?

Description of the data: 

eBay is an online company that allows users to sell products to customers by auctioning them, and to buy from other users the same way. With 35 categories, including antiques, books, electronics, jewelry, music, tickets, sports memorabilia, and home & garden, and hundreds of sub-categories, the site attempts to cater to every audience. To buy or sell on the site, you must create a user name. eBay then allows you to buy and sell, storing your 30-day bid history. Most users both buy and sell. Anyone selling an item must list their location with the product information for shipping reasons. I think it would be interesting to see what individual users are buying, as well as what other items they look for and bid on in the 30-day period. It would also be interesting to find out what kinds of people are buying and selling from what areas of the world.

Why is the data interesting?

Buying items, if not selling them, is a major part of a person’s everyday life. We purchase the things we need, like food and clothes, but also the things we like. Consequently, the things we buy tell other people what kind of person we are. eBay is even more interesting because it puts all of the available items in one easy-to-use interface and allows people to buy items from anywhere: work, home, school, etc. It is also possible to find out where a seller lives.

How can we obtain this data?

When you click on an item that is being auctioned, you are brought to a screen where you can see the number of bids and the high bidder. You can then click on the high bidder and view their bidding history. For example, one user’s profile shows:

Positive Feedback: 100%
Item description: OCCULT Paranormal ASTROLOGY
Bids on this item: 5
30-Day Summary
Total bids: 11
Items bid on: 4
Bid activity (%) with this seller: 45%
Bid retractions: 0
Bid retractions (6 months): 1
Category                            No. of Bids   Seller     Last Bid
Books > Antiquarian & Collectible   5             Seller 1   16h
Jewelry & Watches > Fine            3             Seller 2   14h
Jewelry & Watches > Fine            1             Seller 3   4d 2h
Jewelry & Watches > Fine            2             Seller 4   1d 2h

The information about what people are searching and bidding on can be found by searching through the profiles of individual users.

To find out where the people selling certain items live, someone would have to go through the items of interest currently being auctioned and look at each seller’s profile.

What specific questions can the data answer?

Who is buying what and how often?

What else do the people who buy a certain item buy?

Are there any trends with regard to location and the types of items being purchased?

Description of the Data

The purpose of the data set will be to list the cars currently made that have the highest miles-per-gallon (MPG) ratings. The data set will also contain the baseline cost for each model and its advertised MPG vs. actual MPG.

Why the Data is Interesting

The data is interesting because modern-day society is highly dependent upon transportation. With gas prices at over $3 per gallon, saving money on the daily need for transportation is something everyone would be interested in.

How do we Obtain the Data

We can obtain all the baseline prices and advertised MPG information from the company websites and/or dealerships. The only way to find the actual MPG would be to take a brand new model of each car to an indoor race track, fill it with an exactly known amount of gas (3 gallons should be enough to get an accurate reading), and drive it until it stops. Each car would have to be driven in the same fashion with respect to acceleration to testing speed, final cruising speed, and rpm management. For accuracy you could repeat this test one or two times for each car. This would give you a general “highway” MPG. If you wanted a city MPG you could do the same thing, but with repeated scheduled stops to represent traffic and traffic lights.
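
The arithmetic behind the track test is simply miles driven divided by gallons burned, averaged over the repeated runs. A minimal Python sketch, assuming 3 gallons per run as proposed above; the trial distances and the advertised figure are hypothetical placeholders, not measurements.

```python
from statistics import mean


def measured_mpg(miles_driven, gallons_used):
    """Actual MPG for one run: distance covered divided by fuel burned."""
    if gallons_used <= 0:
        raise ValueError("gallons_used must be positive")
    return miles_driven / gallons_used


def average_mpg(trial_miles, gallons_per_trial=3.0):
    """Average MPG over repeated runs that each start with the same amount of gas."""
    return mean(measured_mpg(miles, gallons_per_trial) for miles in trial_miles)


def advertised_gap(advertised_mpg, actual_mpg):
    """Positive result means the advertised rating overstates the measured one."""
    return advertised_mpg - actual_mpg


# Hypothetical distances (in miles) from three runs of one car.
trials = [90.0, 93.0, 87.0]
actual = average_mpg(trials)
gap = advertised_gap(33.0, actual)
```

Running the same calculation on the stop-and-go runs would give the city figure.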

What Specific Questions Will the Data Answer

a)  Which makes and models get the most miles per gallon (both city and highway)?

b) Which companies are being honest about their advertised MPG rating?

c)  Is the cost of buying expensive hybrid technology worth the money you save on gas?
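
Question c can be framed as a payback period: how many years of gas savings it takes to recover the extra purchase price of the hybrid. A rough Python sketch, assuming a constant gas price and constant yearly mileage; every number in the example is a hypothetical placeholder except the $3 per gallon mentioned above.

```python
def annual_fuel_cost(miles_per_year, mpg, price_per_gallon):
    """Yearly gas bill for a car with the given MPG."""
    return miles_per_year / mpg * price_per_gallon


def payback_years(price_premium, miles_per_year, mpg_standard, mpg_hybrid,
                  price_per_gallon):
    """Years of fuel savings needed to recover the hybrid's extra cost."""
    savings = (annual_fuel_cost(miles_per_year, mpg_standard, price_per_gallon)
               - annual_fuel_cost(miles_per_year, mpg_hybrid, price_per_gallon))
    if savings <= 0:
        return float("inf")  # the hybrid never pays for itself
    return price_premium / savings


# Hypothetical example: $3,000 premium, 12,000 miles a year, 30 vs. 40 MPG, $3 gas.
years = payback_years(3000, 12000, 30, 40, 3.0)
```

If the payback period is longer than the owner expects to keep the car, the hybrid is not worth it on gas savings alone.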

Description of data:

· Travelocity’s URL: www.travelocity.co

· It is an online travel agency that operates mainly for US customers, but also offers services in Canada, Germany, France, the UK, and Scandinavian countries. (Customers from around the world are offered services, but the operators and headquarters are in the US. Therefore, a customer experiencing a technical problem needs to call a US phone number or send mail to a US address.)

· It allows customers to book their travel by offering services for vacation packages, flights, hotels, ground transportation, cruises, last-minute packages, and activities. Sub-services are offered for each of these services.

Why is this data interesting?

· Traveling is a part of everyone’s life every once in a while. Travelocity makes travel services available to anyone from anywhere: when searching, the customer can enter the amount of money they are willing to pay or can afford before getting back any results. Some people also do not have ready access to a travel agency, and others want to book last-minute travel. For these cases, Travelocity is the perfect choice.

· It caters to the travel needs of all customers, from all parts of the world, who wish to travel anywhere across the world.

· It shows the hottest travel destinations.

· It offers travel services for all types of people, e.g. newlyweds, gay and lesbian travelers, etc.

· It shows all the hottest (most affordable) up-to-date travel deals.

· For each result, pricing, images, and a rating are offered, along with elaboration on the result and comparisons with other related results. (In the case of a hotel, the location is also provided.) This allows customers to make their perfect choice.

· For clueless customers who just want a getaway, a trip ideas service is offered.

· The stages of the process of booking a trip are shown in real time. This allows the customer to understand what he or she is doing.

How can we obtain the data?

· Data can be obtained from the Travelocity website (provided above).

· Data is arranged in a very straightforward, simple way. Services are organized hierarchically into services and subservices, much like the categories of the Dewey Decimal System. Each service shown has a link; clicking the link starts the booking process for that specific service.

· For new customers who do not know how to get around the site and are “afraid” to explore it, the most-used services are gathered in a box on the front page (it takes up about a quarter of the page, so it is hard to miss). This box lets them begin searching for these services in a simple way.

· On the other side of the page, hot packages, top rated destinations and special offers are provided.

· Here is an example of using the bounded box for services:

➢ I clicked on the hotel option, because I only want to book a hotel. Then I picked Jamaica as the country (because that is where I want to go) and Negril as the destination. I picked November 21st as my check-in date and November 25th as my check-out date. I picked the option of 1 room for two adults. Then I clicked Search Now.

➢ Results are shown in the same tab, on a new page. The results show the hotels available, their pricing, and their rating (number of stars). More detail on each hotel is given as well. I decided to pick Riu Tropical Bay from the list provided and clicked the Select button.

➢ On a new page I get the nightly prices for what I picked before. I am pleased, so I click the Select button again.

➢ I review the information provided on the next page. Everything seems OK. I continue to the next page, where I put in my personal information (as well as my credit card number in order to pay).

➢ When the booking is done, a confirmation will be sent to my email address. Once I print the confirmation, I will have a reservation at Riu in Negril.

What specific questions will/can the data answer?

· Can I fly from Newark Airport to Honolulu, Hawaii, during December, for less than $500?

· Can Hedonism II in Jamaica accommodate a family of two parents and two kids, during the Thanksgiving break?

· What services does each hotel offer and where is it located?

· Can I catch the Continental Airlines flight from JFK to Tel Aviv tomorrow?

· What 7-day cruises around the Caribbean are offered for two people, for under $2000, in the month of February?

· Which countries require a US passport and/or visa? (And many more…)

*Not all results in Travelocity are accurate. For example, when searching for hotels in Negril, Jamaica, results for hotels in Ocho Rios, Jamaica were also given.

Uncle Sam

  • Description of the data

This information source would compare the projected Presidential election winners nine months, six months, three months, and one month before the election with the candidate who actually wins the presidency. The website or data bank would go back through the previous 50 years of presidential elections and determine whether the polls predicted who actually won. The user would be able to compare data from four different, old, and well-respected newspapers: The New York Times, The Washington Post, The New Hampshire Gazette, and the Chicago Tribune.

  • Why the data is interesting

This data is interesting for several reasons. Firstly, it can show the accuracy of polling and predictions leading up to political elections. Secondly, the Presidential candidates and their campaigns can see the trend of polling accuracy over the years and apply it to their own operations if necessary. Lastly, it is interesting for the general American public and anyone interested in United States government to see that every single vote makes a difference and how public opinion can change in a short period of time.

  • How we can obtain the data

Information about polling and predictions from recent elections is more easily accessible through the internet than data from before the creation of the internet. The older information can still be found through dated newspapers, magazines, and polling archives that keep it. Although this may seem like a daunting task, the data would be easy to obtain if the researcher had open access to a newspaper data bank with articles containing this sort of information.

www.chicagotribune.com

www.washingtonpost.com

www.nytimes.com

www.nhgazette.com

  • What specific questions the data will/can answer

1) Which candidate has the best chance of winning the upcoming election based on the results?

2) How accurate are voting polls at each specific time frame before the election?

3) Has the accuracy of voting polls increased or decreased over the past 50 years?

4) What historical or political events caused a drastic change in American public opinion that led a specific candidate to win the presidency?

…and many more. This data is versatile and can show a lot about polling, predictions, elections and public opinion in general.
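
Questions 2 and 3 could be answered by counting, for each lead time, how often the poll leader matched the eventual winner. A minimal Python sketch under stated assumptions: the records below are hypothetical placeholders ("Election 1", candidates "A" to "D"), not real polling data, and each record is assumed to name a single poll leader.

```python
from collections import defaultdict

# Hypothetical placeholder records:
# (election, months_before_election, poll_leader, actual_winner).
poll_records = [
    ("Election 1", 9, "A", "B"), ("Election 1", 6, "A", "B"),
    ("Election 1", 3, "B", "B"), ("Election 1", 1, "B", "B"),
    ("Election 2", 9, "C", "C"), ("Election 2", 6, "D", "C"),
    ("Election 2", 3, "C", "C"), ("Election 2", 1, "C", "C"),
]


def accuracy_by_lead_time(records):
    """Fraction of elections in which the poll leader won, per lead time in months."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for _election, months_before, leader, winner in records:
        total[months_before] += 1
        if leader == winner:
            correct[months_before] += 1
    return {months: correct[months] / total[months] for months in total}


accuracy = accuracy_by_lead_time(poll_records)
```

Splitting the records by decade before calling the function would show whether accuracy has improved over the past 50 years.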

Description of the data:

I would like to look at which age range plays video games the most over the course of a year, what types of games are played most among gamers, and the influence video games have on gamers, using the ratings listed on video games. Finally, I want to use news articles and cases to see whether there is any connection between a person’s actions and the games that person played.

Why is this data interesting:

This information can help the video game industry figure out what types of games to create for each age range, for example how much violent and bloody content to include in a game. The data could also help show how many violent games people actually play, which in turn could help psychologists and other doctors figure out whether there is any connection between these games and people acting more violently toward others, or other violent acts occurring in the world such as shootings on college campuses.

How can we obtain this data:

The data can be found at http://news.digitaltrends.com/news/story/14441/survey_adults_favor_casual_games_for_kids and it states on the site:
“The survey also found that the older kids get, they more likely they were to play video games three times a week or more; however, the figures topped out at almost one third (32 percent) of teens aged 14 to 17 saying they play three times a week or more”

The information shows that it is mostly teenagers who play video games. This site explains the impact of violent video games:
http://media.www.fchornet.com/media/storage/paper921/news/2007/04/18/Opinion/Video.Games.MaturePlayers.Dont-2849962.shtml

On this site, you can find information on the top-rated games that players like to play; judging from the site, all of them are violent and bloody:
http://www.gamespot.com/games.html?type=top_rated&page_type=games&tag=subnav;top_games&om_act=convert&om_clk=subnav

http://www.personalityresearch.org/papers/kooijmans.html - a site dealing with the history of video games and how they affect a person’s aggression

http://www.apa.org/science/psa/sb-anderson.html - myths and facts dealing with violent video games

http://www.foxnews.com/story/0,2933,177090,00.html - news article about violent video games
What questions will this data answer?

1. Which age range plays the most video games?
2. What types of video games do players like to play?
3. Does the violence in video games prompt people to perform evil deeds?