Archive for the Data project Category

I really liked the video on software piracy, especially since one of the biggest issues today is music piracy, which has affected many singers. In the video, software producers explain that software is expensive because of the cost of creating and marketing it. It takes up to 1500 to 200 hours to create software. Various reporters and lawyers said that software piracy is no different from car theft. A software developer described a program called Locksmith, which can create identical copies of software, including the serial number and all the protection that the original contains. The vice president of Activision stated that buyers can simply call a hotline and describe their issue, and the manufacturer will help them. School systems use Locksmith the most, because they create an archive and give backup copies to their students to use. Franckie mouse, a software pirate, was interviewed and explained that pirates do this for the recognition and to outsmart software developers; however, pirates do not receive any profit from their illegal work. He also showed how he illegally copies programs so that the software can be used by anyone in the nation. A lawyer stated that the penalty for piracy can be up to five years in jail plus fines. At the end, an editor for a software company said that the best way to reduce piracy is to create cheaper software that is more affordable to consumers. The video was very interesting, and I encourage more people to watch it because it deals with a new form of theft in this day and age.

Flights.com

Description of the data:

Website used: http://www.flights.com

Flights.com is a website that allows users to book flights, hotels, cars, cruises, hostels, European rail travel, and trip insurance across the world. Even though it serves many purposes for customers, I would like to use this website only to acquire flight statistics. I would like to compare the flight data released over a period of one month (from November 12th to December 12th). Different departure and destination airports will be chosen randomly to acquire the data. Data such as how many airlines offer service from the departure airport and at the arrival airport will be acquired. If an airline offers more than one flight per day, this will be recorded as well. Then, prices will be compared for each airline’s service.

Why is this data interesting?

Travelling has become a part of everyone’s life within the past few years, due to globalization and industrialization. Many times, people book flights with airlines that do not offer the best service, yet they end up paying far more than they should. It is hoped that analyzing this data will help customers avoid paying extra money.

How can we obtain the information?

Information can be obtained through the given website (http://www.flights.com). On the webpage, there is a tab that says “Flights.” Clicking it scrolls down the page so that the user reaches the search box for flights. This search box allows the user to modify the search for flights. To modify the search, the user enters the requested data in the appropriate fields.

What questions can this data answer?

· Which airline offers better service and is still cheaper than others?

· What makes one airline better than another?

After acquiring the information online, this is how the data will be analyzed:

Date | Departure | Arrival | Airline | Price | Travelling time | # of flights/day
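As a rough sketch of how rows in this layout might be summarized, the Python below computes the average and lowest price per airline on one route. The records, airport codes, airline names, and prices are invented placeholders, not data from Flights.com.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records in the column layout above:
# (date, departure, arrival, airline, price in USD, travelling time in hours, flights per day)
records = [
    ("Nov 12", "EWR", "ORD", "Airline A", 220.0, 2.5, 3),
    ("Nov 12", "EWR", "ORD", "Airline B", 180.0, 2.7, 1),
    ("Nov 13", "EWR", "ORD", "Airline A", 240.0, 2.5, 3),
    ("Nov 13", "EWR", "ORD", "Airline B", 200.0, 2.7, 1),
]


def price_summary(rows):
    """Return {airline: (average price, lowest price)} for the given rows."""
    prices = defaultdict(list)
    for _date, _dep, _arr, airline, price, _hours, _per_day in rows:
        prices[airline].append(price)
    return {airline: (mean(p), min(p)) for airline, p in prices.items()}


summary = price_summary(records)
# Pick the airline with the lowest average price on this route.
cheapest = min(summary, key=lambda airline: summary[airline][0])
print(summary)
print("Cheapest on average:", cheapest)
```

Price alone does not answer which airline offers better service; travelling time and flights per day from the same rows could be summarized the same way.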

Description of the Data
My proposal is to compile statistics released by independent publications and services and compare them with those released by more traditional and controlled publications and media.
I would compare Last.fm’s global charts with those released by Billboard magazine over a one-month period to track artists’ weekly rankings in both publications. Because the music industry has changed so dramatically over the past few years, I would also be interested in comparing the iTunes Store statistics for artists’ sales during the same period.
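One possible way to put a number on how far two charts disagree is to look at rank differences per artist and a Spearman rank correlation. The sketch below assumes both charts rank the same artists with no ties; the artist names and positions are invented, not real chart data.

```python
# Hypothetical weekly rankings (artist -> chart position); invented for illustration.
lastfm_rank = {"Artist A": 1, "Artist B": 2, "Artist C": 3, "Artist D": 4, "Artist E": 5}
billboard_rank = {"Artist A": 3, "Artist B": 1, "Artist C": 2, "Artist D": 5, "Artist E": 4}


def rank_differences(chart_a, chart_b):
    """Position in chart_a minus position in chart_b, for artists on both charts."""
    shared = chart_a.keys() & chart_b.keys()
    return {artist: chart_a[artist] - chart_b[artist] for artist in shared}


def spearman_rho(chart_a, chart_b):
    """Spearman rank correlation; assumes both charts rank the same artists 1..n, no ties."""
    diffs = rank_differences(chart_a, chart_b)
    n = len(diffs)
    return 1 - 6 * sum(d * d for d in diffs.values()) / (n * (n * n - 1))


diffs = rank_differences(lastfm_rank, billboard_rank)
rho = spearman_rho(lastfm_rank, billboard_rank)
print(diffs)
print(round(rho, 3))
```

A value near 1 would mean the two charts order the artists almost identically; a value near 0 would mean the orderings are unrelated.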

How we can obtain the data:

All of this data is readily accessible from existing published statistics. A sample Excel file for doing this is here: music.xls.


http://www.last.fm/music/+charts/track/

http://www.billboard.com/bbcom/charts/chart_display.jsp?g=Singles&f=The+Billboard+Hot+100
http://www.apple.com/euro/itunes/charts/top10songs.html
http://www.pitchforkmedia.com/page/best_new_music

Why this data is interesting:
Tracking five artists, to be selected once the data has been compiled and compared, would provide insight into the listening habits of a wide scope of individuals. Last.fm users, iTunes users, and Billboard charts would provide a spectrum of insight into music trends and the way the nation absorbs the enormous amount of information/music available to the public through radio and internet.

I would then take the rankings and statistics of the said 5 artists and determine their sales stats on iTunes.

What specific questions can the data answer:

Who is listening to what?
Where are they listening?
What does the data reveal about listener trends among technology-savvy users compared to traditional radio spin statistics? What about iTunes downloads? What do independent music reviews have to say about particular artists, and does this have any effect on the success of those artists?

It would be interesting to then investigate the Last.fm user demographics for top listeners of the 5 particular artists and determine who is listening to what, and where in the country. To focus the scope of the project, I will look at US-based membership only. The main user info that will be collected for the project is gender, age, and location of users. Once the project evolves into a more focused set of data, it would be interesting to use all of the different statistical information available through Last.fm regarding top listener rankings for the said artists.

A little background information about Last.fm is included below… [all info taken from wikipedia]

Last.fm is a UK-based internet radio and music community website, founded in 2002. It is the world’s largest social music platform with over 20 million active users based in more than 232 countries. On 30 May 2007, CBS Interactive acquired Last.fm for £140m (US$280m), making Last.fm the largest European Web 2.0 purchase to date.
Using a unique music recommendation system known as “Audioscrobbler”, Last.fm builds a detailed profile of each user’s musical taste by recording details of all the songs the user listens to, either on the streamed radio stations or on the user’s computer or portable music device. This information is transferred to Last.fm’s database (”Scrobbled”) via a plugin installed into the user’s music player. The profile data is displayed on a personal web page. The site offers numerous social networking features and can recommend and play artists similar to the user’s favourites.
Users can create custom radio stations and playlists from any of the audio tracks in Last.fm’s music library, but are not able to listen to individual tracks on demand, or to download tracks unless the rightsholder has previously authorised it. Registration is required to acquire a profile but is not necessary to view any part of the site or to listen to radio stations.

Global charts
Last.fm generates weekly “global” charts of the top 400 artists and tracks listened to by all Last.fm users. To prevent the artificial boosting of an artist or song by deliberately repeated tracks from a single listener, these charts are based on the total number of individual listeners (the reach) and not the number of actual plays.
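To illustrate the difference between ranking by plays and ranking by reach, here is a minimal sketch. The play log is invented, and the functions are only an illustration of the idea described above, not Last.fm’s actual implementation.

```python
from collections import Counter

# Hypothetical play log: one (listener, track) pair per play.
plays = [
    ("user1", "Track X"), ("user1", "Track X"), ("user1", "Track X"),
    ("user1", "Track X"), ("user2", "Track Y"), ("user3", "Track Y"),
    ("user4", "Track Y"),
]


def chart_by_plays(log):
    """Count every play, so one listener repeating a track inflates its total."""
    return Counter(track for _listener, track in log)


def chart_by_reach(log):
    """Count each (listener, track) pair once, so repeat plays do not add up."""
    return Counter(track for _listener, track in set(log))


by_plays = chart_by_plays(plays)
by_reach = chart_by_reach(plays)
print(by_plays.most_common())
print(by_reach.most_common())
```

In this made-up log, one listener repeating Track X puts it first by plays, while Track Y comes first by reach because three different people listened to it.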

The result is notably different from traditional commercial music charts provided by the UK Top 40, Billboard magazine, Soundscan and others, which are based on radio plays or sales. Last.fm charts are less volatile and a new album’s release may be reflected in play data for many months or years after it drops out of commercial charts. For example, The Beatles have consistently been a top 5 band at Last.fm, reflecting the continued popularity of the band’s music irrespective of current album sales. In addition the Last.fm charts are much more rock, indie and alternative influenced and less pop-influenced than regular charts.
The main reason behind the differences is that the charts reflect the musical taste of the particular demographic of the service’s users, not that of the general public. Last.fm users generally have an Internet connection, may be more computer-literate than average, and may have wide collections of music from which to choose, due to the ability to download MP3 files from the internet.
The Global Tag Chart shows the 100 most popular tags that have been used to describe artists, albums, and tracks. This is based on the total number of times the tag has been applied by last.fm users since the tagging system was first introduced and does not necessarily reflect the number of users currently listening to any of the related “global tag radio” stations.
For the week ending October 14th 2007, Radiohead broke the last.fm record for both weekly plays and weekly listeners following the release of In Rainbows. Track 15 Step set records for weekly plays and listeners and the ten tracks from In Rainbows made up the weekly top 10, with the lowest charting In Rainbows song having almost three times the number of listeners of the next highest placed track (Stronger by Kanye West, which had itself set a record for number of listeners a few weeks previously). The Radiohead album held the top 10 spots for the four weeks after its release.

a. Describe the data (clear, precise)
I plan on using the webpage aolstalker, which comprises a search engine for looking through the AOL data released in 2006. The site shows what anonymous AOL users searched for in a given time frame. If I were to conduct this on a larger scale, I could use larger search engines such as Google or Yahoo. I will be trying to prove that a large portion of those who search for “Chinese Food” are college students.
b. Why is this data interesting?

Data on what types of people search for Chinese food could be used to place ads on the campus’s webpage, like the ads seen in the school’s newspaper, The Acorn. Currently, several Chinese food services advertise with pamphlets on campus, but a detailed report on how many college students search for Chinese food would be extremely helpful for a wider-scale marketing campaign. The information gathered could also help make college dining contracts more successful by supplying more of the food that students love. Beyond dining contracts, the data could be used to help determine the eating habits of college students and examine why this particular food is appealing.

c. How do we obtain this data?
I will obtain the data by going to aolstalker.com; if I were to conduct this search on a larger scale, I could contact Google or Yahoo for search results for Chinese food. At AOL stalker I can search through the data free of charge; however, I would have to pay to get searches from Google or Yahoo. Once I had the database of search terms, I could enter keywords such as “Chinese food” and see who searched for them, as well as the other items those users searched for. The data will be easy to obtain, and by looking through the search results I can tell which college-related sites they went to (e.g. Facebook, universities, etc.).
d. What questions will the data answer?
I want to be able to prove that college students make up a large share of the Chinese food business’s customers. I will do this by analyzing the search data for Chinese food and seeing what other searches the same users made that relate to colleges and/or universities.
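The analysis described in c. and d. could be sketched roughly as follows. The search log, the user IDs, and the list of college-related terms are all assumptions made up for illustration; a real study would need a better test for who counts as a college student.

```python
# Hypothetical search log of (anonymous user ID, query) pairs.
search_log = [
    (101, "chinese food"), (101, "facebook"), (101, "university bookstore"),
    (102, "chinese food delivery"), (102, "car insurance"),
    (103, "chinese food near campus"), (103, "college financial aid"),
    (104, "weather"), (104, "facebook"),
]

# Assumed markers of college-related interest; a guess, not a validated classifier.
COLLEGE_TERMS = ("facebook", "university", "college", "campus")


def college_share(log, topic="chinese food"):
    """Fraction of users who searched `topic` and also made a college-related search."""
    queries_by_user = {}
    for user, query in log:
        queries_by_user.setdefault(user, []).append(query.lower())
    topic_users = [u for u, qs in queries_by_user.items() if any(topic in q for q in qs)]
    if not topic_users:
        return 0.0
    college_users = [
        u for u in topic_users
        if any(term in q for q in queries_by_user[u] for term in COLLEGE_TERMS)
    ]
    return len(college_users) / len(topic_users)


share = college_share(search_log)
print(f"{share:.0%} of users searching the topic also made a college-related search")
```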

- Matthew Fingerman (’11)

Description of data:

· Travelocity’s URL: www.travelocity.co

· It is an online travel agency that operates mainly for US customers, but also offers services in Canada, Germany, France, the UK, and the Scandinavian countries. (Customers from around the world are offered services, but the operators and headquarters are in the US. Therefore, a customer experiencing a technical problem needs to call a US phone number or send mail to a US address.)

· It allows customers to book their travel by offering services for vacation packages, flights, hotels, ground transportation, cruises, last-minute packages, and activities. Sub-services are offered for each of these services.

Why is this data interesting?

· Traveling is a part of everyone’s life every once in a while. Travelocity offers travel services to anyone, anywhere: when searching, customers can enter the amount of money they are willing or able to pay before getting back any results. Some people also do not have ready access to a travel agency, and others want to book last-minute travel. For these cases, Travelocity is the perfect choice.

· It caters to the travel needs of all customers, from all parts of the world, who wish to travel anywhere across the world.

· It shows the hottest travel destinations.

· It offers travel services to all types of people, e.g. newlyweds, gay and lesbian travelers, etc.

· It shows all the up-to-date hottest (most affordable) travel deals.

· For each result, pricing, images, and a rating are offered, along with further detail on the result and comparisons with related results. (In the case of hotels, the location is also provided.) This allows customers to make their perfect choice.

· For the clueless customers who just want a getaway, trip ideas services are offered.

· The stages of the process of booking a trip are shown in real time. This allows the customer to understand what he or she is doing.

How can we obtain the data?

· Data can be obtained from the Travelocity website (provided above).

· Data is arranged in a very straightforward, simple way. Services are ordered hierarchically, much like the Dewey Decimal System: services and subservices. Each service shown has a link. Clicking on the link begins the booking process for that specific service.

· For new customers who do not know how to get around the site and are “afraid” to explore it, the most used services are grouped in a box on the front page (it takes up about a quarter of the page, so it is hard to miss). This box offers a simple way to begin searching for these services.

· On the other side of the page, hot packages, top rated destinations and special offers are provided.

· Here is an example when using the bounded box for services:

➢ I clicked on the hotel option, because I only want to book a hotel. Then I picked Jamaica as the country (because that is where I want to go) and Negril as the destination. I set my check-in for November 21st and my check-out for November 25th. I picked the option of 1 room for two adults. Then I clicked Search Now.

➢ Results are shown in the same tab, on a new page. They list the available hotels, their pricing, and their rating (number of stars). More detail on each hotel is given as well. I decided to pick Riu Tropical Bay from the list provided and clicked the Select button.

➢ On a new page I get the nightly prices for what I picked. I am pleased, so I click the Select button again.

➢ I review the information provided on the next page. Everything seems OK. I continue to the next page, where I enter my personal information (as well as my credit card number in order to pay).

➢ When the booking is done, a confirmation will be sent to my email address. The confirmation will have to be printed, and then I will have a reservation at the Riu in Negril.

What specific questions the data will/can answer?

· Can I fly from Newark Airport to Honolulu, Hawaii, during December, for less than $500?

· Can Hedonism II in Jamaica accommodate a family of two parents and two kids, during the Thanksgiving break?

· What services does each hotel offer and where is it located?

· Can I catch the Continental Airlines flight from JFK to Tel Aviv tomorrow?

· What 7-day cruises around the Caribbean are offered for two people for under $2000 in the month of February?

· Which countries require a US passport and/or visa? (And many more…)

*Not all results in Travelocity are accurate; e.g., when searching for hotels in Negril, Jamaica, results for hotels in Ocho Rios, Jamaica were also given.

Uncle Sam

  • Description of the data

This information source would compare the projected Presidential election winners nine months, six months, three months, and one month before the election with the candidate who actually wins the presidency. The website or data-bank would go back through the previous 50 years of presidential elections and determine whether the polls predicted who actually won. The user would be able to compare data from four different, old, and well-respected newspapers: The New York Times, The Washington Post, The New Hampshire Gazette, and the Chicago Tribune.

  • Why the data is interesting

This data is interesting for several different reasons. Firstly, it can show the accuracy of polling and predictions leading up to political elections. Secondly, the actual Presidential candidates and their campaigns can see the trend of polling accuracy over the ages and apply it to their own operations if necessary. Lastly, it is interesting for the general American public and anyone interested in United States government to see that every single vote makes a difference and how public opinion can change in a short period of time.

  • How we can obtain the data

Information about polling and predictions from recent elections is more easily accessible through the internet than data dating back to before the creation of the internet. That older information can still be found through dated newspapers, magazines, and cataloged polling archives that keep it. Although this may seem like a daunting task, it would be easy if the researcher had open access to a newspaper data-bank with articles about this sort of information.

www.chicagotribune.com

www.washingtonpost.com

www.nytimes.com

www.nhgazette.com

  • What specific questions the data will/can answer

1) Which candidate has the best chance of winning the upcoming election based on the results?

2) How accurate are voting polls based on their specific time frame?

3) Has the accuracy of voting polls increased or decreased over the past 50 years?

4) What historical/political events caused a drastic change in American public opinion that led a specific candidate to win the presidency?

…and many more. This data is versatile and can show a lot about polling, predictions, elections and public opinion in general.
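Questions 2) and 3) come down to comparing each poll’s projected winner with the actual winner at each time frame. A minimal sketch of that calculation is below; the elections, candidates, and projections are placeholders, not real polling data.

```python
# Hypothetical records: projected leader at each horizon (months before the
# election) and the actual winner. Candidate names are placeholders.
polls = {
    "Election 1": {"predicted": {9: "A", 6: "A", 3: "B", 1: "B"}, "winner": "B"},
    "Election 2": {"predicted": {9: "C", 6: "D", 3: "D", 1: "D"}, "winner": "D"},
    "Election 3": {"predicted": {9: "E", 6: "E", 3: "E", 1: "F"}, "winner": "E"},
}


def accuracy_by_horizon(records, horizons=(9, 6, 3, 1)):
    """Share of elections in which the projected leader at each horizon went on to win."""
    return {
        h: sum(r["predicted"][h] == r["winner"] for r in records.values()) / len(records)
        for h in horizons
    }


accuracy = accuracy_by_horizon(polls)
for months, share in accuracy.items():
    print(f"{months} months out: {share:.0%} of projections matched the winner")
```

Running the same function on subsets of elections grouped by decade would be one way to approach question 3).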

Description of the data:

I would like to look at information on which age range plays video games the most during the course of a year, what types of games are played most among gamers, and the influence video games have on gamers, using the ratings listed on video games. Finally, I want to use news articles and cases to see whether there is any connection between a person’s actions and the games that person played.

Why is this data interesting:

This information can help the video game industry figure out what types of games to create for each age range, for example how much violent and bloody content to put in a game. Also, the data could help show how many violent games people actually play, which in turn could help psychologists and other doctors figure out whether there is any connection to people acting more violently toward others and to other violent acts occurring in the world, such as shootings on college campuses.

How can we obtain this data:

The data can be found at http://news.digitaltrends.com/news/story/14441/survey_adults_favor_casual_games_for_kids and it states on the site:
“The survey also found that the older kids get, they more likely they were to play video games three times a week or more; however, the figures topped out at almost one third (32 percent) of teens aged 14 to 17 saying they play three times a week or more”

The information shows that it is mostly teenagers who play video games. This site explains the impact of violent video games:
http://media.www.fchornet.com/media/storage/paper921/news/2007/04/18/Opinion/Video.Games.MaturePlayers.Dont-2849962.shtml

On this site, you can find information on the top games that players like to play; judging from the site, all of them are violent and bloody:
http://www.gamespot.com/games.html?type=top_rated&page_type=games&tag=subnav;top_games&om_act=convert&om_clk=subnav

http://www.personalityresearch.org/papers/kooijmans.html - a site dealing with the history of video games and how they affect a person’s aggression

http://www.apa.org/science/psa/sb-anderson.html - myths and facts dealing with violent video games

http://www.foxnews.com/story/0,2933,177090,00.html - news article about violent video games
What questions will this data answer?

1. Which age range plays the most video games?
2. What types of video games do players like to play?
3. Does the violence in video games prompt people to perform evil deeds?

1) Description of the Data
a. My Data Project would use different aspects of Baseball-Reference.com’s extensive history of baseball statistics. Baseball-Reference.com has compiled data for all teams, players, managers, and box scores from the last 50 years of baseball.

2) Why is this data interesting
a. I find this data extremely interesting because, hands down, baseball is the sport that involves the most math. You can calculate the outcome of an event based on what has happened previously. Since every single stat is a ratio between certain events that take place in a game, I feel that if you compile the data and enter it into some sort of equation, you could predict, to a certain percentage, the probability of anything happening in a game, a streak, or a career.

3) How can we obtain this information
a. By browsing the site and collecting the needed data. Baseball-Reference.com has sub-categories of stats for each player. By browsing the columns of data, you can find the information that would help answer your question.

4) What questions will this data answer?
a. Are any players today on paths to set records?

b. What past player does any current player resemble statistically?

c. Comparison of teams that would never play each other due to the year they played

d. What to do in any given scenario based on a player’s career?
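As a small illustration of the point in 2) that stats are ratios that can feed a probability estimate, the sketch below computes a batting average (hits divided by at-bats) and a rough chance of a hitting streak. The season line is invented, and treating every at-bat as independent with the same chance of a hit is a simplifying assumption, not how real games behave.

```python
def batting_average(hits, at_bats):
    """Batting average: hits divided by at-bats."""
    return hits / at_bats


def prob_hit_in_game(avg, at_bats_per_game=4):
    """Chance of at least one hit in a game, assuming independent at-bats."""
    return 1 - (1 - avg) ** at_bats_per_game


def prob_streak(avg, games, at_bats_per_game=4):
    """Chance of a hit in each of `games` consecutive games, same assumption."""
    return prob_hit_in_game(avg, at_bats_per_game) ** games


avg = batting_average(150, 500)  # hypothetical season line, not a real player
print(round(avg, 3))
print(round(prob_hit_in_game(avg), 4))
print(prob_streak(avg, 10))
```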

Description of the Data:

A comprehensive analysis of what types of products are being shipped to locations all over the United States. The data could be grouped by any kind of location: State, county, or zip code to just name a few.
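If such shipment records were available, the grouping could be sketched roughly like this. The records, zip codes, and category names are made up for illustration only.

```python
from collections import Counter

# Hypothetical shipment records: (state, zip code, product category).
shipments = [
    ("OH", "43004", "home theatre"),
    ("OH", "43004", "home theatre"),
    ("OH", "43004", "books"),
    ("NJ", "07940", "books"),
    ("NJ", "07940", "books"),
    ("NJ", "07940", "home theatre"),
]


def top_category_by(records, level):
    """Most-shipped category per location; `level` is 'state' or 'zip'."""
    index = {"state": 0, "zip": 1}[level]
    counts = {}
    for record in records:
        counts.setdefault(record[index], Counter())[record[2]] += 1
    return {place: c.most_common(1)[0][0] for place, c in counts.items()}


by_zip = top_category_by(shipments, "zip")
by_state = top_category_by(shipments, "state")
print(by_zip)
print(by_state)
```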

Why this data is interesting: 

Seeing what types of products are going where in the country would be a great guide for businesses. For instance, if there was a trend of home theatre electronics being delivered to a certain zip code in Ohio, it might be divined that someone could make a killing by opening up a home theatre store there. This would benefit both the consumers, who would now have a nearby location to purchase such things, and the business, which would reap the rewards.

 How we can obtain the data: 

Realistically, we can’t. However if some of the larger sites were willing to sell information based around the type of product sold and the zip code that it was shipped to, it would be possible to analyze the data in many meaningful ways.

Some might find it a breach of their privacy to have large corporations selling their private data of what they were buying. I’m certain these people will soon be quieted once they are witness to the fantastic savings and convenience that will come of it.

I wouldn’t think you’d have to worry about getting many companies to sign on. If they agreed, sites like buy.com and amazon.com would cover a good portion of the online shopping world. Meanwhile, digital versions of real stores (target.com, bestbuy.com, etc…) would probably be happy to lend their data if it could be compiled with data from other sites, which would then help them plan where to put their next store.

What specific questions can the data answer:

  • What types of things are popular where?
  • What part of the country buys the most of what?
  • Where would the best places be for new specialty stores?

Description of the Data

Founded in February 2005, YouTube is the leader in online video, and the premier destination to watch and share original videos worldwide through a Web experience. YouTube allows people to easily upload and share video clips on www.YouTube.com and across the Internet through websites, mobile devices, blogs, and email.

Why the data is interesting?
Everyone can watch videos on YouTube. People can see first-hand accounts of current events, find videos about their hobbies and interests, and discover the quirky and unusual. As more people capture special moments on video, YouTube is empowering them to become the broadcasters of tomorrow.

YouTube has struck numerous partnership deals with content providers such as CBS, BBC, Universal Music Group, Sony Music Group, Warner Music Group, NBA, The Sundance Channel and many more.

How we can obtain the data?

The data can be obtained by typing what you want to find into the search box located at the top of the page, or by clicking on the tabs, also at the top of the page. You can search from the “most played video” to the “news & politics” category to the channels of popular music artists, or even of everyday people who want their shot at stardom through the World Wide Web. You can also look at the most recent and most popular videos on the site.

We can also take a look at a user’s popularity on YouTube with vidstats.com, and at http://www.micropersuasion.com/2006/08/youtube_by_the_.html and http://www.churchcommunicationspro.com/2007/03/20/some-enlightening-internet-video-statistics-youtube/ for statistics about the site.
What specific questions the data will/can answer?

  • How many registered users are there?
  • How many people are on this site right now?
  • Which part of the world makes up the most YouTube users?
  • Can I watch some of my favorite TV shows? Music videos from my favorite artists? Movies?
  • What’s the most subscribed channel?
  • What’s the most watched video of all time?
  • What are people watching right now?
  • Is there a track history of the videos I’ve watched from YouTube?
  • Can I make my own personal channel and upload videos of myself and my everyday so-called teen angst life? If so, are there going to be a lot of people watching my senseless acts? And can I get famous for doing so?

-Georgia Cruz