CSC 7700
Data Intensive Distributed Computing
Fall 2006
Instructor:
Prof. Tevfik Kosar
Office: 292 Coates Hall
Phone: 578-9483
Email: kosar@lsu.edu
Office hours: Wed & Thu, 1:30pm-2:30pm
The computational and data requirements of applications from different fields of science, including coastal and environmental modeling, geospatial analysis, bioinformatics, medical imaging, fluid dynamics, petroleum engineering, numerical relativity, and high energy physics, have been increasing exponentially in recent years.
This increase in the demand for computational and data resources has necessitated collaboration and sharing among the nation’s education and research institutions, and the use of distributed resources owned by collaborating parties. While traditional distributed systems work well for computation that requires limited data handling, they fail in unexpected ways when the computation accesses, creates, and moves large amounts of data over wide-area networks.
This course focuses on the challenges that data intensive applications impose on distributed systems, and on the state-of-the-art solutions proposed to overcome these challenges. The topics to be covered in this course include, but are not limited to:
Introduction to Distributed Computing
Data Intensive Science & Applications
Computational and Data Grids
Distributed and Mass Storage Systems
Global/Parallel/Shared File Systems
Remote I/O and Data Staging
Distributed Data Management and Scheduling
Complex Workflow Management
Distributed and Remote Visualization
Course Location and Time:
The course will be held on Tuesdays and Thursdays from 3:10pm to 4:30pm in 225 Tureaud Hall.
Textbook:
There is no required text for this course. We will have a reading list mostly consisting of scientific papers published in this area (see below).
Reading List:
The reading list for this course is available here.
Grading:
This is a research course. You will be expected to read, understand, and discuss scientific papers, as well as develop a research plan on a topic given to you. There will be no exams, and no formal attendance will be taken. However, you are expected to attend the classes and actively contribute to the discussions. Each student will select 2-3 papers from the reading list and present them in class.
The major contribution to your final grade will be from the term project. You will be given a predefined set of projects from which you can choose one. You can also propose your own project as long as it is related to the scope of the course. At the end of the project, you will be expected to write a technical paper and also make an oral presentation on it.
The end-of-semester grades will be composed of:
Active Contribution: 10%
Paper Presentations: 30%
Term Project: 60%
Paper Presentations:
Paper assignments for student presentations can be found here.
Projects:
The predefined example projects are available here. You can choose among these projects, or propose your own project. If you are planning to propose your own project, you need to come and discuss it with me before September 12th.
Project assignments are available here.
Class Mailing List:
There is a mailing list (csc7700@cct.lsu.edu) for important course announcements, including those on projects, presentations, and reports. Please make sure that you provide an active email address to the instructor, and check your email frequently. You can access the archive of the mailing list here.
Course Schedule:
This schedule is tentative and subject to change. Please check the course web site regularly for updates to the schedule, announcements on the projects and paper assignments, and other news about the course.
| Date | Lect. | Topics Covered | Student Presenter | Notes |
|---|---|---|---|---|
| Aug 29 | 1 | Background - [1] [2] | - | |
| Aug 31 | 2 | Background - [3] [4] | - | |
| Sep 5 | 3 | Applications - [5] [6] | - | |
| Sep 7 | 4 | Applications - [7] [8] | - | |
| Sep 12 | 5 | Grid Toolkits - [9] [10] | Farid, Ibrahim | |
| Sep 14 | 6 | Grid Toolkits - [11] [12] | Archit, Maoyuan | |
| Sep 19 | 7 | Distributed Storage - [13] [14] | Mehmet, Dayong | |
| Sep 21 | 8 | Distributed Storage - [15] [16] | Ibrahim, Emrah | |
| Sep 26 | 9 | Grid File Systems - [17] [18] | Archit, Alex | |
| Sep 28 | 10 | Grid File Systems - [19] [20] | Maoyuan, Thair | Project Research Plan Due |
| Oct 3 | 11 | Remote I/O - [21] [22] | Sirish, Emir | |
| Oct 5 | | | | Fall |
| Oct 10 | 12 | High Perf. Data Transfers - [23] [24] | Esma, Thair | |
| Oct 12 | 13 | High Perf. Data Transfers - [25] [26] | Andrei, Wesley | |
| Oct 17 | 14 | Data Staging & Replication - [27] [28] | Sidhanti, Partha | |
| Oct 19 | 15 | Data Staging & Replication - [29] [30] | Mehmet, - | |
| Oct 24 | 16 | Traditional Scheduling - [31] [32] | Emir, Esma | Project Progress report - I Due |
| Oct 26 | 17 | Traditional Scheduling - [33] [34] | Chirag, Alex | |
| Oct 31 | 18 | Data Management - [35] [36] | Sidhanti, Emrah | |
| Nov 2 | 19 | Data Management - [37] [38] | -, Cornelius | |
| Nov 7 | 20 | Visualization - [39] [40] | Farid, Wesley | |
| Nov 9 | 21 | Visualization - [41] [42] + WM [43] | Cornelius, Andrei, Sirish | |
| Nov 14 | | | | No class - SC06 Conference |
| Nov 16 | | | | No class - SC06 Conference |
| Nov 21 | 22 | Workflow Management - [44] [45] [46] | Chirag, Dayong, Partha | Project Progress report - II Due |
| Nov 23 | | | | Thanksgiving |
| Nov 28 | 23 | Project Presentations | | |
| Nov 30 | 24 | Project Presentations | | |
| Dec 5 | 25 | Project Presentations | | |
| Dec 7 | 26 | Project Presentations | | |
| Dec 16 | | | | Final Project Report Due |