CSC 7700

Data Intensive Distributed Computing

Fall 2006

 

 

Instructor:

 

Prof. Tevfik Kosar

Office: 292 Coates Hall

Phone: 578-9483

Email: kosar@lsu.edu

Office hours: Wed & Thu, 1:30pm-2:30pm

 

Course Description:

 

The computational and data requirements of applications from different fields of science, including coastal and environmental modeling, geospatial analysis, bioinformatics, medical imaging, fluid dynamics, petroleum engineering, numerical relativity, and high energy physics, have been increasing exponentially in recent years.

 

This growing demand for computational and data resources has necessitated collaboration and sharing among the nation’s education and research institutions, and the use of distributed resources owned by collaborating parties. While traditional distributed systems work well for computation that requires limited data handling, they fail in unexpected ways when the computation accesses, creates, and moves large amounts of data over wide-area networks.

 

This course focuses on the challenges that data-intensive applications impose on distributed systems, and on the state-of-the-art solutions proposed to overcome them. Topics covered include, but are not limited to:

 

*    Introduction to Distributed Computing

*    Data Intensive Science & Applications

*    Computational and Data Grids

*    Distributed and Mass Storage Systems

*    Global/Parallel/Shared File Systems

*    Remote I/O and Data Staging

*    Distributed Data Management and Scheduling

*    Complex Workflow Management

*    Distributed and Remote Visualization

 

Course Location and Time:

 

The course will be held on Tuesdays and Thursdays from 3:10pm to 4:30pm in 225 Tureaud Hall.

 

Textbook:

 

There is no required text for this course. We will have a reading list mostly consisting of scientific papers published in this area (see below).

 

Reading List:

 

The reading list for this course is available here.

 

Grading:

 

This is a research course. You will be expected to read, understand, and discuss scientific papers, and to develop a research plan on a topic assigned to you. There will be no exams, and attendance will not be formally taken, but you are expected to attend class and contribute actively to the discussions. Each student will select 2-3 papers from the reading list and present them in class.

 

The major contribution to your final grade will be from the term project. You will be given a predefined set of projects from which you can choose one. You can also propose your own project as long as it is related to the scope of the course. At the end of the project, you will be expected to write a technical paper and also make an oral presentation on it.

 

The end-of-semester grades will be composed of:

 

*    Active Contribution: 10%

*    Paper Presentations: 30%

*    Term Project: 60%

 

Paper Presentations:

 

Paper assignments for student presentations can be found here.

 

Projects:

 

The predefined example projects are available here. You can choose among these projects, or propose your own project. If you are planning to propose your own project, you need to come and discuss it with me before September 12th.

 

Project assignments are available here.

 

Class Mailing List:

 

There is a mailing list (csc7700@cct.lsu.edu) for important course announcements, including those on projects, presentations, and reports. Please make sure that you provide an active email address to the instructor, and check your email frequently. You can access the archive of the mailing list here.

 

Course Schedule:

 

This schedule is tentative and subject to change. Please check the course web site regularly for schedule updates, announcements on projects and paper assignments, and other course news.

 

| Date   | Lect. | Topics Covered                         | Student Presenter         | Notes                            |
|--------|-------|----------------------------------------|---------------------------|----------------------------------|
| Aug 29 | 1     | Background - [1] [2]                   | -                         |                                  |
| Aug 31 | 2     | Background - [3] [4]                   | -                         |                                  |
| Sep 5  | 3     | Applications - [5] [6]                 | -                         |                                  |
| Sep 7  | 4     | Applications - [7] [8]                 | -                         |                                  |
| Sep 12 | 5     | Grid Toolkits - [9] [10]               | Farid, Ibrahim            |                                  |
| Sep 14 | 6     | Grid Toolkits - [11] [12]              | Archit, Maoyuan           |                                  |
| Sep 19 | 7     | Distributed Storage - [13] [14]        | Mehmet, Dayong            |                                  |
| Sep 21 | 8     | Distributed Storage - [15] [16]        | Ibrahim, Emrah            |                                  |
| Sep 26 | 9     | Grid File Systems - [17] [18]          | Archit, Alex              |                                  |
| Sep 28 | 10    | Grid File Systems - [19] [20]          | Maoyuan, Thair            | Project Research Plan Due        |
| Oct 3  | 11    | Remote I/O - [21] [22]                 | Sirish, Emir              |                                  |
| Oct 5  |       |                                        |                           | Fall Holiday                     |
| Oct 10 | 12    | High Perf. Data Transfers - [23] [24]  | Esma, Thair               |                                  |
| Oct 12 | 13    | High Perf. Data Transfers - [25] [26]  | Andrei, Wesley            |                                  |
| Oct 17 | 14    | Data Staging & Replication - [27] [28] | Sidhanti, Partha          |                                  |
| Oct 19 | 15    | Data Staging & Replication - [29] [30] | Mehmet, -                 |                                  |
| Oct 24 | 16    | Traditional Scheduling - [31] [32]     | Emir, Esma                | Project Progress report - I Due  |
| Oct 26 | 17    | Traditional Scheduling - [33] [34]     | Chirag, Alex              |                                  |
| Oct 31 | 18    | Data Management - [35] [36]            | Sidhanti, Emrah           |                                  |
| Nov 2  | 19    | Data Management - [37] [38]            | -, Cornelius              |                                  |
| Nov 7  | 20    | Visualization - [39] [40]              | Farid, Wesley             |                                  |
| Nov 9  | 21    | Visualization - [41] [42] + WM [43]    | Cornelius, Andrei, Sirish |                                  |
| Nov 14 |       |                                        |                           | No class - SC06 Conference       |
| Nov 16 |       |                                        |                           | No class - SC06 Conference       |
| Nov 21 | 22    | Workflow Management - [44] [45] [46]   | Chirag, Dayong, Partha    | Project Progress report - II Due |
| Nov 23 |       |                                        |                           | Thanksgiving                     |
| Nov 28 | 23    | Project Presentations                  |                           |                                  |
| Nov 30 | 24    | Project Presentations                  |                           |                                  |
| Dec 5  | 25    | Project Presentations                  |                           |                                  |
| Dec 7  | 26    | Project Presentations                  |                           |                                  |
| Dec 16 |       |                                        |                           | Final Project Report Due         |