Extract, Transform, and Load

A friend of mine in the program commented that about 90% of the time spent doing data science goes to obtaining and cleaning data.

This is where programming is incredibly useful.  In the second year of my Master's program, my programming skills are not yet at the level I want them to be.

I recently started some work for my research assistantship concerning Twitter data for @DataONEorg.

I’m interested in the content of posts, and the relationships between the actors in the network.

In terms of content, I’d like to look at the hashtags and links.
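As a rough illustration of the kind of task I have in mind, here is a minimal Python sketch of how hashtags and links might be pulled out of tweet text with regular expressions. The sample tweets, the function names, and the example.org URLs are all made up for illustration; real tweets would come from an export file or the Twitter API, and the patterns are deliberately simplified.

```python
import re
from collections import Counter

# Hypothetical sample tweets, invented for this sketch.
tweets = [
    "New #opendata tutorial from @DataONEorg https://example.org/tutorial #DataONE",
    "Slides on #datamanagement best practices: http://example.org/slides",
    "RT @DataONEorg: webinar on #OpenData next week",
]

# Simplified patterns: a hashtag is '#' followed by word characters,
# and a link is anything starting with http:// or https:// up to whitespace.
HASHTAG_RE = re.compile(r"#(\w+)")
URL_RE = re.compile(r"https?://\S+")


def extract_hashtags(text):
    """Return the hashtags in a tweet, lowercased, without the leading '#'."""
    # Blank out URLs first so a fragment like page#top is not counted as a hashtag.
    without_urls = URL_RE.sub(" ", text)
    return [tag.lower() for tag in HASHTAG_RE.findall(without_urls)]


def extract_links(text):
    """Return the http(s) URLs in a tweet, trimming trailing punctuation."""
    return [url.rstrip(".,;:!?)") for url in URL_RE.findall(text)]


# Tally hashtags across all tweets and collect every link in order.
hashtag_counts = Counter(tag for t in tweets for tag in extract_hashtags(t))
links = [url for t in tweets for url in extract_links(t)]

print(hashtag_counts.most_common())
print(links)
```

Lowercasing the hashtags means #opendata and #OpenData are counted together, which seems like the right default for a content summary, though it is a choice rather than a rule.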

To illustrate how difficult it is to accomplish tasks “by hand,” I recently tried to get the Twitter data from a free site.  My efforts are documented here: <>.

I’ve read that employers should not hire a “data scientist” if the so-called “scientist” does not have programming skills.  For this reason, I’m disappointed that the School of Information Science does not offer a programming course within the School itself.  (I’ve heard Dr. Potnis will offer a course in Fall 2014, a semester after my graduation).

I enrolled in a programming course in the College of Engineering and Computer Science – Introduction to Programming for Scientists and Engineers.  The course focuses on the C++ language.  This is unfortunate, as Python is increasingly favored over C++.  That popularity means more ready-made Python programs are available and its user community is growing. Content management systems are even being built around Python.

A friend of mine who does genome science uses Python.  C++ is useful for taking advantage of parallelism, but the fact that my friend who works on supercomputers uses Python suggests to me that Python works as well.

Programming language popularity.

Further reading:

Identifying Library & Information Science Graduate Student Competencies

Hi Dr. Mehra,

I read the following with interest in the latest issue of Interface:

His article co-authored with Dr. Vandana Singh entitled “Strengths and weaknesses of the information technology curriculum in library and information science graduate programs” has been published in the Journal of Librarianship and Information Science, 45(3).

The idea of matching course content with employer expectations has interested me since starting at SIS.

Personally, I have taken a variety of courses outside SIS / CCI to develop the skill set I need for my own career goal as a professional data manager working with environmental data.

In reading your paper, I was particularly interested in your methods concerning the WebJunction competencies available online at <>.

For future research, I would like to suggest collecting job descriptions from job listings to synthesize employer expectations, as opposed to lists of competencies put together by any single organization. I’ve noticed job listings often have "required" and "desirable" qualifications.

My guess is this approach would yield an interdisciplinary, "state of the art" view expanding on WebJunction’s competencies focused on libraries. This approach might also impact your final recommendations, which I was disappointed to see did not include computer programming proficiency. Programming and data visualization skills frequently appear in job listings for my specific realm of interest, which is one reason I and two other SIS students in my cohort are taking "Introduction to Programming for Scientists and Engineers" in the College of Engineering in the Spring 2014 Semester, along with free online courses in programming. I do not believe that programming skills for data management tasks like quality assurance / quality control can presently be acquired through SIS or CCI coursework. This makes it hard to build skills for the workplace while also accumulating credits towards the major.

Expanding on the e-portfolio concept, I recently began a Tumblr blog to capture job listings that interest me. Along with tips on developing IT skills, I clip the "qualifications" section from listings for jobs I can imagine myself enjoying. This keeps me focused on developing the skills I need for work I would enjoy. The result is my "Data Pro" Tumblr:

You might be able to do something similar (collect and analyze minimal / desired qualifications) for job listings from sources like,, or other appropriate job boards. I think this approach would be particularly valuable for building curriculum pertaining to information management for STEM fields.

Anyway, I am glad you and Dr. Singh are looking at this issue in LIS education. I hope your research will help SIS continue to develop curriculum that keeps pace with what prospective employers need.



Tanner Jessel
Graduate Research Assistant:
Data Observation Network for Earth (DataONE)

Center for Information and Communication Studies
The University of Tennessee
Mail: 1345 Circle Park Drive, Suite 420