A URL shortener’s life

I was perusing some emails from a mailing list, old blog posts I had bookmarked, and old tweets I had favorited. Many of them contain some kind of shortened link from services like TinyURL, Bitly, or t.co.

While the URL shorteners themselves are still functioning just fine, the actual URLs are not always so lucky, and sometimes I get a 404 error from the target website. I know link rot happens, but somehow this irked me.
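Out of curiosity, this kind of rot check is easy to script. A minimal sketch using only Python's standard library (the function names are mine, not from any particular tool): follow the shortener's redirects and look at the status the final destination returns.

```python
import urllib.request
import urllib.error

def check_link(url, timeout=10):
    """Follow redirects (as a shortener does) and report the final URL
    and HTTP status. The shortener can resolve fine even when the
    target itself has rotted away."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.geturl(), resp.status
    except urllib.error.HTTPError as e:
        # 4xx/5xx responses arrive as exceptions; they still carry
        # the final URL and the status code.
        return e.geturl(), e.code
    except urllib.error.URLError:
        # DNS failure, refused connection, timeout, etc.
        return url, None

def is_rotted(status):
    """Treat network failures and client/server errors as rot."""
    return status is None or status >= 400
```

Running `check_link` over a bookmarks export and filtering with `is_rotted` would give a quick inventory of which shortened links still lead anywhere.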

New goals for the year 2014

Adding new goals on top of the usual goals and objectives for work:

  • Learn a new programming language: python
  • Learn R and processing.js
  • Learn data visualization tools
  • Learn a new language: I’m thinking either Japanese or Spanish
  • Create a 3D version of Sudoku when our library finally gets its 3D printer. Some possibilities: raised numbers, raised numbers with their braille counterparts, raised braille only, or graphics instead of numbers
  • Finish reading the Game of Thrones series (finished the first book last month) – yeah, I’m not a fast reader when it comes to series.

I will probably add a few more to the list later.


Ten root conditions of data quality problems

  1. Multiple data sources.  Multiple data sources of the same information produce different values for this information.  This can include values that were accurate at a given point in time.
  2. Subjective judgment in data production.  Information production using subjective judgment can result in the production of biased information.
  3. Limited computing resources.  Lack of sufficient computing resources limits accessibility to relevant information.
  4. Security/accessibility trade-off.  Easy access to information may conflict with requirements for security, privacy, and confidentiality.
  5. Coded data across disciplines.  Coded data from different functions and disciplines is difficult to decipher and understand.  Also, codes may conflict.
  6. Complex data representations.  Algorithms are not available for automated content analysis across instances of text and image information.  Non-numeric information can be difficult to index in a way that permits location of relevant information.
  7. Volume of data.  Large volumes of stored information make it difficult to access needed information in a reasonable time.
  8. Input rules too restrictive or bypassed.  Input rules that are too restrictive may impose unnecessary controls on data input and lose data that has important meaning.  Data entry clerks may skip entering data into a field (missing information) or arbitrarily change a value to conform to rules and pass an edit check (erroneous information).
  9. Changing data needs.  As information consumers’ tasks and the organization environment (such as new market, new legal requirements, new trends) change, the information that is relevant and useful changes.
  10. Distributed heterogeneous systems.  Distributed heterogeneous systems without proper integration mechanisms lead to inconsistent definitions, formats, rules, and values.  The original meaning of data may be lost or distorted as data flows and is retrieved from a different system, time, place, or data consumer, for the same or different purposes.

Lee, Yang, et al. Journey to Data Quality. Cambridge: The MIT Press, 2006. 80-81.
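The first condition on that list, multiple data sources disagreeing about the same information, is easy to demonstrate. A toy Python sketch (the CRM/billing records and function name are made up for illustration): compare two sources keyed on the same entities and flag any value that differs.

```python
def find_conflicts(source_a, source_b):
    """Return {key: (value_a, value_b)} for every key the two
    sources share but disagree on -- root condition 1 in action."""
    conflicts = {}
    for key in source_a.keys() & source_b.keys():  # shared keys only
        if source_a[key] != source_b[key]:
            conflicts[key] = (source_a[key], source_b[key])
    return conflicts

# Hypothetical phone numbers from two systems of record.
crm = {"acme": "555-0100", "globex": "555-0199"}
billing = {"acme": "555-0100", "globex": "555-0142"}

find_conflicts(crm, billing)
# -> {"globex": ("555-0199", "555-0142")}
```

Both values may even have been accurate at some point in time, as the book notes; the check only tells you the sources now disagree, not which one is right.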