The Data Science Ethics Best Practices are intended as a framework in which we can function rather than a strict set of rules. There is no universally accepted code of ethics for this field, despite many calls for one and many attempts at creating one.

The Data Science Ethics Best Practices are a set of guidelines to keep in mind while doing or interacting with data science. Their initials also spell a handy acronym: PRACTICE.

None of us is perfect in applying unbiased, ethical methods, but we can all practice at it.

Data Science Ethics in Practice

Protect Privacy

Always abide by the privacy regulations of the areas in which you operate and the areas from which you have collected data. That said, regulation should not be the only reason to maintain discretion when it comes to the data with which you work. Privacy has more to do with respecting the subjects of the data than it does the legal ramifications of disclosure. Take all necessary precautions to safeguard the information with which you are entrusted.

Retain Responsibility

Following through on the responsibilities around ethics requires practitioners and their partners to acknowledge and rectify issues that arise. Somewhere, somehow, something will go awry. When that happens, clearly articulate what occurred and why, then work earnestly to resolve it.

Anticipate Adversaries

Not all people who access, create, or utilize data and algorithms do so with good intentions. Some people are nefarious actors. Some are seeking opportunities to expose sensitive information or to use it against others. Try to minimize potential harm that could be done with what you collect and create by thinking like a foe. How might someone try to abuse this system if they got access?

Collect Carefully

When collecting data, consider how it is being sourced, from whom or what, and what has been communicated regarding its use. Only collect what you will use and be sure to collect all that you will need. Document known biases in the data where groups are over- or under-represented or where treatment of those groups has been different.
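One rough way to start documenting over- or under-representation is to compare each group's share of the collected data against a reference population. The sketch below is only an illustration: the function name `representation_gaps`, the group labels, and the reference shares are all hypothetical, and choosing an appropriate reference population is itself a judgment call.

```python
from collections import Counter


def representation_gaps(records, reference_shares):
    """Return (sample share - reference share) for each reference group.

    records: iterable of group labels, one per collected record.
    reference_shares: dict mapping group label -> expected share (0..1).
    """
    counts = Counter(records)
    total = sum(counts.values())
    gaps = {}
    for group, expected in reference_shares.items():
        observed = counts.get(group, 0) / total if total else 0.0
        gaps[group] = observed - expected
    return gaps


# Hypothetical sample and reference population, for illustration only.
sample = ["a"] * 70 + ["b"] * 25 + ["c"] * 5
reference = {"a": 0.5, "b": 0.3, "c": 0.2}

gaps = representation_gaps(sample, reference)
for group, gap in sorted(gaps.items()):
    if gap > 0:
        label = "over-represented"
    elif gap < 0:
        label = "under-represented"
    else:
        label = "matches reference"
    print(f"{group}: {gap:+.2f} ({label})")
```

A table like this belongs in the dataset's documentation alongside notes on how the data was sourced. It captures representation only; differences in how groups were treated need to be documented separately.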

Train Transparently

Be open and transparent as to how you are using the data you collect (within the bounds of still protecting your IP, of course). While bias is likely to happen, being transparent about the assumptions and data modifications made will allow others to help spot areas of potential bias so that they can be corrected.

Incorporate Inclusivity

Gathering many and varied perspectives is crucial to finding potential biases, gaps, or vulnerabilities. This goes for including both people and data from a wide population. Work together with subject matter experts from other fields, junior counterparts, or just people from other walks of life to expand your frame of reference.

Consider Context

Some data science work requires more sensitivity due to the context in which it will be used. For instance, algorithms or experiments that impact health, social status, or finances should be considered more carefully. In statistics, we might think of this as deciding how much statistical power a test needs. In data science, the power of the process must likewise be adjusted based on the potential downstream implications.
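To make the statistical power analogy concrete, here is a minimal sketch of how demanding more power raises the amount of data a study needs. It assumes a two-sided, two-sample comparison of means under the usual normal approximation; the function name, the effect size of 0.5, and the two power levels are illustrative choices, not recommendations.

```python
import math
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample z-test.

    Uses the normal approximation n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2,
    where d is the standardized effect size (difference in means / std. dev.).
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical scenario: same effect size, two different levels of stakes.
n_routine = samples_per_group(effect_size=0.5, power=0.80)
n_high_stakes = samples_per_group(effect_size=0.5, power=0.95)

print(f"Routine context (80% power): {n_routine} per group")
print(f"High-stakes context (95% power): {n_high_stakes} per group")
```

The same logic carries over to the wider process: the greater the potential harm from a wrong conclusion, the more evidence and review the work should require before anyone acts on it.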

Encode Equity

All of the efforts made across the practice lead to this: getting to more equitable outcomes. Ensuring that algorithms are used in a fair manner, as free of bias as possible, takes a principled approach to all stages of the data science lifecycle.

Contribute to the Data Science Ethics in Practice

This Data Science Code of Ethics is in a perpetual state of draft. Comment below with your thoughts, questions, and suggestions to help improve the Code.
