CS286HMK assignmentnumber yourlastname last4digitofyourstudentnumber

The subject line must consist of the four identifiers listed. There is no space within an identifier, and each identifier is separated from the next by a space.
Percentage | Grade |
---|---|
92 and above | A |
90 - 91 | A- |
88 - 89 | B+ |
82 - 87 | B |
80 - 81 | B- |
78 - 79 | C+ |
72 - 77 | C |
70 - 71 | C- |
68 - 69 | D+ |
62 - 67 | D |
60 - 61 | D- |
59 and below | F |
Every day on Facebook, people interact with content in languages they do not understand. We break down those language barriers by supporting translation for more than 45 languages and over 2,000 translation directions, serving more than 4 billion translations per day. We recently improved translation quality by shifting to deep learning techniques. We will go over the models served in production at Facebook and give an overview of current and future work.
Identity resolution is the process of uncovering records that refer to the same real-world individual. It plays an important role in a wide variety of tasks, including fraud detection, marketing, relationship discovery, and customer service.
Identity resolution has been a topic of extensive research. We can broadly categorize the existing resolution approaches as deterministic and probabilistic. A deterministic approach always produces the same resolution results for the same input and generally depends on a set of domain-specific rules. A probabilistic approach calculates the probabilities of individual key matches and combines them to decide whether two records match. A probabilistic approach can employ a machine learning algorithm to learn weights, thresholds, or other parameters to improve its accuracy and recall. Both approaches have their applications. A probabilistic approach may be fine when achieving high accuracy is not critical. For example, if you want to target a marketing campaign to individuals based on their unified identities, the cost of lower accuracy in the resolution algorithm is minimal. A deterministic approach, by contrast, may be more suitable if you want to identify the banking transactions of individuals across two different banks.
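To make the distinction concrete, here is a minimal Python sketch that contrasts the two styles on a pair of records. It is an illustration only, not the method presented in the talk: the field names (`name`, `email`, `dob`), the matching rule, the weights, and the threshold are all assumptions chosen for the example, and string similarity is approximated with the standard library's `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher

def normalize(value):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(value.lower().split())

def deterministic_match(a, b):
    """Hypothetical rule: same email, or same name and same date of birth."""
    same_email = normalize(a["email"]) == normalize(b["email"])
    same_name = normalize(a["name"]) == normalize(b["name"])
    same_dob = a["dob"] == b["dob"]
    return same_email or (same_name and same_dob)

# Assumed weights and threshold; in practice these could be learned from data.
WEIGHTS = {"name": 0.5, "email": 0.3, "dob": 0.2}
THRESHOLD = 0.85

def similarity(x, y):
    """String similarity in [0, 1] based on matching subsequences."""
    return SequenceMatcher(None, normalize(x), normalize(y)).ratio()

def probabilistic_score(a, b):
    """Weighted combination of per-field similarities."""
    return sum(w * similarity(a[f], b[f]) for f, w in WEIGHTS.items())

def probabilistic_match(a, b):
    return probabilistic_score(a, b) >= THRESHOLD

rec_a = {"name": "Jane Doe", "email": "jane@example.com", "dob": "1990-01-02"}
rec_b = {"name": "jane  doe", "email": "JANE@example.com", "dob": "1990-01-02"}
rec_c = {"name": "Jane Do", "email": "jane.doe@example.com", "dob": "1990-01-02"}
rec_d = {"name": "Robert Smith", "email": "rsmith@mail.org", "dob": "1975-06-30"}

print(deterministic_match(rec_a, rec_b), probabilistic_match(rec_a, rec_b))
print(deterministic_match(rec_a, rec_c), probabilistic_match(rec_a, rec_c))
print(deterministic_match(rec_a, rec_d), probabilistic_match(rec_a, rec_d))
```

In this sketch the rule-based matcher rejects the pair with a misspelled name and a different email address, while the weighted score still accepts it. That is the trade-off described above: the probabilistic matcher tolerates noisy keys at the risk of false matches, and the deterministic matcher only links records its rules can justify.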
Identity resolution poses three primary challenges: (1) the keys identifying the records may not match exactly because of intentional or unintentional errors, or may not be present in all records; (2) the identity of a person can change over time, for example, a person might change his or her name upon marriage; and (3) large data sizes make pairwise comparison impossible to complete within a reasonable amount of time.
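One common way to reduce the cost behind challenge (3) is blocking: records are grouped by a cheap key, and only records within the same group are compared. The Python sketch below illustrates the idea under stated assumptions. The abstract does not say which technique the presenters use, and the record fields (`surname`, `dob`) and the blocking key are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def blocking_key(record):
    """Hypothetical key: first letter of the surname plus the birth year."""
    return (record["surname"][:1].lower(), record["dob"][:4])

def candidate_pairs(records):
    """Yield index pairs that share a blocking key, instead of all pairs."""
    blocks = defaultdict(list)
    for idx, record in enumerate(records):
        blocks[blocking_key(record)].append(idx)
    for members in blocks.values():
        # Pairwise comparison happens only inside each (small) block.
        yield from combinations(members, 2)

records = [
    {"surname": "Doe", "dob": "1990-01-02"},
    {"surname": "Do", "dob": "1990-01-02"},
    {"surname": "Smith", "dob": "1975-06-30"},
    {"surname": "Lee", "dob": "1982-11-15"},
]

pairs = list(candidate_pairs(records))
all_pairs = len(records) * (len(records) - 1) // 2
print(f"{len(pairs)} candidate pair(s) instead of {all_pairs}")
```

The choice of key matters. A key built from the surname would miss a person whose name changed, which is challenge (2), so a practical system would likely need several keys or a key that stays stable over time.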
In this presentation, we plan to cover a deterministic solution for identity resolution that addresses the three challenges stated above. We have tested our solution on over 200 million unique identities with 1 trillion records. We implemented the solution using Spark/Hadoop on 80 nodes.