Original link:
http://www.quora.com/What-are-good-ways-to-handle-discrete-and-continuous-inputs-together/answer/Arun-Iyer-1
- Rescale bounded continuous features: Rescale every bounded continuous feature to [-1, 1] with x' = (2x - max - min)/(max - min), where max and min are the feature's bounds.
- Standardize all continuous features: Standardize every continuous feature. That is, compute its mean (u) and standard deviation (s), then set x' = (x - u)/s.
- Binarize categorical/discrete features: Represent each categorical feature as multiple boolean features. For example, instead of a single feature called marriage_status, use 3 boolean features (marriage_status_single, marriage_status_married and marriage_status_divorced), setting the one that matches the actual value to 1 and the others to -1. In other words, each categorical feature is replaced by k binary features, where k is the number of values it can take.
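The three steps above can be sketched in plain Python as follows. This is a minimal illustration, not the original author's code: the function names, the toy features (`age`, assumed to be bounded in [0, 100], and `income`, treated as unbounded) and the use of the population standard deviation are all assumptions. Rescaling and standardization are applied to different features here only to keep the example short.

```python
import math
from typing import List, Sequence


def rescale_bounded(x: float, lo: float, hi: float) -> float:
    """Map x from [lo, hi] onto [-1, 1] via (2x - hi - lo) / (hi - lo)."""
    if hi <= lo:
        raise ValueError("upper bound must exceed lower bound")
    return (2 * x - hi - lo) / (hi - lo)


def standardize(column: Sequence[float]) -> List[float]:
    """Return (x - u) / s for every value, where u is the mean and s the standard deviation."""
    n = len(column)
    u = sum(column) / n
    # Population standard deviation (divide by n); dividing by n - 1 is also common.
    s = math.sqrt(sum((x - u) ** 2 for x in column) / n)
    if s == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(x - u) / s for x in column]


def binarize(value: str, categories: Sequence[str]) -> List[int]:
    """Expand one categorical value into k features: 1 for the matching category, -1 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else -1 for c in categories]


# Hypothetical toy data: three records with one bounded feature (age, assumed to lie
# in [0, 100]), one unbounded feature (income) and the categorical marriage_status.
MARRIAGE_STATUS = ["single", "married", "divorced"]
ages = [20.0, 35.0, 50.0]
incomes = [30.0, 50.0, 70.0]
statuses = ["single", "married", "divorced"]

income_z = standardize(incomes)
rows = [
    # Each row is one vector in R^5: 1 rescaled + 1 standardized + 3 boolean features.
    [rescale_bounded(a, 0.0, 100.0), z] + binarize(m, MARRIAGE_STATUS)
    for a, z, m in zip(ages, income_z, statuses)
]
for row in rows:
    print(row)
```

In a real pipeline the bounds, mean and standard deviation would normally be computed on the training data only and then reused unchanged when encoding new inputs.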
Now, you can represent all the features in a single vector which we can assume to be embedded in R^n and start using off-the-shelf packages for classification/regression etc.
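The dimension n of that vector follows directly from the binarization step. Assuming c continuous features and m categorical features, where the j-th categorical feature takes k_j values, each continuous feature contributes one coordinate and each categorical feature contributes k_j:

```latex
n = c + \sum_{j=1}^{m} k_j
```

For example, two continuous features plus marriage_status (k = 3) give n = 2 + 3 = 5.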