Recall that SGNS and GloVe create two representations for each word – one for when it is the target word and one for when it is a context word (i.e., in the context window of some other word). Either set of representations can be used, although one generally keeps the word vectors W and discards the context vectors C. Any interaction between word and context vectors can be modelled as an inner product ⟨⋅,⋅⟩ : W × C → F, where F is a field of scalars. Note that the dot product is a type of inner product, but _not_ the only kind.

Because the same word-context pairs are used to update both W and C during training, it should not matter whether the interaction between words x and y is represented as ⟨x, y_c⟩ or ⟨y, x_c⟩ (where the subscript c denotes the context vector). In other words, the inner product should be invariant to which word is treated as the context word. This is why past work (e.g., Arora et al. (2016)) assumes that the word and context vectors are identical. If we use the full-dimensional vectors, this property is trivially satisfied. Because the same training pairs are used for W and C and we assume the absence of reconstruction error, this property should also be satisfied by the low-dimensional embeddings.

Returning to section 3.3 in the paper: if word vectors lay in different eigenspaces (i.e., if A in C = AW had distinct eigenvalues), then an inner product would not _necessarily_ be invariant to which word was treated as the context word. Although the dot product would be invariant even if the eigenvalues were distinct, the same could not be said for _any_ inner product. This is why the eigenvalues must be non-distinct (under the assumptions provided in the paper). We overloaded the ⟨⋅,⋅⟩ notation in the last paragraph of section 3.3, but it refers to the inner product in the general case, not just the dot product.

Note that this restriction on A is based on (1) what we know about the training data, (2) the training process, and (3) what we assumed about the reconstruction error. It is not derived from any property of the factorized word-context matrix itself.

Special thanks to Steven Cao and Zhuang Boyuan for requesting clarification on this point; their requests prompted this addendum.
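To make the eigenvalue argument above concrete, here is a minimal numerical sketch (not from the paper). It writes a generic inner product on R^d as ⟨u, v⟩_M = uᵀMv for a symmetric positive-definite M (M = I recovers the dot product), and checks when ⟨x, Ay⟩ = ⟨y, Ax⟩ holds for a symmetric A. The specific vectors x, y, the matrix M, and the two example choices of A are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

# Hypothetical word vectors for two words x and y (illustrative only).
x, y = rng.normal(size=d), rng.normal(size=d)

# A generic inner product on R^d: <u, v>_M = u^T M v, with M symmetric
# positive definite. M = I gives the ordinary dot product.
B = rng.normal(size=(d, d))
M = B @ B.T + d * np.eye(d)   # symmetric positive definite, M != I

def inner(u, v, M):
    """Inner product <u, v>_M = u^T M v."""
    return u @ M @ v

# Case 1: A is symmetric with *distinct* eigenvalues, so vectors can lie
# in different eigenspaces of A.
A_distinct = np.diag([1.0, 2.0, 3.0, 4.0])

# Case 2: A has a single (non-distinct) eigenvalue, i.e. A = lambda * I.
A_scalar = 2.5 * np.eye(d)

for name, A in [("distinct eigenvalues", A_distinct),
                ("non-distinct eigenvalues", A_scalar)]:
    # Gap between <x, A y> and <y, A x> under the dot product vs. a
    # general inner product; zero gap means the interaction is invariant
    # to which word is treated as the context word.
    dot_gap = abs(inner(x, A @ y, np.eye(d)) - inner(y, A @ x, np.eye(d)))
    gen_gap = abs(inner(x, A @ y, M) - inner(y, A @ x, M))
    print(f"{name}: dot-product gap = {dot_gap:.2e}, "
          f"general inner-product gap = {gen_gap:.2e}")
```

Running this shows both gaps are (numerically) zero when A is a scalar multiple of the identity, whereas with distinct eigenvalues only the dot-product gap vanishes, matching the claim that invariance under _any_ inner product forces the eigenvalues to be non-distinct.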