Jon Dehdari







Pandas FAQ

Pandas is a popular Python library for data science. Its DataFrame feels similar to R's data frames. Here are answers to some common questions:

How do I concatenate two dataframes?


pd.concat([df1, df2])
	
See also the "Merge, join, and concatenate" guide in the pandas documentation.
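
As a minimal sketch of what pd.concat does with its most common options, the example below stacks two small frames. The frames and their column names are made up for illustration; ignore_index and axis are standard pd.concat parameters.

```python
import pandas as pd

# Two hypothetical frames with the same columns
df1 = pd.DataFrame({'a': [1, 2], 'b': ['x', 'y']})
df2 = pd.DataFrame({'a': [3, 4], 'b': ['z', 'w']})

# Default: rows are stacked and each frame keeps its own index (0, 1, 0, 1)
stacked = pd.concat([df1, df2])

# ignore_index=True gives the result a fresh 0..n-1 index
renumbered = pd.concat([df1, df2], ignore_index=True)

# axis=1 places the frames side by side, aligning rows on the index
side_by_side = pd.concat([df1, df2], axis=1)
```

If the duplicated index from the default call is not wanted, ignore_index=True is usually the simplest fix.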

How do I shuffle a dataframe?


df.sample(frac=1)
	
To also reset the index:

df.sample(frac=1).reset_index(drop=True)
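
If the shuffle needs to be reproducible, one option is to pass a seed through the random_state parameter of sample(). This is a small sketch on made-up data; the seed value 0 is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({'a': range(10)})  # hypothetical data

# The same random_state gives the same row order on every run
shuffled_1 = df.sample(frac=1, random_state=0).reset_index(drop=True)
shuffled_2 = df.sample(frac=1, random_state=0).reset_index(drop=True)
```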
	

How do I get the frequency counts of a column (or series)?


df['a'].value_counts()
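
Two options of value_counts() are often useful here: normalize=True returns fractions instead of counts, and dropna=False also counts missing values. A short sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({'a': ['x', 'y', 'x', None, 'x']})  # hypothetical data

# Missing values are dropped by default
counts = df['a'].value_counts()

# Fractions of the non-missing values instead of raw counts
shares = df['a'].value_counts(normalize=True)

# Count missing values as their own category
with_missing = df['a'].value_counts(dropna=False)
```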
	

How do I get only the numeric columns of a dataframe (ints, floats)?


import numpy as np
df_nums = df.select_dtypes(include=[np.number])
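
To make the effect concrete, here is a small sketch on a made-up frame with mixed column types. Note that boolean columns are not matched by np.number.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one column of each kind
df = pd.DataFrame({
    'i': [1, 2],          # integers
    'f': [0.5, 1.5],      # floats
    's': ['x', 'y'],      # strings
    'b': [True, False],   # booleans
})

# Keeps only the integer and float columns
df_nums = df.select_dtypes(include=[np.number])
```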
	

How do I sort by a column?

You might expect .sort(), but the method is .sort_values():

df.sort_values(by='column1')
df.sort_values(by=['column1', 'column2'])
df.sort_values(by=['column1', 'column2'], ascending=False)
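
A single ascending=False applies to every column in by. To mix directions, ascending also accepts a list with one flag per column. A minimal sketch on made-up data:

```python
import pandas as pd

# Hypothetical data with ties in column1
df = pd.DataFrame({
    'column1': [2, 1, 2, 1],
    'column2': [10, 20, 30, 40],
})

# column1 ascending, ties broken by column2 descending
mixed = df.sort_values(by=['column1', 'column2'], ascending=[True, False])
```

sort_values() returns a new dataframe and leaves df unchanged.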
	

How do I get the row(s) at or above, say, the 90th percentile?


q90 = df['col1'].quantile(.9)   # value at the 90th percentile
df[df['col1'] >= q90]           # rows at or above that value
	
See also, for the percentile rank of each row:

df['col1'].rank(pct=True)
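
The two calls answer slightly different questions: quantile() returns a single threshold value, while rank(pct=True) returns one percentile rank per row. A rough sketch on made-up data, where the variable names are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'col1': range(1, 11)})  # hypothetical data: 1..10

# A single number (linear interpolation by default), not a set of rows
q90 = df['col1'].quantile(.9)

# Boolean mask selects the rows at or above that threshold
top_rows = df[df['col1'] >= q90]

# One value per row, between 0 and 1; the largest value gets 1.0
pct = df['col1'].rank(pct=True)
```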
	
More to come!