Pandas Cheat Sheet for Data Science in Python | DataCamp

Ask questions Research chat →

https://www.datacamp.com/cheat-sheet/pandas-cheat-sheet-for-data-science-in-python · scraped

data science python

Attachments

Scraped Content

— 499 words · 2026-02-14 03:21:10 UTC ·

Excerpt

Series A one-dimensional labeled array capable of holding any data type s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd']) DataFrame A two-dimensional labeled data structure with columns of potentially different types data = {'Country': ['Belgium', 'India', 'Brazil'], 'Capital': ['Brussels', 'New Delhi', 'Brasilia'], 'Population': [11190846, 1303171035, 207847528]} df = pd.DataFrame(data,columns=['Country', 'Capital', 'Population'])   Country Capital Population 1 Belgium Brussels 11190846 2 India New Delhi 1303171035 3 Brazil Brasilia 207847528 Please note that the first column 1,2,3 is the index and Country,Capital,Population are the Columns. Asking For Help help(pd.Series.loc) I/O Read and Write to CSV pd.read_csv('file.csv', header=None, nrows=5) df.to_csv('myDataFrame.csv') Read multiple sheets from the same file xlsx = pd.ExcelFile('file.xls') df = pd.read_excel(xlsx, 'Sheet1') Read and Write to Excel pd.read_excel('file.xlsx') df.to_excel('dir/my
Series A one-dimensional labeled array capable of holding any data type s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd']) DataFrame A two-dimensional labeled data structure with columns of potentially different types data = {'Country': ['Belgium', 'India', 'Brazil'], 'Capital': ['Brussels', 'New Delhi', 'Brasilia'], 'Population': [11190846, 1303171035, 207847528]} df = pd.DataFrame(data,columns=['Country', 'Capital', 'Population'])   Country Capital Population 1 Belgium Brussels 11190846 2 India New Delhi 1303171035 3 Brazil Brasilia 207847528 Please note that the first column 1,2,3 is the index and Country,Capital,Population are the Columns. Asking For Help help(pd.Series.loc) I/O Read and Write to CSV pd.read_csv('file.csv', header=None, nrows=5) df.to_csv('myDataFrame.csv') Read multiple sheets from the same file xlsx = pd.ExcelFile('file.xls') df = pd.read_excel(xlsx, 'Sheet1') Read and Write to Excel pd.read_excel('file.xlsx') df.to_excel('dir/myDataFrame.xlsx', sheet_name='Sheet1') Read and Write to SQL Query or Database Table (read_sql()is a convenience wrapper around read_sql_table() and read_sql_query()) from sqlalchemy import create_engine engine = create_engine('sqlite:///:memory:') pd.read_sql(SELECT * FROM my_table;, engine) pd.read_sql_table('my_table', engine) pd.read_sql_query(SELECT * FROM my_table;', engine) df.to_sql('myDf', engine) Selection Getting Get one element s['b'] -5 Get subset of a DataFrame df[1:] Country Capital Population 1 India New Delhi 1303171035 2 Brazil Brasilia 207847528 Selecting', Boolean Indexing and Setting By Position Select single value by row and and column df.iloc([0], [0]) 'Belgium' df.iat([0], [0]) 'Belgium' By Label Select single value by row and column labels df.loc([0], ['Country']) 'Belgium' df.at([0], ['Country']) 'Belgium' By Label/Position Select single row of subset of rows df.ix[2] Country Brazil Capital Brasilia Population 207847528 Select a single column of subset of columns df.ix[:, 'Capital'] 0 Brussels 1 New Delhi 2 Brasilia Select rows and columns df.ix[1, 'Capital'] 'New Delhi' Boolean Indexing Series s where value is not >1 s[~(s > 1)] s where value is <-1 or >2 s[(s < -1) | (s > 2)] Use filter to adjust DataFrame df[df['Population']>1200000000] Setting Set index a of Series s to 6 s['a'] = 6 Dropping Drop values from rows (axis=0) s.drop(['a', 'c']) Drop values from columns(axis=1) df.drop('Country', axis=1) Sort and Rank Sort by labels along an axis df.sort_index() Sort by the values along an axis df.sort_values(by='Country') Assign ranks to entries df.rank() Retrieving Series/DataFrame Information Basic Information (rows, columns) df.shape Describe index df.index Describe DataFrame columns df.columns Info on DataFrame df.info() Number of non-NA values df.count() Summary Sum of values df.sum() Cumulative sum of values df.cumsum() Minimum/maximum values df.min()/df.max() Minimum/Maximum index value df.idxmin()/df.idxmax() Summary statistics df.describe() Mean of values df.mean() Median of values df.median() Applying Functions f = lambda x: x*2 Apply function df.apply(f) Apply function element-wise df.applynap(f) Internal Data Alignment NA values are introduced in the indices that don't overlap: s3 = pd.Series([7, -2, 3], index=['a', 'c', 'd']) s + s3 a 10.0 b NaN c 5.0 d 7.0 Arithmetic Operations with Fill Methods You can also do the internal data alignment yourself with the help of the fill methods: s.add(s3, fill_value=0) a 10.0 b -5.0 c 5.0 d 7.0 s.sub(s3, fill_value=2) s.div(s3, fill_value=4) s.mul(s3, fill_value=3) Learn more about pandas Joining Data with pandas Reshaping Data with pandas Data Manipulation with pandas Writing Efficient Code with pandas

Visibility

Visible to everyone

Reading Status

Related Bookmarks

My Note


Saved!

Annotations

Export as Markdown
+ Annotate selection

Add Annotation