Part two in the For the Love of Data series. Enigma covers part two of Pandas.
The following topics are discussed:
1) Another way to apply a condition to a field
2) Creating a DataFrame from a dictionary
3) Appending one DataFrame to another
4) Joining DataFrames with merge and join
5) Writing output to CSV
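A minimal sketch of topics 2 to 5 follows. It assumes pandas 2.x, and the frames, column names (`Name`, `Score`, `Team`) and values are hypothetical, not taken from the show. `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, so the append step uses `pd.concat` instead.

```python
import pandas as pd

# Hypothetical data built from dictionaries (column name -> list of values)
scores = pd.DataFrame({"Name": ["Ann", "Bob"], "Score": [9.5, 6.2]})
more_scores = pd.DataFrame({"Name": ["Cy"], "Score": [4.1]})
teams = pd.DataFrame({"Name": ["Ann", "Bob", "Cy"],
                      "Team": ["red", "blue", "red"]})

# "Append" rows: pd.concat replaces the removed DataFrame.append;
# ignore_index=True renumbers the result 0..n-1
all_scores = pd.concat([scores, more_scores], ignore_index=True)

# merge matches on columns; how="left" keeps every row of all_scores
merged = all_scores.merge(teams, on="Name", how="left")

# join matches on the index, so move the key column into the index first
joined = all_scores.set_index("Name").join(teams.set_index("Name"))

# With no path, to_csv returns the CSV text; pass e.g. "scores.csv" to write a file
csv_text = merged.to_csv(index=False)
```

`merge` is usually the more flexible choice when the key lives in a column; `join` is convenient when both frames are already indexed by the key.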
Comment #1 posted on 2021-05-05 19:49:39 by b-yeezi
Another great show
Thanks for another great show. I look forward to your next one.
As to your use of `df.apply` in lieu of `np.select`, here's my two cents:
`apply` is more readable in most cases, but `select` is faster. When performance matters, or when the dataset is very large, you might want to use `np.select`. For instance, `np.select` ran roughly 8x faster than `apply` on your example on my PC (55.6 µs vs. 448 µs):
```
%timeit df.apply(Scorelevel, axis=1)
448 µs ± 2.88 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
```
%timeit np.select(cond_list, choice_list, default='Require Activation')
55.6 µs ± 440 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
In many cases readability can trump the need for speed, but I just wanted to offer a counterpoint.
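The show's `Scorelevel` function and `choice_list` are not reproduced in these notes, so the sketch below uses a hypothetical stand-in (`score_level`) and hypothetical labels. The thresholds follow the conditions in the next comment. It shows that the row-wise `apply` and the vectorized `np.select` produce the same labels, which is worth checking before swapping one for the other.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Score": [9.2, 8.0, 7.5, 6.1, 5.9, 4.0, 2.5]})

# Hypothetical labels, one per condition below
choice_list = ["Level 9", "Level 8", "Level 7", "Level 6", "Level 5", "Level 4"]
THRESHOLDS = [9, 8, 7, 6, 5, 4]

def score_level(row):
    """Row-wise classifier (hypothetical stand-in for the show's Scorelevel)."""
    # Thresholds are checked from highest to lowest, so the first match wins
    for threshold, label in zip(THRESHOLDS, choice_list):
        if row["Score"] >= threshold:
            return label
    return "Require Activation"

# Row by row: calls the Python function once per row
applied = df.apply(score_level, axis=1)

# Vectorized: build boolean masks for the whole column, then pick labels
s = df["Score"]
cond_list = [s >= 9,
             (s >= 8) & (s < 9),
             (s >= 7) & (s < 8),
             (s >= 6) & (s < 7),
             (s >= 5) & (s < 6),
             (s >= 4) & (s < 5)]
selected = np.select(cond_list, choice_list, default="Require Activation")
```

In IPython, wrapping the last `apply` and `np.select` lines in `%timeit` reproduces the kind of comparison quoted above. The absolute numbers will vary by machine and data size.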
Comment #2 posted on 2021-05-05 19:58:07 by b-yeezi
One more speed gain
If you really want to fly, you can convert the pandas Series to NumPy arrays first. For your example, this made `np.select` more than twice as fast as passing it Series (23.5 µs vs. 55.6 µs).
Example:
```
# Extract the column as a NumPy array once instead of repeating .values
score = df['Score'].values
cond_list = [score >= 9,
             (score >= 8) & (score < 9),
             (score >= 7) & (score < 8),
             (score >= 6) & (score < 7),
             (score >= 5) & (score < 6),
             (score >= 4) & (score < 5)]
%timeit np.select(cond_list, choice_list, default='Require Activation')
23.5 µs ± 1.74 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
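A small sketch checking that the array-based conditions give the same result as the Series-based ones. It uses hypothetical data and labels, and swaps `.values` for `Series.to_numpy()`, which the pandas documentation recommends for getting a NumPy array. `build_conditions` is a hypothetical helper that accepts either a Series or an ndarray.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Score": [9.2, 8.0, 7.5, 6.1, 5.9, 4.0, 2.5]})

# Hypothetical labels; the show's actual choice_list is not reproduced here
choice_list = ["Level 9", "Level 8", "Level 7", "Level 6", "Level 5", "Level 4"]

def build_conditions(score):
    """Range conditions from the comment; works on a Series or an ndarray."""
    return [score >= 9,
            (score >= 8) & (score < 9),
            (score >= 7) & (score < 8),
            (score >= 6) & (score < 7),
            (score >= 5) & (score < 6),
            (score >= 4) & (score < 5)]

# Comparisons on a Series build new Series; on an ndarray they build plain
# boolean arrays, so np.select has nothing to convert
score_array = df["Score"].to_numpy()

from_series = np.select(build_conditions(df["Score"]), choice_list,
                        default="Require Activation")
from_array = np.select(build_conditions(score_array), choice_list,
                       default="Require Activation")
```

Only the input type changes, so the labels should be identical and the choice comes down to speed.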